Why does solving linear regression need gradient descent instead of just setting the slope = 0?

jasonxlui

In linear regression, we need to find the parameters that minimize the sum of squared residuals.
At the minimum point the slope = 0, so we can write down that equation and solve for the required parameters.
Then why do we need gradient descent, which takes many steps to find the solution and does not land exactly on the minimum point?
Thank you!
 
You certainly can set the partial derivatives equal to 0.

[math]S=\sum_i^n(Y_i-\hat{Y}_i)^2=\sum_i^n(Y_i-ax_i-b)^2[/math]

Take the partial derivatives with respect to a and b and set them equal to 0.
[math]\frac{\partial S}{\partial a}=\frac{\partial}{\partial a}\left[\sum_i^n(Y_i-ax_i-b)^2\right]=0[/math]
[math]\frac{\partial S}{\partial b}=\frac{\partial}{\partial b}\left[\sum_i^n(Y_i-ax_i-b)^2\right]=0[/math]
Now you have a system of two equations; solve for [imath]a[/imath] and [imath]b[/imath]. It involves a bit of calculus but mostly algebra.
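For reference, working through that system gives the well-known closed-form solution. Writing [imath]\bar{x}[/imath] and [imath]\bar{Y}[/imath] for the sample means (notation I'm adding here for brevity):

[math]a=\frac{\sum_i^n(x_i-\bar{x})(Y_i-\bar{Y})}{\sum_i^n(x_i-\bar{x})^2},\qquad b=\bar{Y}-a\bar{x}[/math]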
 
To summarize @BigBeachBanana's reply: we don't.
 
@blamocur @BigBeachBanana
Thank you all!

So, can I say that both setting the partial derivatives = 0 and gradient descent can solve the problem?

Perhaps setting the partial derivatives = 0 is the more time-consuming method up front, but it generates a formula I can use forever.
Then, if I need to find the minimum point next time, I just plug the training data into the formula.
So this method is more convenient only when I need to solve the problem more than once?
 
Yes, that's correct. Someone who thought of that idea first already solved for a and b, and those equations have been in use for a very long time. However, you're welcome to derive them yourself.
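Here's a minimal NumPy sketch of both routes side by side (the data, learning rate, and iteration count are made-up choices, purely for illustration): the closed form lands on a and b in one shot, while gradient descent creeps toward the same answer over many small steps.

[code]
import numpy as np

# Toy data: y is roughly 3x + 2 plus noise (made-up values)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3.0 * x + 2.0 + rng.normal(0.0, 1.0, size=50)

# --- Method 1: set the partial derivatives to 0 (closed form) ---
a_exact = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b_exact = y.mean() - a_exact * x.mean()

# --- Method 2: gradient descent on the mean squared residual ---
a, b = 0.0, 0.0      # arbitrary starting guess
lr = 0.01            # learning rate (step size)
for _ in range(10_000):
    resid = y - a * x - b
    # gradients of the mean squared residual with respect to a and b
    a -= lr * (-2.0 * np.mean(resid * x))
    b -= lr * (-2.0 * np.mean(resid))

print(f"closed form:      a = {a_exact:.4f}, b = {b_exact:.4f}")
print(f"gradient descent: a = {a:.4f}, b = {b:.4f}")
[/code]

In practice, gradient descent earns its keep on models where no closed form exists, or where the data is too large to solve the normal equations directly; for plain linear regression the closed form is the natural choice.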
 