Why does solving linear regression need gradient descent instead of just setting the slope = 0?

jasonxlui

In linear regression, we need to find the parameters that minimize the sum of squared residuals.
At the minimum point the slope = 0, so we can write down that equation and solve for the required parameters.
Then why do we need gradient descent, which takes many steps to find the solution and does not land exactly on the minimum point?
Thank you!
 
You certainly can set the partial derivatives equal to 0.

[math]S=\sum_i^n(Y_i-\hat{Y}_i)^2=\sum_i^n(Y_i-ax_i-b)^2[/math]

Take the partial derivatives with respect to a and b and set them equal to 0.
[math]\frac{\partial S}{\partial a}=\frac{\partial}{\partial a}\left[\sum_i^n(Y_i-ax_i-b)^2\right]=0[/math]
[math]\frac{\partial S}{\partial b}=\frac{\partial}{\partial b}\left[\sum_i^n(Y_i-ax_i-b)^2\right]=0[/math]
Now you have a system of two equations; solve for [imath]a[/imath] and [imath]b[/imath]. It involves a bit of calculus but mostly algebra.
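For reference, working through that system gives the well-known closed-form solution. Writing [imath]\bar{x}[/imath] and [imath]\bar{Y}[/imath] for the sample means (notation I'm adding here for brevity):

[math]a=\frac{\sum_i^n(x_i-\bar{x})(Y_i-\bar{Y})}{\sum_i^n(x_i-\bar{x})^2},\qquad b=\bar{Y}-a\bar{x}[/math]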
 
To summarize @BigBeachBanana's reply: we don't.
 
@blamocur @BigBeachBanana
Thank you all!

So, can I say that both setting the partial derivatives = 0 and gradient descent can solve the problem?

Perhaps setting the partial derivatives = 0 is the more time-consuming method up front, but it generates a formula I can use forever.
Then, if I need to find the minimum point next time, I just plug the training data into the formula.
So this method is more convenient only when I need to solve the problem more than once?
 
Yes, that's correct. Someone who thought of that idea first already solved for a and b, and those equations have been in use for a very long time. However, you're welcome to derive them yourself.
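Here's a minimal NumPy sketch of both routes side by side (the data, learning rate, and iteration count are made-up choices, purely for illustration): the closed form lands on a and b in one shot, while gradient descent creeps toward the same answer over many small steps.

[code]
import numpy as np

# Toy data: y is roughly 3x + 2 plus noise (made-up values)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3.0 * x + 2.0 + rng.normal(0.0, 1.0, size=50)

# --- Method 1: set the partial derivatives to 0 (closed form) ---
a_exact = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b_exact = y.mean() - a_exact * x.mean()

# --- Method 2: gradient descent on the mean squared residual ---
a, b = 0.0, 0.0      # arbitrary starting guess
lr = 0.01            # learning rate (step size)
for _ in range(10_000):
    resid = y - a * x - b
    # gradients of the mean squared residual with respect to a and b
    a -= lr * (-2.0 * np.mean(resid * x))
    b -= lr * (-2.0 * np.mean(resid))

print(f"closed form:      a = {a_exact:.4f}, b = {b_exact:.4f}")
print(f"gradient descent: a = {a:.4f}, b = {b:.4f}")
[/code]

In practice, gradient descent earns its keep on models where no closed form exists, or where the data is too large to solve the normal equations directly; for plain linear regression the closed form is the natural choice.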
 