Questions about linear lines of best fit

Brian Demong · Aug 24, 2022

Why do we find the least-squares instead of the least-absolutes? That is, what is the advantage of squaring the distances over finding the absolute values of the distances?

One effect of squaring, besides making all the values positive: it increases the "weight" of larger distances, and even decreases the "weight" of distances between -1 and 1. But why is that valuable/necessary?

Thanks in advance,
_Brian

blamocur · Aug 25, 2022

This is a very good question! The answer is that both ways are valid. In fact, both are special cases of the more general family of $L_p$ norms (https://en.wikipedia.org/wiki/Norm_(mathematics)#p-norm), with $L_2$ corresponding to least-squares and $L_1$ to the absolutes.
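To make the $L_1$ vs. $L_2$ comparison concrete, here is a minimal plain-Python sketch that scores one candidate line under both objectives. The data points, the candidate line $y = 1 + x$, and the helper names `residuals` and `lp_loss` are made up for illustration. It computes $\sum_i |r_i|^p$, the $p$-th power of the $L_p$ norm, which has the same minimizer as the norm itself.

```python
def residuals(xs, ys, intercept, slope):
    """Vertical distances y_i - (intercept + slope * x_i) to a candidate line."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def lp_loss(res, p):
    """Sum of |r|**p over the residuals.

    p = 1 is the least-absolutes objective, p = 2 the least-squares one.
    (This is the p-th power of the L_p norm; minimizing it is equivalent
    to minimizing the norm itself.)
    """
    if p < 1:
        raise ValueError("p must be >= 1 for L_p to be a norm")
    return sum(abs(r) ** p for r in res)


# Hypothetical data and candidate line y = 1 + x, for illustration only.
xs = [0, 1, 2, 3]
ys = [1, 3, 2, 8]
r = residuals(xs, ys, intercept=1.0, slope=1.0)
print(r)              # [0.0, 1.0, -1.0, 4.0]
print(lp_loss(r, 1))  # 6.0
print(lp_loss(r, 2))  # 18.0
```

The single residual of 4 contributes 4 to the $L_1$ loss but 16 to the $L_2$ loss, which is exactly the extra weight on large distances that Brian noticed.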

The popularity of least-squares in engineering might be explained by the availability of convenient closed-form optimization methods, which often reduce to well-developed techniques from linear algebra. But there are cases where "least-absolute" methods give better results in practice, e.g. some cases of sparse optimization (https://en.wikipedia.org/wiki/Sparse_approximation).
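As a minimal sketch of how least-squares "reduces to linear algebra" (plain Python; the function name `fit_least_squares_line` is hypothetical), fitting $y = a + bx$ amounts to solving a 2-by-2 linear system, the normal equations $(X^\top X)\beta = X^\top y$:

```python
def fit_least_squares_line(xs, ys):
    """Fit y = a + b*x by solving the 2x2 normal equations (X^T X) beta = X^T y."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations for beta = (a, b):
    #   n  * a + sx  * b = sy
    #   sx * a + sxx * b = sxy
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    # Solve the 2x2 system with Cramer's rule.
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


print(fit_least_squares_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```

With more predictors the same normal equations simply become a larger linear system, which is where the well-developed linear-algebra machinery comes in.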

P.S. The popular preference for least-squares methods in engineering often reminds me of an old joke: https://en.wikipedia.org/wiki/Streetlight_effect.

BigBeachBanana · Aug 25, 2022


Good questions indeed. You're correct that squaring gives outliers much more influence than taking the absolute value does. But depending on your application, this isn't always bad.
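To see the outlier effect in the simplest possible case, fit a single constant $c$ instead of a line. It is a standard fact that the mean minimizes the sum of squared deviations and the median minimizes the sum of absolute deviations. A small sketch with made-up data (the values and helper names are illustrative only):

```python
from statistics import mean, median


def sum_squared(data, c):
    """Least-squares objective for a single constant c."""
    return sum((x - c) ** 2 for x in data)


def sum_absolute(data, c):
    """Least-absolutes objective for a single constant c."""
    return sum(abs(x - c) for x in data)


data = [1, 2, 3, 4, 100]  # 100 is an outlier
m = mean(data)     # 22: minimizes sum_squared, pulled up by the outlier
md = median(data)  # 3: minimizes sum_absolute, barely affected

print(sum_squared(data, m), sum_squared(data, md))    # mean wins here
print(sum_absolute(data, m), sum_absolute(data, md))  # median wins here
```

One large value drags the mean up to 22 while the median stays at 3. Whether that sensitivity is a bug or a feature depends on whether the large value is a measurement error or a real observation you care about.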

A mathematical answer is that the function $x^2$ is differentiable at $x = 0$, while $|x|$ is not. If you've studied calculus, you know you can easily minimize a parabola by taking the derivative and setting it equal to 0. The sum of squared distances is likewise a quadratic in the line's parameters, so it has a nice analytical solution. On the other hand, minimizing a sum of $|x|$ terms has no such formula and usually requires some sort of numerical method.
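Here is that calculus argument worked out for a line $y = a + bx$ fitted to points $(x_i, y_i)$, $i = 1, \dots, n$, assuming the $x_i$ are not all equal (standard notation, not from the original post): set both partial derivatives of the sum of squared residuals to zero and solve.

```latex
S(a, b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2

\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0
\quad\Longrightarrow\quad \hat{a} = \bar{y} - \hat{b}\,\bar{x}

\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0
\quad\Longrightarrow\quad
\hat{b} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\quad \text{(after substituting } \hat{a}\text{)}
```

By contrast, $\sum_i |y_i - a - b x_i|$ is piecewise linear in $(a, b)$ and has a kink wherever a residual is zero, so there is no analogous formula; in practice it is typically minimized with linear programming or another iterative method.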

Furthermore, the variance is the second central moment, so it has many convenient statistical properties, and its behavior is well understood for many well-known distributions. Moreover, in most cases the variance can be derived easily with the help of the moment generating function.
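As a small illustration of the moment-generating-function route (the Poisson distribution is picked here only as an example; it is not from the original post), the variance falls out of the first two derivatives of $M_X$ at $t = 0$:

```latex
M_X(t) = E[e^{tX}], \qquad E[X] = M_X'(0), \qquad E[X^2] = M_X''(0)

\operatorname{Var}(X) = E[(X - E[X])^2] = M_X''(0) - \big(M_X'(0)\big)^2

\text{Poisson}(\lambda):\quad M_X(t) = e^{\lambda(e^t - 1)}, \quad
M_X'(t) = \lambda e^t M_X(t), \quad
M_X''(t) = \lambda e^t M_X(t) + \lambda e^t M_X'(t)

M_X'(0) = \lambda, \quad M_X''(0) = \lambda + \lambda^2
\quad\Longrightarrow\quad \operatorname{Var}(X) = \lambda + \lambda^2 - \lambda^2 = \lambda
```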

In short, absolute-value error is better in some cases, but we usually favour squared error because it is easier to compute. Ultimately, it is your call to justify your method and defend your work. At the end of the day, statistics is the science of summarizing data; there's no one correct answer.
