Question Number 5117 by FilupSmith last updated on 14/Apr/16
I have a question. I am unsure how this is done because I have never learnt it. How do you determine the line of best fit?
Commented by Yozzii last updated on 14/Apr/16
The least squares method is one way. Given a set of $n$ points $(x_i, y_i)$, suppose that the line of best fit has the form $y = mx + c$. Then each $x_i$ gives a fitted value $\hat{y}_i = m x_i + c$. For the regression line of $y$ on $x$, the least squares method requires that the quantity

$$Q = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - m x_i - c)^2$$

be minimised. That is, we aim to minimise the sum of the squared vertical distances between each observed $y_i$ and the point $\hat{y}_i$ on the line corresponding to $x_i$. Since the $(x_i, y_i)$ are assumed known, $Q$ is a function of the two variables $m$ and $c$, so we can use techniques of multivariable calculus to find $m$ and $c$. Because $Q$ is of a quadratic form, its stationary value is a minimum; in 3D space, the locus of points $(m, c, Q)$ is a bowl-shaped surface with $Q \ge 0$.

$$\frac{\partial Q}{\partial m} = \sum_{i=1}^{n} (-2 x_i)(y_i - m x_i - c) = 2m\sum_{i=1}^{n} x_i^2 + 2c\sum_{i=1}^{n} x_i - 2\sum_{i=1}^{n} x_i y_i$$

$$\frac{\partial Q}{\partial c} = \sum_{i=1}^{n} (-2)(y_i - m x_i - c) = 2m\sum_{i=1}^{n} x_i + 2cn - 2\sum_{i=1}^{n} y_i$$

At the stationary point, $\frac{\partial Q}{\partial m} = 0$ and $\frac{\partial Q}{\partial c} = 0$. Eliminating $c$ between these two equations gives

$$m = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$

It can be shown that $(\bar{x}, \bar{y}) = \left(\frac{1}{n}\sum_{i=1}^{n} x_i,\ \frac{1}{n}\sum_{i=1}^{n} y_i\right)$, the centroid of all the points $(x_i, y_i)$, lies on the line of best fit, so $c$ can be found from $c = \bar{y} - m\bar{x}$. The resulting $y = mx + c$ is the least squares line of best fit. The slope $m$ also has the equivalent form

$$m = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}.$$
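As a rough sketch (not part of the original answer), the closed-form expressions above can be applied directly in code. The function name `least_squares_line` and the sample data points below are illustrative assumptions, chosen only to show the formulas in use.

```python
def least_squares_line(xs, ys):
    """Return (m, c) for the least squares regression line y = m*x + c."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Slope: m = (n*Σx_i y_i − Σx_i Σy_i) / (n*Σx_i^2 − (Σx_i)^2)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

    # The centroid (x̄, ȳ) lies on the line, so c = ȳ − m*x̄
    x_bar = sum_x / n
    y_bar = sum_y / n
    c = y_bar - m * x_bar
    return m, c


if __name__ == "__main__":
    # Made-up data roughly following y = 2x + 1 with a little noise
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [3.1, 4.9, 7.2, 9.0, 10.8]
    m, c = least_squares_line(xs, ys)
    print(f"best fit: y = {m:.3f}x + {c:.3f}")
```

Running this on the sample points should recover a slope near 2 and an intercept near 1, which is one quick way to sanity-check the derivation.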
