Least Squares Regression Line


Introduction

A correlation coefficient measures the strength of the relationship between two variables. If the correlation is linear, the relationship can be written as a linear equation and drawn as a straight line on the scatter diagram.

Before any calculations are done, it is important to plot the data, with the dependent variable on the vertical axis and the independent variable on the horizontal axis.

The table below shows the ages of 12 people and how much pocket money they receive:

\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|} \hline \text { Age } & 6 & 7 & 9 & 10 & 11 & 12 & 13 & 14 & 15 & 15 & 16 & 16 \\ \hline \text { Money (£) } & 3.5 & 2 & 3 & 4 & 4 & 4.5 & 4.5 & 4 & 4.5 & 3.5 & 4.5 & 6 \\ \hline \end{array}

It can be calculated that the mean age is \bar{x}=12 and the mean amount of pocket money is \bar{y}=4, so the mean point is (12,4).
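As a quick check of the mean point, the short Python sketch below recomputes both means from the table. The variable names are chosen here purely for illustration.

```python
# Ages and pocket money copied from the table above.
ages = [6, 7, 9, 10, 11, 12, 13, 14, 15, 15, 16, 16]
money = [3.5, 2, 3, 4, 4, 4.5, 4.5, 4, 4.5, 3.5, 4.5, 6]

n = len(ages)                 # 12 people
mean_age = sum(ages) / n      # x-bar
mean_money = sum(money) / n   # y-bar

print(mean_age, mean_money)   # 12.0 4.0
```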

Least Squares Regression Line - How Points Are Plotted

The points are plotted in the graph below and the mean point is highlighted.


The line of best fit passes through the mean point, so once its gradient b has been found its equation can be written as (y-\bar{y})=b(x-\bar{x})

Or 

y=b x+(\bar{y}-b \bar{x})

Where b is the gradient and (\bar{y}-b \bar{x}) is the intercept on the y axis. 

Let the vertical distance from each point to the line be d_1, d_2, etc. These will take positive and negative values depending on whether the points are above or below the line of best fit.

The values are squared so that the negative distances do not cancel the positive ones. What is needed is for \sum d_i^2 to be as small as possible.
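To make the idea concrete, the sketch below tries three candidate gradients for a line through the mean point (12, 4) of the pocket money data and adds up the squared distances for each. It assumes the distances are measured vertically, and the candidate values 0.1, 0.2 and 0.3 are picked only for illustration.

```python
# Ages and pocket money copied from the table above.
ages = [6, 7, 9, 10, 11, 12, 13, 14, 15, 15, 16, 16]
money = [3.5, 2, 3, 4, 4, 4.5, 4.5, 4, 4.5, 3.5, 4.5, 6]
mean_age, mean_money = 12, 4  # the mean point


def sum_of_squares(b):
    """Sum of squared vertical distances to the line through the mean point with gradient b."""
    total = 0.0
    for x, y in zip(ages, money):
        on_line = mean_money + b * (x - mean_age)  # y-value of the line at this x
        d = y - on_line                            # positive above the line, negative below
        total += d * d
    return total


candidates = [0.1, 0.2, 0.3]  # illustrative gradients only
for b in candidates:
    print(b, round(sum_of_squares(b), 2))

best_b = min(candidates, key=sum_of_squares)
print("smallest sum of squares at b =", best_b)
```

Of these three candidates the gradient 0.2 gives the smallest total, which suggests the best possible gradient lies somewhere close to it.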

The line which makes \sum d_i^2 as small as possible is given by y-\bar{y}=\frac{S_{x y}}{S_{x x}}(x-\bar{x})

Where S_{x y}=\sum x_i y_i-n \bar{x} \bar{y} \text { and } S_{x x}=\sum x_i^2-n \bar{x}^2
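For readers who want to see where this gradient comes from, here is a brief sketch of the argument. It assumes the line passes through the mean point and that each d_i is a vertical distance. The symbol S_{yy} is introduced here only for this working, and the expanded forms of S_{xy} and S_{xx} are included so the sketch can be read on its own.

```latex
\begin{aligned}
& d_i=(y_i-\bar{y})-b(x_i-\bar{x}) \\
& \sum d_i^2=S_{yy}-2 b S_{xy}+b^2 S_{xx}, \quad \text{where } S_{yy}=\sum(y_i-\bar{y})^2 \\
& \frac{d}{d b} \sum d_i^2=-2 S_{xy}+2 b S_{xx}=0 \quad \therefore b=\frac{S_{xy}}{S_{xx}} \\
& S_{xy}=\sum(x_i-\bar{x})(y_i-\bar{y})=\sum x_i y_i-n \bar{x} \bar{y} \\
& S_{xx}=\sum(x_i-\bar{x})^2=\sum x_i^2-n \bar{x}^2
\end{aligned}
```

Since S_{xx} is positive whenever the x values are not all equal, this turning point is a minimum.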

Least Squares Regression Line - Summary

This can be summarised as:

The least squares regression line is y=a+b x

Where:

a=\bar{y}-b \bar{x}

And:

b=\frac{S_{x y}}{S_{x x}}=\frac{\sum x_i y_i-n \bar{x} \bar{y}}{\sum x_i^2-n \bar{x}^2}
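The summary can be turned into a small Python function. This is only a sketch: the function name least_squares_line is made up for illustration, and it is applied here to the pocket money data from the introduction.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares regression line y = a + b x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    b = s_xy / s_xx          # gradient
    a = y_bar - b * x_bar    # intercept on the y axis
    return a, b


# Ages and pocket money copied from the table in the introduction.
ages = [6, 7, 9, 10, 11, 12, 13, 14, 15, 15, 16, 16]
money = [3.5, 2, 3, 4, 4, 4.5, 4.5, 4, 4.5, 3.5, 4.5, 6]

a, b = least_squares_line(ages, money)
print(round(a, 3), round(b, 3))  # 1.554 0.204
```

For this data the line is roughly y = 1.55 + 0.20x, and it passes through the mean point (12, 4).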

Example

The following data shows information relating to time and the concentration of a particular chemical:

\begin{array}{|l|c|c|c|c|c|c|} \hline \text { Time, } \mathrm{x} \text {, Hours } & 0 & 1 & 2 & 3 & 4 & 5 \\ \hline \text { Concentration, } \mathrm{y} & 2.4 & 4.3 & 5.2 & 6.8 & 9.1 & 11.8 \\ \hline \end{array}

a) Find the equation of the regression line of y on x

b) Illustrate the data and your regression line on a scatter diagram

c) Estimate the concentration of the chemical after i) 3.5 hours and ii) 10 hours

Solution

a) 

\begin{array}{|c|c|c|c|} \hline \mathbf{x} & \mathbf{y} & \mathbf{x}^{\mathbf{2}} & \mathbf{x y} \\ \hline 0 & 2.4 & 0 & 0 \\ \hline 1 & 4.3 & 1 & 4.3 \\ \hline 2 & 5.2 & 4 & 10.4 \\ \hline 3 & 6.8 & 9 & 20.4 \\ \hline 4 & 9.1 & 16 & 36.4 \\ \hline 5 & 11.8 & 25 & 59 \\ \hline \mathbf{1 5} & \mathbf{3 9 . 6} & \mathbf{5 5} & \mathbf{1 3 0 . 5} \\ \hline \end{array}

 

\begin{aligned} & \bar{x}=\frac{\sum x}{n}=\frac{15}{6}=2.5 \text { and } \bar{y}=\frac{\sum y}{n}=\frac{39.6}{6}=6.6 \\ & \therefore S_{x x}=\sum x^2-n \bar{x}^2=55-6 \times 2.5^2=17.5 \\ & \therefore S_{x y}=\sum x y-n \bar{x} \bar{y}=130.5-6 \times 2.5 \times 6.6=31.5 \\ & \therefore b=\frac{S_{x y}}{S_{x x}}=\frac{\sum x y-n \bar{x} \bar{y}}{\sum x^2-n \bar{x}^2}=\frac{31.5}{17.5}=1.8 \end{aligned}
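The column totals and the values of S_{xx}, S_{xy} and b can be checked with a few lines of Python. The sketch below follows the same steps as the working above, with variable names chosen for illustration. The results are rounded when printed because decimal values such as 4.3 are not stored exactly by the computer.

```python
# Time (hours) and concentration copied from the table in the question.
times = [0, 1, 2, 3, 4, 5]
conc = [2.4, 4.3, 5.2, 6.8, 9.1, 11.8]

n = len(times)
sum_x = sum(times)                                # total of the x column
sum_y = sum(conc)                                 # total of the y column
sum_x2 = sum(x * x for x in times)                # total of the x^2 column
sum_xy = sum(x * y for x, y in zip(times, conc))  # total of the xy column

x_bar = sum_x / n
y_bar = sum_y / n
s_xx = sum_x2 - n * x_bar ** 2
s_xy = sum_xy - n * x_bar * y_bar
b = s_xy / s_xx

print(sum_x, round(sum_y, 1), sum_x2, round(sum_xy, 1))  # 15 39.6 55 130.5
print(round(s_xx, 2), round(s_xy, 2), round(b, 2))       # 17.5 31.5 1.8
```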

 

So the least squares regression line is given by:

 

\begin{gathered} y-\bar{y}=b(x-\bar{x}) \\ \therefore y-6.6=1.8(x-2.5) \\ \therefore y=2.1+1.8 x \end{gathered}

 

b) The data and the regression line are shown:


c) 

\begin{aligned} & x=3.5 ; y=2.1+1.8 \times 3.5=8.4 \\ & x=10 ; y=2.1+1.8 \times 10=20.1 \end{aligned}
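The two estimates can be reproduced in Python as sketched below. The function name predict and the range check are additions for illustration. The check simply reports whether a time lies inside the observed times of 0 to 5 hours.

```python
# Values taken from the worked solution above.
x_bar, y_bar, b = 2.5, 6.6, 1.8
a = y_bar - b * x_bar  # intercept, approximately 2.1


def predict(hours):
    """Estimated concentration from the regression line y = a + b x."""
    return a + b * hours


for hours in (3.5, 10):
    # The data in the table only covers times from 0 to 5 hours.
    in_range = 0 <= hours <= 5
    note = "within data range" if in_range else "outside data range"
    print(hours, round(predict(hours), 1), note)
```

The estimate for 10 hours lies outside the observed times, so it depends on the linear trend continuing beyond 5 hours, which the data cannot confirm.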

Although a lot of calculations have been shown in this article, you are expected to be able to find all of these values with the aid of a calculator.

The calculations here show what processes are happening. Even though you will be using a calculator, it is still important to understand how a least squares regression line is found.

If you or your parents would like to find out more, please get in touch via email at info@exam.tips or call us on 0800 689 1272.
