When we say we want the "best" line, what do we actually mean? Think about it this way: for each student in our data, our line makes a prediction, and we can measure how far off that prediction is from the real score.
For example, if:
- A student studied 3 hours and got 82%
- Our line predicts 81% for 3 hours of study
- The error (or difference) is 1%
### The Squared Error
We care about every error, whether we predicted too high or too low, and we don't want overestimates and underestimates to cancel each other out. That's why we square each difference. In math terms, for each student $i$:
$$\text{error}_i = \bigl(y_i - f(x_i)\bigr)^2$$

$$\text{error}_i = \bigl(y_i - (\beta_0 + \beta_1 x_i)\bigr)^2$$
Where:
- $y_i$ is the actual score
- $f(x_i) = \beta_0 + \beta_1 x_i$ is our predicted score
- Squaring (the exponent 2) makes every error positive
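To see this on actual numbers, here is a minimal Python sketch that reuses the 3-hour student from the example above (the values and variable names are just for illustration):

```python
# One student from the example: 3 hours of study, actual score 82%,
# and a line that predicts 81% for 3 hours of study.
actual_score = 82.0
predicted_score = 81.0

squared_error = (actual_score - predicted_score) ** 2  # (82 - 81)^2
print(squared_error)  # 1.0
```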
### The Total Error
To find the best line, we want to minimize the average of all these squared errors. We write this as:
$$R = \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - (\beta_0 + \beta_1 x_i)\bigr)^2$$
Don't let this formula scare you! It just means:
- Take each prediction error
- Square it
- Add up all the squared errors
- Take the average (the extra factor of $\tfrac{1}{2}$ is just a convention that makes the calculus cleaner later; it doesn't change which line is best)
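Putting those steps into code, here is a minimal Python sketch of $R$; the study-hours and score data, and the candidate values of $\beta_0$ and $\beta_1$, are made up for illustration:

```python
def total_error(beta0, beta1, hours, scores):
    """Compute R for the line y = beta0 + beta1 * x on the given data."""
    n = len(hours)
    total = 0.0
    for x, y in zip(hours, scores):
        predicted = beta0 + beta1 * x   # our line's prediction f(x)
        total += (y - predicted) ** 2   # squared error for this student
    return total / (2 * n)              # the 1/(2n) factor from the formula

# Made-up data: hours studied and the scores those students got.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [60.0, 68.0, 82.0, 85.0, 95.0]

# Try two candidate lines: the one with the smaller R fits better.
print(total_error(50.0, 9.0, hours, scores))
print(total_error(20.0, 2.0, hours, scores))
```

Different choices of `beta0` and `beta1` give different values of R, and the "best" line is simply the choice that makes R as small as possible.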
To find the best values for $\beta_0$ and $\beta_1$, we need to find where this error $R$ is smallest.
We usually use gradient descent to do that: start from a guess and repeatedly nudge $\beta_0$ and $\beta_1$ in the direction that decreases $R$.
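Here is a minimal Python sketch of that idea, under the assumption that we use plain (batch) gradient descent on $R$; the data, learning rate, and step count are made up for illustration:

```python
def gradient_descent(hours, scores, learning_rate=0.01, steps=10_000):
    """Find beta0, beta1 by repeatedly stepping downhill on R."""
    n = len(hours)
    beta0, beta1 = 0.0, 0.0
    for _ in range(steps):
        # Gradients of R = (1/2n) * sum((y - (beta0 + beta1*x))^2)
        grad0 = -sum(y - (beta0 + beta1 * x) for x, y in zip(hours, scores)) / n
        grad1 = -sum((y - (beta0 + beta1 * x)) * x for x, y in zip(hours, scores)) / n
        # Nudge each parameter a small step against its gradient.
        beta0 -= learning_rate * grad0
        beta1 -= learning_rate * grad1
    return beta0, beta1

# Same made-up data as before.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [60.0, 68.0, 82.0, 85.0, 95.0]

beta0, beta1 = gradient_descent(hours, scores)
print(beta0, beta1)  # approaches the intercept and slope of the best-fit line
```

Each iteration moves `beta0` and `beta1` a little in the direction that lowers $R$, so after enough steps the line settles close to the one with the smallest total error.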