# Predicting House Prices in Python using Linear Regression

## Hello World,

Hi everyone, this is the second blog in the Machine Learning series. In this one, we’re going to predict house prices using Linear Regression. So let’s get started:-

First and foremost,

*What is Linear Regression?*

A model that assumes a **linear** relationship between the input variables (x) and a single output variable (y).

## Math behind it:-

Eq. of a line: y = mx + c

To compute y, we only need **x** as input, provided we know **m** and **c**. But in linear regression we don’t know them yet. For house prices, the hypothesis looks like:

**hθ(x) = θ₀ + θ₁·x**, where x = sqft_living and hθ(x) = predicted price

Now we only need to find **θ**, so we will first generate a random **θ** and then measure its error with the cost function J:

**J(θ) = 1/2m ∑(hθ(x) − y)²**

Next we will minimize J using differential calculus, with an algorithm called *Gradient Descent*:

**θ₁ ≔ θ₁ − α[ 1/m ∑(hθ(x) − y)·x ]** ( α = learning rate )

After computing this, we will get the best **θ** possible, and then we can just plug in the **x** input and get the house price as the result.

So this is the basic idea of how Linear Regression works. The above is one-dimensional (one input), but in practice we use multiple dimensions (multiple inputs).
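The update rule above can be sketched in a few lines of NumPy. This is a minimal one-dimensional illustration on synthetic data (the data, learning rate, and iteration count are my own choices, not from the article):

```python
import numpy as np

# Synthetic toy data: y is roughly 3*x + 5 plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 3 * x + 5 + rng.normal(0, 0.5, 100)

def gradient_descent(x, y, alpha=0.02, iters=5000):
    theta0, theta1 = 0.0, 0.0              # starting guesses for theta
    m = len(x)
    for _ in range(iters):
        h = theta0 + theta1 * x            # hypothesis h_theta(x)
        theta0 -= alpha * (1 / m) * np.sum(h - y)         # intercept update
        theta1 -= alpha * (1 / m) * np.sum((h - y) * x)   # slope update
    return theta0, theta1

theta0, theta1 = gradient_descent(x, y)
```

After enough iterations, `theta1` and `theta0` land close to the true slope 3 and intercept 5 used to generate the data.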

## Let’s begin the code:-

For this we’ll use this *dataset from Kaggle*.

First we’ll import the modules and the data, and check whether there are any nulls or zeros in the data.
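A minimal sketch of those checks is below. The real notebook would read the Kaggle CSV with `pd.read_csv`; here a tiny stand-in DataFrame (my own made-up values) illustrates the idea that some columns contain zeros that are really missing data:

```python
import pandas as pd

# Stand-in for pd.read_csv("data.csv") -- a tiny frame with the same issue:
# zeros in "price" and "bedrooms" stand for missing values.
df = pd.DataFrame({
    "price":       [313000.0, 0.0, 342000.0, 420000.0],
    "bedrooms":    [3, 5, 0, 4],
    "sqft_living": [1340, 3650, 1930, 2000],
})

null_counts = df.isnull().sum()   # true NaN nulls per column
zero_counts = (df == 0).sum()     # zeros per column
print(null_counts)
print(zero_counts)
```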

We’ll find **no** null values, and the following zero counts per feature:

| Feature | Zero count |
| --- | --- |
| date | 0 |
| price | 49 |
| bedrooms | 2 |
| bathrooms | 2 |
| sqft_living | 0 |
| sqft_lot | 0 |
| floors | 0 |
| waterfront | 4567 |
| view | 4140 |
| condition | 0 |
| sqft_above | 0 |
| sqft_basement | 2745 |
| yr_built | 0 |
| yr_renovated | 2735 |
| street | 0 |
| city | 0 |
| statezip | 0 |
| country | 0 |

We will handle the **0 price** values first, as they affect our data badly; we don’t need the prices of houses that haven’t been sold yet. So we replace them with the **prices of houses having similar features**.
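One simple way to do this replacement (my own sketch, using bedroom count as a stand-in for "similar features" on made-up data) is to treat zeros as missing and fill them with the group mean:

```python
import numpy as np
import pandas as pd

# Stand-in data: one house has a 0 price that should be imputed.
df = pd.DataFrame({
    "bedrooms": [3, 3, 3, 4, 4],
    "price":    [300000.0, 0.0, 320000.0, 450000.0, 470000.0],
})

# Treat 0 prices as missing, then fill each with the mean price of
# houses that share the same bedroom count ("similar features").
df["price"] = df["price"].replace(0, np.nan)
df["price"] = df["price"].fillna(df.groupby("bedrooms")["price"].transform("mean"))
```

The zero-priced 3-bedroom house gets the mean of the other 3-bedroom prices.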

We will also remove the **0 values** of other features by looking over their graphs.

Now we will remove **outliers**, as they affect our model *badly*. We can remove them in many ways, but I am going to do this with the **Z-score algorithm**.
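The Z-score approach can be sketched like this (stand-in data of my own; the threshold of 3 standard deviations is the usual convention, not a value from the article):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Stand-in data: 50 ordinary prices plus one extreme outlier.
df = pd.DataFrame({"price": np.append(rng.normal(300_000, 10_000, 50), 5_000_000)})

# Z score: how many standard deviations each value sits from the mean.
z = np.abs((df["price"] - df["price"].mean()) / df["price"].std())
df_clean = df[z < 3]          # keep only rows within 3 standard deviations
```

The 5,000,000 outlier gets a Z score far above 3 and is dropped, while all ordinary prices survive.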

Next we will remove the outliers of the *other features* manually by looking over the plots of those features.

Now we will remove discrete feature values that don’t make sense, like a house with **a really low price but 8 bedrooms**. We will do this manually too, by looking over the graphs.
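A filter for that kind of contradiction might look like the following (the exact thresholds and data are my own illustration, not the article’s):

```python
import pandas as pd

# Stand-in data: one row has 8 bedrooms at an implausibly low price.
df = pd.DataFrame({
    "bedrooms": [3, 8, 2],
    "price":    [350_000.0, 90_000.0, 250_000.0],
})

# Drop rows that contradict common sense:
# here, 8+ bedrooms priced under 100k.
mask = ~((df["bedrooms"] >= 8) & (df["price"] < 100_000))
df = df[mask]
```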

Now we will plot a **heatmap**.
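A correlation heatmap can be drawn in a few lines; `seaborn.heatmap(df.corr())` is the common shortcut, but here is a plain-matplotlib sketch on stand-in data (the columns and values are mine, chosen so that `price` tracks `sqft_living`):

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")          # headless-safe backend
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
sqft = rng.uniform(800, 4000, 200)
df = pd.DataFrame({
    "sqft_living": sqft,
    "bedrooms": (sqft // 900).astype(int),
    "price": sqft * 250 + rng.normal(0, 20_000, 200),
})

corr = df.corr()               # pairwise feature correlations
fig, ax = plt.subplots()
im = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr)))
ax.set_xticklabels(corr.columns, rotation=45)
ax.set_yticks(range(len(corr)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im)
fig.savefig("heatmap.png")
```

Strongly correlated pairs (like `price` and `sqft_living` here) show up as hot cells, which is what guides feature selection.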

Now, after all of the above code, we have removed the **outliers** and **refined our data**.

So let’s create our model now.
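A minimal version of the model and its evaluation, using scikit-learn on synthetic stand-in data (the real notebook would use the cleaned Kaggle features; the split ratio and random seeds here are my assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error

rng = np.random.default_rng(1)
X = rng.uniform(800, 4000, (300, 1))            # stand-in feature matrix (sqft)
y = X[:, 0] * 250 + rng.normal(0, 30_000, 300)  # stand-in prices

# Hold out 20% of the data to score the model on unseen houses.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

print("r2_score:", r2_score(y_test, pred))
print("mean_absolute_error:", mean_absolute_error(y_test, pred))
```

The same `sklearn.metrics` functions produce the full score list below.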

## We get the following scores:-

explained_variance_score : **0.7263689983438038**

max_error : **1265577.6392805874**

mean_absolute_error : **93320.21416728517**

mean_squared_error : **25954205385.90649**

mean_squared_log_error : **0.06420609977688856**

mean_absolute_percentage_error : **0.19477983081040492**

median_absolute_error : **55069.36503234797**

r2_score : **0.726338460452626**

Our model has an r2_score of **72%**, which is pretty decent.

The code (with a Jupyter notebook) is on my GitHub here.

**Thanks for reading 😄**

And, clap 👏 if this was a good read. Enjoy!