In this blog post, I’m going to explain how you can make predictions and forecasts using linear regression and what factors you should consider when doing so.

### When to use Linear Regression?

Linear Regression is used to predict or forecast a continuous value (one that can take any number within a range), such as the sales made on a given day or the temperature of a city. Basically, it can predict any continuous amount.

Linear Regression can be used to create a predictive model by fitting a straight line through your data. When new input values are supplied, the model predicts the specified target variable.

The question is, how do we decide which line is the best-fitting one?

### What you should know about Linear Regression.

Linear Regression is most commonly fitted with Ordinary Least Squares (OLS), which is why it is also called OLS Regression.
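The name hints at how the best-fitting line is chosen: OLS picks the line that minimizes the sum of the squared vertical distances (residuals) between the line and the data points. Here is a minimal sketch of that idea on made-up numbers. The toy data and the helper name `sum_squared_residuals` are assumptions for illustration only.

```python
import numpy as np

# Hypothetical toy data: y is roughly 2*x + 1 plus some random noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)


def sum_squared_residuals(slope, intercept):
    """Sum of squared vertical distances between the line and the points."""
    predictions = slope * x + intercept
    return float(np.sum((y - predictions) ** 2))


# np.polyfit with deg=1 returns the least-squares slope and intercept.
ols_slope, ols_intercept = np.polyfit(x, y, deg=1)

ols_error = sum_squared_residuals(ols_slope, ols_intercept)
# Any other line, e.g. one with a slightly steeper slope, fits worse.
other_error = sum_squared_residuals(ols_slope + 0.1, ols_intercept)

print(f"OLS line:    y = {ols_slope:.2f} * x + {ols_intercept:.2f}")
print(f"OLS error:   {ols_error:.2f}")
print(f"Other error: {other_error:.2f}")
```

Whichever way you nudge the slope or intercept away from the OLS values, the squared error goes up.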

Linear regression models assume that the **errors (residuals) are normally distributed**. In practice, this means you will often run into problems with strongly skewed data.

This is fine:

This is skewed:

If your data is skewed, you can try to transform it (e.g. using a log transformation). Note: the logarithm is undefined for 0 and for negative values.
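As a rough illustration of the log transformation, the sketch below generates made-up, right-skewed values and compares the skewness before and after. The data is hypothetical. Using `np.log1p` (which computes log(1 + x)) is one common way to cope with exact zeros, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed data (log-normal), standing in for e.g. prices.
rng = np.random.default_rng(42)
prices = pd.Series(rng.lognormal(mean=10.0, sigma=1.0, size=1000))

# np.log1p computes log(1 + x), so a value of exactly 0 maps to 0.
# It is only defined for values greater than -1, so it does not rescue
# data that contains larger negative values.
log_prices = np.log1p(prices)

# Skewness close to 0 means the distribution is roughly symmetric.
print(f"Skewness before: {prices.skew():.2f}")
print(f"Skewness after:  {log_prices.skew():.2f}")
```

If you train on a log-transformed target, remember that the model's predictions are on the log scale too. With `np.log1p`, the matching inverse is `np.expm1`.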

Outliers in the data can significantly distort the fitted line, and with it the predictions of a linear regression model.
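To get a feel for how much a single outlier can matter, here is a small sketch with made-up numbers: ten points that lie exactly on a line, then the same points with one value corrupted. The data is an assumption for illustration.

```python
import numpy as np

# Hypothetical clean data that follows y = 3 * x exactly.
x = np.arange(10, dtype=float)
y = 3.0 * x

slope_clean, _ = np.polyfit(x, y, deg=1)

# Corrupt a single point with an extreme value.
y_outlier = y.copy()
y_outlier[-1] = 200.0  # the true value would be 27.0

slope_outlier, _ = np.polyfit(x, y_outlier, deg=1)

print(f"Slope without outlier: {slope_clean:.2f}")
print(f"Slope with outlier:    {slope_outlier:.2f}")
```

Because the errors are squared, one extreme point pulls the line much harder than many small deviations do.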

### How to apply Linear Regression using Python?

You can use scikit-learn with Python to create a Linear Regression model.

- **Explore** your dataset, using: *df.describe()*
- Check out the **distribution** of the columns and how they correlate, using: *sns.pairplot(df)*
- Check out the **distribution** of the column you want to predict (e.g. house price), using: *sns.distplot(df['Price'])* (in newer seaborn versions, *sns.histplot* replaces the deprecated *distplot*)
- Create a heatmap of the **correlation** of the columns, using: *sns.heatmap(df.corr(), annot=True, cmap="YlGnBu")*
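The plots need a display to be useful, so the sketch below shows numeric counterparts of the same checks on a made-up DataFrame. The column names mirror the ones used in this post, but all values and the relationship between the columns are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical housing data; the numbers and the price formula are made up.
rng = np.random.default_rng(101)
n = 500
df = pd.DataFrame({
    "Avg. Area Income": rng.normal(68_000, 10_000, n),
    "Avg. Area House Age": rng.normal(6, 1, n),
    "Area Population": rng.normal(36_000, 10_000, n),
})
df["Price"] = (
    20 * df["Avg. Area Income"]
    + 160_000 * df["Avg. Area House Age"]
    + 15 * df["Area Population"]
    + rng.normal(0, 100_000, n)
)

# Summary statistics per column (count, mean, std, min, quartiles, max).
summary = df.describe()
print(summary)

# Skewness near 0 suggests a roughly symmetric target distribution.
print("Price skewness:", round(df["Price"].skew(), 2))

# Correlation of every feature with the target, strongest first.
# This is the 'Price' column of the matrix that the heatmap would draw.
corr_with_price = df.corr()["Price"].drop("Price").sort_values(ascending=False)
print(corr_with_price)
```

Features that barely correlate with the target are unlikely to help a linear model much, which is what the heatmap lets you spot at a glance.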

**Start building** the actual Linear Regression Model.

- **Split** your data into an *X-array* that contains all the **features** and a *y-array* with the **target variable** we are trying to predict (e.g. housing price). Note the double brackets, which are needed to select several columns at once. Using: *X = df[['feature1', 'feature2', 'Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms', 'Avg. Area Number of Bedrooms', 'Area Population']]* and *y = df['Price']*
- Split your data into a **training set** for the model and a **testing set** in order to test the model, using: *from sklearn.model_selection import train_test_split* followed by *X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)*
- **Create and train** your model, using: *from sklearn.linear_model import LinearRegression*, then *lm = LinearRegression()* and *lm.fit(X_train, y_train)*
  - The Linear Regression Model has now been trained.
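Put together, the split and training steps might look like the sketch below. It runs on a small made-up dataset with only two of the feature columns. The data and the price formula are assumptions, so swap in your own DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical housing data; replace this with your own DataFrame.
rng = np.random.default_rng(101)
n = 500
df = pd.DataFrame({
    "Avg. Area Income": rng.normal(68_000, 10_000, n),
    "Avg. Area House Age": rng.normal(6, 1, n),
})
df["Price"] = (
    20 * df["Avg. Area Income"]
    + 160_000 * df["Avg. Area House Age"]
    + rng.normal(0, 50_000, n)
)

# Double brackets select several columns and return a DataFrame.
X = df[["Avg. Area Income", "Avg. Area House Age"]]
y = df["Price"]

# Hold back 40% of the rows for testing.
# random_state makes the split reproducible between runs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=101
)

lm = LinearRegression()
lm.fit(X_train, y_train)

print("Training rows:", len(X_train), "| Testing rows:", len(X_test))
```

The model only ever sees the training rows. The testing rows stay untouched until you want to check how well it generalizes.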

- **Evaluate** your Linear Regression Model:
  - Look at the intercept, using: *print(lm.intercept_)*
  - Look at the coefficients, using: *coeff_df = pd.DataFrame(lm.coef_, X.columns, columns=['Coefficient'])* and then display them with *coeff_df*
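The intercept and coefficients tell you what the model learned, but not how well it predicts. One common way to check that (an addition to the steps above, not part of them) is to score the model on the held-out testing set. The sketch below does both on made-up data. The dataset and the choice of R² and mean absolute error as metrics are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical data: price = 20 * income + 160_000 * house age + noise.
rng = np.random.default_rng(101)
n = 500
X = pd.DataFrame({
    "Avg. Area Income": rng.normal(68_000, 10_000, n),
    "Avg. Area House Age": rng.normal(6, 1, n),
})
y = (
    20 * X["Avg. Area Income"]
    + 160_000 * X["Avg. Area House Age"]
    + rng.normal(0, 50_000, n)
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=101
)
lm = LinearRegression().fit(X_train, y_train)

# Each coefficient is the expected change in price when that feature
# increases by one unit while the other features stay fixed.
coeff_df = pd.DataFrame(lm.coef_, X.columns, columns=["Coefficient"])
print(coeff_df)

# Score the model on rows it has never seen during training.
test_predictions = lm.predict(X_test)
r2 = r2_score(y_test, test_predictions)
mae = mean_absolute_error(y_test, test_predictions)
print(f"R^2 on test set: {r2:.3f}")
print(f"MAE on test set: {mae:,.0f}")
```

Since the data was generated with known coefficients, you can see that the fitted values land close to 20 and 160,000. With real data you have no such ground truth, which is why the test-set metrics matter.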

- **Make predictions**, using: *y_predictions = lm.predict(new_data)*, where *new_data* contains the same feature columns as *X*
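As a small sketch of the prediction step, the example below trains on a tiny made-up table with an exact linear relationship and then predicts prices for two new houses. The column names `Rooms` and `Age` and all values are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical training data with an exact linear relationship:
# Price = 50_000 * Rooms + 1_000 * Age + 100_000
train = pd.DataFrame({
    "Rooms": [2, 3, 4, 5, 6],
    "Age": [10, 5, 20, 1, 15],
})
train["Price"] = 50_000 * train["Rooms"] + 1_000 * train["Age"] + 100_000

features = ["Rooms", "Age"]
lm = LinearRegression().fit(train[features], train["Price"])

# new_data must contain the same feature columns, in the same order,
# as the data the model was trained on.
new_data = pd.DataFrame({"Rooms": [3, 7], "Age": [8, 2]})
y_predictions = lm.predict(new_data[features])

print(y_predictions.round(0))
```

Note that the second house has 7 rooms, which is outside the range seen during training (2 to 6). The model will still return a number, but predictions far outside the training range deserve extra caution.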

**Congratulations**, you now have a model that allows you to predict housing prices.