In this blog post, I’m going to explain how you can make predictions and forecasts using linear regression and what factors you should consider when doing so.
When to use Linear Regression?
Linear Regression is used to predict or forecast a continuous value, i.e. a number that can fall anywhere within a range, such as the sales made on a given day or the temperature of a city.
Linear Regression can be used to create a predictive model: once the model has been fitted to existing data, you can feed it new input values and it will predict the specified target variable.
Fitting the model means drawing a line through the data points, so the question is: how do we decide which line fits best?
What you should know about Linear Regression
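Linear Regression is often called Ordinary Least Squares (OLS) regression, after the method most commonly used to fit it: pick the line that minimizes the sum of the squared differences between the observed and the predicted values.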
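As a sketch of what "least squares" means for a single feature (the symbols are my own notation, not taken from a particular library): given n data points (x_i, y_i), the fitted line has the intercept and slope that minimize the sum of squared residuals.

```latex
(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^{n} \left( y_i - (\beta_0 + \beta_1 x_i) \right)^2
```

Each term is the squared vertical distance between an observed point and the line, so the best-fitting line is the one that keeps these distances small overall.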
Linear Regression models work best when the residuals (the prediction errors) are approximately normally distributed. With heavily skewed data, especially a skewed target variable, you will often run into problems.
[Figure: a normally distributed variable, which is fine]
[Figure: a skewed variable]
If your data is skewed, you can try to transform it, e.g. with a log transformation. (Note: the logarithm is undefined for zero and negative values, so a plain log only works on strictly positive data.)
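Here is a minimal sketch of such a transformation. The data is randomly generated for illustration only, and np.log1p (which computes log(1 + x)) is one possible choice that tolerates zeros; it still does not work for values of -1 or below.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed data (e.g. prices), generated only for illustration.
rng = np.random.default_rng(42)
prices = pd.Series(rng.lognormal(mean=3.0, sigma=1.0, size=1000))

# np.log1p computes log(1 + x), so zeros are allowed.
log_prices = np.log1p(prices)

# Skewness close to 0 indicates a roughly symmetric distribution.
skew_before = prices.skew()
skew_after = log_prices.skew()
print(f"skewness before: {skew_before:.2f}, after: {skew_after:.2f}")
```

If you train a model on a log-transformed target, remember to convert its predictions back to the original scale, for example with np.expm1.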
Outliers in the data can significantly distort a Linear Regression model, because squaring the errors gives large deviations a disproportionate weight.
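To get a feeling for this, here is a small sketch with made-up numbers: ten points that lie exactly on the line y = 2x + 1, fitted once as they are and once with the last point replaced by an extreme value. It uses np.polyfit for the least-squares fit, purely to keep the example short.

```python
import numpy as np

x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0  # points lying exactly on y = 2x + 1

# Same data, but the last observation is replaced by an extreme value.
y_outlier = y.copy()
y_outlier[-1] = 100.0

# np.polyfit with deg=1 does a least-squares straight-line fit
# and returns the coefficients as (slope, intercept).
slope_clean, intercept_clean = np.polyfit(x, y, deg=1)
slope_outlier, intercept_outlier = np.polyfit(x, y_outlier, deg=1)

print(f"slope without outlier: {slope_clean:.2f}")
print(f"slope with outlier:    {slope_outlier:.2f}")
```

A single extreme point is enough to pull the fitted slope well away from 2, which is why it pays off to look for outliers before training.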
How to apply Linear Regression using Python?
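You can use scikit-learn in Python to create a Linear Regression model. The steps below assume import pandas as pd and import seaborn as sns, with your data loaded into a pandas DataFrame called df.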
- Explore your dataset
  - Using: df.describe()
- Check out the distributions of the columns and the relationships between them
  - Using: sns.pairplot(df)
- Check out the distribution of the column you want to predict, e.g. the house price
  - Using: sns.histplot(df['Price'], kde=True) (the older sns.distplot is deprecated in recent seaborn versions)
- Create a heatmap of the correlations between the columns
  - Using: sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="YlGnBu") (numeric_only=True skips non-numeric columns)
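If you prefer numbers over a plot, the same correlations can be ranked against the target column. The sketch below uses a small synthetic DataFrame; the column names (Income, Rooms, Noise) and all values are made up for illustration and do not come from a real housing dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a housing dataset; names and numbers are made up.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "Income": rng.normal(60_000, 10_000, n),
    "Rooms": rng.normal(6, 1, n),
    "Noise": rng.normal(0, 1, n),  # unrelated to the price by construction
})
df["Price"] = 20 * df["Income"] + 100_000 * df["Rooms"] + rng.normal(0, 50_000, n)

# Correlation of every numeric feature with the target, strongest first.
corr_with_price = (
    df.select_dtypes(include="number")
      .corr()["Price"]
      .drop("Price")
      .sort_values(key=abs, ascending=False)
)
print(corr_with_price)
```

Features near the top of this list are the most promising candidates for the model, while a feature with a correlation close to 0 (like Noise here) has little linear relationship with the price.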
Start building the actual Linear Regression Model.
- Split your data into an X array that contains all the features and a y array with the target variable you are trying to predict (e.g. the house price)
  - Using: X = df[['feature1', 'feature2', 'Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms', 'Avg. Area Number of Bedrooms', 'Area Population']] (note the double brackets; list your own feature columns here)
  - y = df['Price']
- Split your data into a training set to fit the model and a testing set to evaluate it
  - from sklearn.model_selection import train_test_split
  - X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)
- Create and train your model
  - from sklearn.linear_model import LinearRegression
  - lm = LinearRegression()
  - lm.fit(X_train, y_train)
  - The Linear Regression model has now been trained.
- Evaluate your Linear Regression model
  - Look at the intercept, using: print(lm.intercept_)
  - Look at the coefficients, using: coeff_df = pd.DataFrame(lm.coef_, X.columns, columns=['Coefficient'])
- Make predictions
  - Using: y_predictions = lm.predict(new_data), where new_data has the same feature columns as X
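Put together, the steps above could look roughly like the sketch below. It runs on synthetic data so that it is self-contained: the columns Income and Rooms, the true coefficients (20 and 100,000) and the noise level are assumptions made up for this example. It also scores the predictions on the test set with mean_absolute_error and r2_score from sklearn.metrics, which is one common way to check the model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic housing-like data; names and numbers are made up for illustration.
rng = np.random.default_rng(101)
n = 1000
df = pd.DataFrame({
    "Income": rng.normal(60_000, 10_000, n),
    "Rooms": rng.normal(6, 1, n),
})
df["Price"] = (
    50_000 + 20 * df["Income"] + 100_000 * df["Rooms"] + rng.normal(0, 20_000, n)
)

# Features (double brackets give a DataFrame) and target.
X = df[["Income", "Rooms"]]
y = df["Price"]

# Hold out 40% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=101
)

# Create the model and train it on the training set only.
lm = LinearRegression()
lm.fit(X_train, y_train)

# Inspect the fitted intercept and coefficients.
print(lm.intercept_)
coeff_df = pd.DataFrame(lm.coef_, X.columns, columns=["Coefficient"])
print(coeff_df)

# Predict on unseen data and compare with the true values.
y_predictions = lm.predict(X_test)
mae = mean_absolute_error(y_test, y_predictions)
r2 = r2_score(y_test, y_predictions)
print(f"MAE: {mae:.0f}, R^2: {r2:.3f}")
```

Each coefficient can be read as the expected change in the price when that feature increases by one unit while the other features stay fixed. Because the data was generated with known coefficients, the fitted values should land close to them.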
Congratulations, you now have a model that can predict housing prices from the features it was trained on.