Boston & Seattle Airbnb Homes

Dec 17, 2020

A data-based approach using Kaggle open data

Introduction

Have you ever asked yourself how Airbnb suggests a price for listed homes?

Well, we are here to discover how it works. First, let's look at how Airbnb's "list a home" feature works. To make it easy for you, instead of having you go to AirBnB.com, create an account, and list a home, I will show you all the steps in the slideshow below:

Steps to list a home on Airbnb

Then, after you have listed your home, you will receive an email that tells you how Smart Pricing works.

Home Distribution

Now, let's have a look at how homes are distributed in both Boston and Seattle:

In Boston's distribution, we can see that there are a lot of listed homes, with more homes in the center of Boston and fewer around the edges. This may have an effect on prices.

For Seattle's distribution, we can say the same as for Boston, but with fewer homes in the center of Seattle. This, too, may have an effect on prices.

Q1. Will the number of homes in a certain neighborhood affect the prices?

I think we can use a clustering model to answer this question later in the Udacity course.
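Until a clustering model is available, a simpler first look at Q1 is to count the listings in each neighborhood and compare that count with the mean price. The sketch below is only an illustration: the rows and the helper name `summarize_by_neighbourhood` are made up for the example and are not taken from the Kaggle files.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (neighbourhood, nightly price in USD). Not real data.
listings = [
    ("Downtown", 220.0),
    ("Downtown", 180.0),
    ("Downtown", 200.0),
    ("Back Bay", 150.0),
    ("Back Bay", 170.0),
    ("Roslindale", 80.0),
]


def summarize_by_neighbourhood(rows):
    """Return {neighbourhood: (listing_count, mean_price)}."""
    prices = defaultdict(list)
    for neighbourhood, price in rows:
        prices[neighbourhood].append(price)
    return {name: (len(vals), mean(vals)) for name, vals in prices.items()}


summary = summarize_by_neighbourhood(listings)

# Print the busiest neighbourhoods first so count and price can be compared.
for name, (count, avg_price) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
    print(f"{name}: {count} listings, mean price {avg_price:.2f}")
```

If the mean price rises or falls steadily as the listing count grows, that would be a hint worth checking with a proper model; a table like this cannot show causation on its own.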

Features Correlation:

In this heatmap, the bar on the right side is the legend: 0.00 means there is no linear correlation between two features, 1.00 (the maximum value) means a perfect positive correlation, and -1.00 (the minimum value) means a perfect negative correlation.

We will focus on price, since we are looking for the features correlated with it. Here we can say that ‘accommodates’ and ‘bedrooms’ have a slight positive correlation with price, while ‘number_of_reviews’ and ‘review_per_month’ have a negative correlation with price.
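To make the scale concrete, here is a minimal sketch of the Pearson correlation coefficient, which is the kind of value a correlation heatmap usually shows in each cell. I am assuming the heatmap uses Pearson correlation; the function `pearson` and the toy numbers are invented for illustration and are not from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: how the two series move together around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: the spread of each series on its own.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy values, invented for illustration only.
accommodates = [1, 2, 3, 4, 5]
price_up = [50, 100, 150, 200, 250]    # rises in step with accommodates
price_down = [250, 200, 150, 100, 50]  # falls in step with accommodates
price_flat = [100, 50, 200, 50, 100]   # no linear trend

for label, prices in [("up", price_up), ("down", price_down), ("flat", price_flat)]:
    print(label, round(pearson(accommodates, prices), 2))
```

The three toy series land on the two ends and the middle of the legend. Real features, such as the ones in the heatmap, usually fall somewhere in between.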

Predicting Prices

First, be aware that activating Smart Pricing is optional. So when we predict prices, there will be a number of prediction errors for listings whose hosts didn't activate Smart Pricing.

Conclusions

After we fit the model and scored it on the train and test data, we got very poor R-squared scores, and I think this is a curse-of-dimensionality situation.

Can other Machine Learning models get better R-squared scores?

Can the deleted columns help make better predictions if we modify them?
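For readers who have not met the R-squared score before, the sketch below shows how it is computed and what a "very bad" value looks like. It is a plain-Python illustration with invented prices, not the model or the scores from this project; scikit-learn's `r2_score` computes the same quantity.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean price.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R squared is undefined when all true values are equal")
    return 1 - ss_res / ss_tot


# Invented prices, for illustration only.
prices = [100.0, 150.0, 200.0, 250.0]

perfect = r_squared(prices, prices)                        # predictions match exactly
baseline = r_squared(prices, [175.0] * 4)                  # always predict the mean
worse = r_squared(prices, [250.0, 200.0, 150.0, 100.0])    # predictions reversed

print(perfect, baseline, worse)
```

A score near 1 means the predictions track the real prices, a score near 0 means the model is no better than guessing the average price, and a negative score means it is worse than that guess.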