Restaurant Recommendation System
Group Project
Introduction
We want to help people efficiently pick a restaurant based on their interests, such as taste or mood. Picking a place to eat can be challenging because there are so many restaurants in the Philadelphia area, and we believe that a recommendation system for Philadelphia restaurants would save people a lot of time and energy. A tool like this can help with several aspects of convenience, such as cost, time, distance, and, as stated above, energy. The questions we set out to answer are: can this be accomplished by building a recommendation system, and what modeling is needed to achieve it?
Data
Our data comes from Yelp, an online platform that allows users to rate and review restaurants. Reviews are public, so other users can read them and make their own judgments. Yelp provides data from its service through the “Yelp Open Dataset”. We use a filtered version specific to large metropolitan areas, obtained from GitHub user unclebrod. When users review a restaurant on Yelp, they give a rating from 1 to 5, and these ratings are averaged into a single rating for the restaurant. We plan to use this average and the number of reviews in our analysis. The data is already in a simple-to-use format, so no extensive cleaning or scraping is needed.
Methods
Pre-processing:
The Yelp dataset was already heavily cleaned and almost all of the data was relevant to our application.
First, we filtered the data to restaurants in Philadelphia, PA.
Then we kept only the restaurant's name, stars, location (latitude and longitude), city, and categories (type of restaurant).
Using user latitude & longitude location, we can calculate the distance between the user and the restaurant, which will be a factor in our recommendation.
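The filtering and column selection described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes each business is a dictionary with the field names used in the Yelp Open Dataset business records (name, city, state, stars, review_count, latitude, longitude, categories), and the filtered copy used in the project may name them differently. The sample records are made up, and review_count is kept here only because the analysis later relies on the number of reviews.

```python
# Field names are assumed to follow the Yelp Open Dataset business records;
# the filtered copy used in the project may differ.
KEEP_FIELDS = ("name", "stars", "review_count", "latitude", "longitude",
               "city", "categories")


def select_philadelphia(businesses, city="Philadelphia", state="PA"):
    """Keep businesses in the given city and state, trimmed to KEEP_FIELDS."""
    selected = []
    for biz in businesses:
        if biz.get("city") == city and biz.get("state") == state:
            selected.append({field: biz.get(field) for field in KEEP_FIELDS})
    return selected


# Made-up records for illustration only.
sample = [
    {"name": "Example Diner", "city": "Philadelphia", "state": "PA",
     "stars": 4.0, "review_count": 120, "latitude": 39.9530,
     "longitude": -75.1660, "categories": "Diners, American"},
    {"name": "Other Town Grill", "city": "Pittsburgh", "state": "PA",
     "stars": 4.5, "review_count": 300, "latitude": 40.4406,
     "longitude": -79.9959, "categories": "Barbeque"},
    {"name": "Sample Taqueria", "city": "Philadelphia", "state": "PA",
     "stars": 4.5, "review_count": 80, "latitude": 39.9400,
     "longitude": -75.1500, "categories": "Mexican"},
]

philly = select_philadelphia(sample)
print([biz["name"] for biz in philly])
```

Checking the state as well as the city is a small safeguard, since more than one U.S. city is named Philadelphia.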
Visualizations:
We visualized the data to understand it better, using bar charts to see distributions and lists to view the restaurants in order.




Stars: Plotting the position of the restaurants over a map of Philadelphia, we can get a good idea of where most of the restaurants are and their associated rating.

Distance: Since we have the latitude and longitude of each restaurant, we can use that to calculate the distance between the user and each restaurant. The closer the restaurant is to the user, the more likely it will be to show up higher on the list of recommendations.
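One common way to turn two latitude/longitude pairs into a distance is the haversine formula, which gives the great-circle distance on a sphere. The sketch below assumes that formula (the report does not say which distance calculation was used) and a user located near Philadelphia City Hall. The function names and restaurant records are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rank_by_distance(user_lat, user_lon, restaurants):
    """Return copies of the restaurants with distance_km added, nearest first."""
    with_distance = [
        dict(r, distance_km=haversine_km(user_lat, user_lon,
                                         r["latitude"], r["longitude"]))
        for r in restaurants
    ]
    return sorted(with_distance, key=lambda r: r["distance_km"])


# Made-up restaurants; the user location is an assumed point in Center City.
restaurants = [
    {"name": "Far Example", "latitude": 40.0400, "longitude": -75.0600},
    {"name": "Near Example", "latitude": 39.9530, "longitude": -75.1660},
]
ranked = rank_by_distance(39.9526, -75.1652, restaurants)
for r in ranked:
    print(r["name"], round(r["distance_km"], 2), "km")
```

Sorting by distance alone is the simplest choice. A fuller recommender would likely combine distance with stars and review count.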

Modeling
We modeled the data with K-means clustering to see which restaurant features people tend to enjoy, which in turn tells us about their food preferences.
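To make the clustering step concrete, here is a small pure-Python sketch of K-means (Lloyd's algorithm). It is an illustration under assumptions rather than the code used in the project: the feature set (stars, review count, distance in km) is assumed from the findings discussed in this section, the values are made up, and a library implementation would normally be used instead. Features are standardized first so that review count, which has a much larger numeric range, does not dominate the distance calculation.

```python
import random


def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up rows of [stars, review_count, distance_km].
features = [
    [4.5, 2100, 1.2], [4.6, 2300, 0.8], [4.4, 1900, 1.5],
    [3.0, 40, 6.0], [2.8, 25, 7.5], [3.2, 60, 5.5],
]
labels, centroids = kmeans(standardize(features), k=2)
print(labels)
```

The example uses k=2 only because the toy data has two obvious groups. The number of clusters is a parameter and would be set to match the real analysis.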
Based on our model, we found no correlation between the number of reviews a restaurant has and how far away it is. One might expect people to review restaurants closer to them more often, but that turned out not to be the case.

We also saw a correlation between stars and review count. The highest ratings were found when the review count was over 2000.
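Statements like these about correlation can be checked numerically with the Pearson correlation coefficient, which ranges from -1 to 1, with values near 0 indicating no linear relationship. The sketch below is one way to compute it. The report does not state which measure was used, so treating it as Pearson's r is an assumption, and the numbers are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only.
stars = [3.0, 3.5, 4.0, 4.5, 4.5]
review_count = [40, 150, 600, 1800, 2300]
distance_km = [2.0, 7.5, 1.0, 6.0, 3.5]

print("stars vs. review count:", round(pearson_r(stars, review_count), 2))
print("review count vs. distance:", round(pearson_r(review_count, distance_km), 2))
```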

Our final clustering, using 5 clusters:

Evaluation
We evaluated our models with accuracy, F1 score, and similar metrics, reusing the approach from Project 3, which covered regression. Unfortunately, the R-squared value for our regression model was very low, which may be due to the small number of features and the subjectivity of the data.
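For reference, R-squared compares a model's squared errors with those of a baseline that always predicts the mean: R² = 1 - SS_res / SS_tot. The sketch below computes it from scratch. The report does not say what the regression predicted, so using star ratings as the target is an assumption, and the values are made up.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Made-up star ratings and predictions for illustration only.
actual_stars = [3.0, 4.5, 4.0, 2.5, 5.0]
predicted_stars = [3.6, 3.9, 3.8, 3.7, 4.0]

print("R-squared:", round(r_squared(actual_stars, predicted_stars), 3))
```

A value near 0 means the model does little better than always predicting the average rating, and a negative value means it does worse.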

Accurate to Yelp’s review:

Evaluating this model's performance is difficult because its predictions are subjective and depend on many variables, such as where the ‘map’ is drawn when looking at Google or Yelp. However, our model consistently returned highly reviewed, highly rated restaurants within a reasonable distance of the user.
Storytelling and Conclusion
Our initial goal was to help people effectively choose a restaurant based on their taste or mood. We did not achieve this goal because we were not able to use the restaurant categories (Mexican, fast food, Chinese, American, etc.) to capture a person's interests in food and restaurants. Future steps include successfully incorporating the food categories into the project and relating distance and reviews to those categories. One insight we gained is how difficult it is to make use of the categories, and that without them we could not complete our initial goal. Lastly, we tried using a regression model to make predictions, but the results were very poor. K-means clustering gave us the best results and closely matched what we saw on Yelp and Google. With a little more time to work on the project, we think a Random Forest Classifier would work even better.
Impact
There are numerous ways that our project will have an impact on communities and users alike. Primarily, we will be impacting restaurants and other small businesses through the recommendations that our models make. Depending on how popular our model becomes, a restaurant could see a significant change in its number of customers, whether an increase or a decrease. Our project will also impact the users of the restaurant recommender system, and our recommendations could affect them both positively and negatively. A positive impact would be our model recommending a nearby restaurant that the user enjoys. A negative impact would be our model recommending a restaurant that is far away, one that the user will not enjoy, or both.
We must weigh the ethical concerns of implementing a model like this because of its potential to harm a business that does not deserve it. Our model could unfairly affect a business by not recommending it as much as it should. This could reduce the restaurant's revenue by decreasing the number of customers it receives. To prevent this harm, we must test our model extensively before making it public.
Code + Data + References
References: