Understanding what our data says about reviews

Forming Hypotheses

When we first gathered the data, we suspected that people would judge restaurants differently depending on where they come from. We supposed that locals would rate restaurants less harshly, because they are more likely to have an association with the culture or with the people who own or work at the restaurants. Conversely, we expected non-locals to judge restaurants more harshly, because they are likely to have less context for the local cuisine, the restaurant neighborhoods, and the people associated with the establishment. Using both Yelp and TripAdvisor data independently, we tested the following hypotheses:

  1. There is a difference in mean rating between local and non-local reviews.
  2. The overall user rating does not differ across websites.
  3. A user's ratings have some predictive power over whether that user is a "local" or an "out-of-towner". In other words, users are biased in their reviews in part because of how far they live from the restaurants.

Testing Our Hypotheses

Hypothesis One: Mean Rating Differential

We averaged the ratings of all reviews for locals and for visitors, and found that, unmistakably, locals rate restaurants lower than visitors. Why might this be? Maybe it's the classic "grass is always greener" concept. Reviews alone probably will not be able to tell us. Below are the rating distributions for Yelp and TripAdvisor. Notice that on both websites, and on TripAdvisor especially, visitors give noticeably more four- and five-star reviews than locals.

Our hypothesis turned out to be true: locals and out-of-towners were not created equal, at least in the review sense. The direction of the difference, however, is the opposite of what we first supposed, and the fact that locals consistently give lower ratings is somewhat surprising. One might think that locals would give more generous ratings in an effort to support local businesses, but it seems more likely that they rate more honestly, or that the "same-old-place" isn't as great in their eyes as it is in the eyes of visitors. Maybe restaurants should rethink the outlook that their loyal locals are the most important customers.
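For readers who want to check a gap like this on their own data, here is a minimal sketch of one common way to compare two group means, Welch's unequal-variance t-test, written with only the Python standard library. It is not necessarily the procedure we used. The `welch_t` helper and the two rating lists are hypothetical and made up for illustration; they are not our Yelp or TripAdvisor data.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    # Squared standard error of each group mean (sample variance / n).
    se_a = variance(a) / len(a)
    se_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1))
    return t, df


# Made-up star ratings, for illustration only.
local_ratings = [3, 4, 2, 5, 3, 4, 3, 2]
visitor_ratings = [4, 5, 4, 5, 3, 5, 4, 4]

t, df = welch_t(local_ratings, visitor_ratings)
print(f"mean local = {mean(local_ratings):.2f}, mean visitor = {mean(visitor_ratings):.2f}")
print(f"Welch t = {t:.2f}, df ~ {df:.1f}")
```

A negative t here means the first group (locals) has the lower mean. Turning t and df into a p-value needs a t-distribution table or a statistics library, which this sketch leaves out.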

Hypothesis Two: Rating Differential Across Websites

We took the average ratings on Yelp and on TripAdvisor and compared them. Surprisingly, TripAdvisor ratings averaged 4.1 stars, whereas Yelp's averaged 3.7. We were wrong in thinking that the two sites would yield similar ratings, but these findings are still interesting. Both numbers are higher than the three-star rating we would expect to be the average, and TripAdvisor reviewers seem to overrate everything. This further points to a "grass is always greener" conclusion about reviewers. Below is a table of average ratings for Yelp and TripAdvisor, as well as graphs showing the average rating over the past five years. Notice the odd, slight upward trend in ratings over time. Is that because restaurants are getting better, or because people are becoming nicer?
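The per-site and per-year averages behind a table and graphs like these can be computed with a simple group-and-average step. The sketch below assumes each review is a `(site, year, stars)` tuple. That record layout, the `mean_by` helper, and every value in `reviews` are hypothetical and invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (site, year, stars). Values are made up.
reviews = [
    ("yelp", 2014, 3), ("yelp", 2014, 4), ("yelp", 2015, 4),
    ("tripadvisor", 2014, 4), ("tripadvisor", 2015, 5), ("tripadvisor", 2015, 4),
]


def mean_by(records, key):
    """Group records by key(record) and return the mean star rating per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec[2])  # rec[2] is the star rating
    return {k: mean(v) for k, v in groups.items()}


by_site = mean_by(reviews, key=lambda r: r[0])
by_site_year = mean_by(reviews, key=lambda r: (r[0], r[1]))

for (site, year), avg in sorted(by_site_year.items()):
    print(f"{site:12s} {year}  {avg:.2f}")
```

Grouping by `(site, year)` rather than by site alone is what lets a trend over time show up separately for each website.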

Hypothesis Three: Predictive Power

We have observed that visitors tend to give significantly higher ratings than locals. However, this one attribute alone is not enough for us to predict whether a user is a local or a visitor. Every model that we tried for this prediction failed. The opposite relationship might have a better shot at success: given that a reviewer is a local or a visitor, can you predict their rating? This is also not very accurate. However, if we use the local feature in conjunction with others, we might be able to come closer to a good prediction of rating.
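To illustrate why a single rating carries so little signal, the sketch below compares a one-feature threshold rule ("call the reviewer a visitor if the rating is at least t stars") against simply guessing the most common label. This is not one of the models we tried. The `samples` list is made up, and `accuracy` and `threshold_rule` are hypothetical helper names. Accuracy is measured on the same data used to pick the threshold, so it is an optimistic estimate.

```python
from collections import Counter

# Made-up (stars, label) pairs, for illustration only.
samples = [
    (5, "visitor"), (4, "visitor"), (4, "local"), (3, "local"), (5, "local"),
    (4, "visitor"), (2, "local"), (3, "visitor"), (5, "visitor"), (4, "local"),
]


def accuracy(predict, data):
    """Fraction of (stars, label) pairs that the predict(stars) function gets right."""
    return sum(predict(stars) == label for stars, label in data) / len(data)


def threshold_rule(t):
    """Return a predictor that labels ratings >= t as 'visitor', otherwise 'local'."""
    return lambda stars: "visitor" if stars >= t else "local"


# Baseline: always guess the most common label.
majority = Counter(label for _, label in samples).most_common(1)[0][0]
baseline_acc = accuracy(lambda stars: majority, samples)

# Best single-threshold rule over all possible star cut-offs.
best_acc = max(accuracy(threshold_rule(t), samples) for t in range(1, 7))

print(f"majority baseline: {baseline_acc:.2f}, best threshold rule: {best_acc:.2f}")
```

When the two groups' rating distributions overlap heavily, as in this toy data, no cut-off does much better than the baseline, even if the group means differ.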

Take a Step Back...

Keep digging!