Applying kMeans clustering to restaurants from GooglePlaces API (Part I)
Margo is a millennial who likes to eat out a lot. She works downtown and wants to move to an apartment that’s close to her work, food, gym and her closest friends’ houses. Let's use kMeans clustering to figure out where she should go.
Tojo Batsuuri - Lead Developer + Designer
May 8, 2019
Margo is a millennial who likes to eat out a lot. She works downtown and wants to move to an apartment that’s close to her work, food, gym and her closest friends’ houses.
On top of that she would also like to know once she’s moved into her apartment what are the best places to go to. For food she prefers ratings, distance and price level. But that’s a problem for the next article to solve. For this article, let’s find out which neighbourhood she should move to in the first place.
First of all, she works 5 days a week. Minimal commute is ideal, so her most important preference is distance from her work. We should bias the GooglePlaces API’s nearby search around her work.
If you would like to try doing this yourself. You should get a GooglePlaces API key. Otherwise you won’t be able to use the service.
GooglePlaces API gives you nearby places 20 at a time with 3 pages of results. This means you only get 60 places at a time. Which should be okay for our use. So you have to call this networking call thrice, recursively. As the JSON response contains the next_page_token, and you have to use this token to get the next page. However when you request the next page, you have to use parameter key pagetoken. Otherwise you will end up searching stack overflow for a while, like I did.
My call for example looked like this for the first call:
var url = URL(string: "https://maps.googleapis.com/maps/api/place/nearbysearch/json?key=\(PLACES_KEY)&keyword=\(query)&location=51.043536,-114.053241&radius=5000&type=restaurant")! And to get the next page like this:
url = URL(string: "https://maps.googleapis.com/maps/api/place/nearbysearch/json?key=\(PLACES_KEY)&keyword=\(query)&location=51.043536,-114.053241&radius=5000&type=restaurant&pagetoken=\(tok)")! Other parameters to be aware of here are: type, radius, location, keyword.
Type can be many things, but these are generally how google maps categorizes places. Examples are restaurant, bar, cafe etc. Check out the comprehensive list. https://developers.google.com/places/supported_types
Radius is set to 5000, meaning 5000 meters of radius from the chosen location. 5000 is the max you can set this parameter. Keyword is of course a hint, this could be something like “Indian food”. I just do an empty string for keyword.
So at this point you got 60 google places. Each time you get a batch of 20 places parse them to a GooglePlaces object and add it to an array of GooglePlace objects. To be clear, this is not GoogleMaps SDK object, just an object I declared.
When I console log their names it looks like this.
========================ARRAY COUNT IS: 20========================Murrieta's West Coast Bar & Grill
Moxie's Grill & Bar
The Keg Steakhouse + Bar
Michael's Pizza Restaurant & Delivery
Blink Restaurant & Bar
Sushi Hiro Japanese Restaurant
Sushi Bar Zipang
Pho Hoai Vietnamese Noodle House
SubwayARRAY COUNT IS: 40========================Prime Time Restaurant
Cibo 17th Ave
Papa John's Pizza
Chinese Village Restaurant
La Brezza Ristorante
Ruth's Chris Steak House
Medina Cafe & Grill
Earls Kitchen + Bar
Vin Room MissionARRAY COUNT IS: 60========================Subway
Rosso Coffee Roasters
Escoba Bistro & Wine Bar
The Living Room Restaurant
Ceili's Modern Irish Pub
CRAFT Beer Market - Calgary Downtown
Embarcadero Wine & Oyster Bar
Silver Dragon Restaurant
The Pink Pearl Restaurant
Chicago Deep Dish Pizza
Redheads Japan Cafe
A while ago I found this great algorithm for finding centroids from a set of vectors.
We will use the general definition of the vector for the second article, but for this one we will use the Cartesian product (x, y) for the vector or (lat, long). Here’s the function declaration.
func kMeans(numCenters: Int, convergeDistance: Double, points: [Vector]) -> [Vector]
For this function you have to tell it how many centres you think there are. After a short internet research, I found a square root of the total data points can be good starting point for the number of centres. But if your data is already categorized like for us. Then we just need a single centroid. So numCenters is 1, for this case, convergenceDistance I tried 0.1 and 0.01, the latter giving a better centroid in practice, and all the data points into points(your places’ locations).
Before we move on if you just use the square root of the total number of data points. You get something like this.
Square root of 60 is 7.74… floored is 7, Int. Therefore 7 centroids and 7 colours. I found using yellow map marker somehow makes this really ugly poop colour. If anyone knows how to get a real yellow map marker, please let me know. Thanks.
The algorithm naturally finds clusters and puts a centroid in the middle. The transparent ones are actual places and opaque ones are the centroids. But this is just finding physically close restaurants.
We want to get even more useful information from this.
The location marker on the map. The blue circle is her work. Where the nearby search is biased around. Now there are the friends and gym. She has 3 close friends. So you create an array of 3 vectors. Her gym has a single location, which she tries to go at least 6 times a week. So that’s also pretty important.
First centroid is work, red.
Second is gym, green.
Third is social circle, purple. You will notice that her friends places are transparent and their centroid is the opaque purple point in the middle.
Friends, gym, work.
And finally food. After being honest with herself she realized she really likes going to bars to meet colleagues and friends. It’s not so much she likes eating our just anywhere. She really prefers going out to bars.
These are the bars with the highest rating that are also close to her work.
Putting it all together. We get the final average location. In purple below.
Blue is where her friends are and red is her work. Green is the gym and cyan is where the best bars are. And the answer to all of this is the purple in the middle.
So that’s it. If Margo wanted to optimize this even further, I would say the purple area covers about 5, 6 viable neighbourhoods. She could rank all of them using a priority queue. Which calculates priority on different weights of price, lease commitment and neighbourhood.
Here’s the summary for functions and classes I used:
I hope you took away some useful strategies for solving some problems. Stay tuned for the next article in the series, where I will integrate priority queues into the clustering. This will allow us to cluster places not just on location but also on different categories like: price, rating, location and type etc.