Machine Learning

Applying kMeans clustering to restaurants from GooglePlaces API (Part I)

Margo is a millennial who likes to eat out a lot. She works downtown and wants to move to an apartment that’s close to her work, food, gym and her closest friends’ houses. Let's use kMeans clustering to figure out where she should go.

Tojo Batsuuri - Lead Developer + Designer

May 8, 2019

Margo is a millennial who likes to eat out a lot. She works downtown and wants to move to an apartment that’s close to her work, food, gym and her closest friends’ houses.

On top of that she would also like to know once she’s moved into her apartment what are the best places to go to. For food she prefers ratingsdistance and price level. But that’s a problem for the next article to solve. For this article, let’s find out which neighbourhood she should move to in the first place.

First of all, she works 5 days a week. Minimal commute is ideal, so her most important preference is distance from her work. We should bias the GooglePlaces API’s nearby search around her work.

If you would like to try doing this yourself. You should get a GooglePlaces API key. Otherwise you won’t be able to use the service.

GooglePlaces API gives you nearby places 20 at a time with 3 pages of results. This means you only get 60 places at a time. Which should be okay for our use. So you have to call this networking call thrice, recursively. As the JSON response contains the next_page_token, and you have to use this token to get the next page. However when you request the next page, you have to use parameter key pagetoken. Otherwise you will end up searching stack overflow for a while, like I did.

My call for example looked like this for the first call:

var url = URL(string: "\(PLACES_KEY)&keyword=\(query)&location=51.043536,-114.053241&radius=5000&type=restaurant")! And to get the next page like this:

url = URL(string: "\(PLACES_KEY)&keyword=\(query)&location=51.043536,-114.053241&radius=5000&type=restaurant&pagetoken=\(tok)")! Other parameters to be aware of here are: typeradiuslocationkeyword.

Type can be many things, but these are generally how google maps categorizes places. Examples are restaurantbarcafe etc. Check out the comprehensive list.

Radius is set to 5000, meaning 5000 meters of radius from the chosen location. 5000 is the max you can set this parameter. Keyword is of course a hint, this could be something like “Indian food”. I just do an empty string for keyword.

So at this point you got 60 google places. Each time you get a batch of 20 places parse them to a GooglePlaces object and add it to an array of GooglePlace objects. To be clear, this is not GoogleMaps SDK object, just an object I declared.

When I console log their names it looks like this.

========================ARRAY COUNT IS: 20========================Murrieta's West Coast Bar & Grill Fort Calgary Charcut Moxie's Grill & Bar The Keg Steakhouse + Bar Michael's Pizza Restaurant & Delivery Earls Brewsters Blink Restaurant & Bar Boston Pizza Palomino Smokehouse Koi Sushi Hiro Japanese Restaurant Sushi Bar Zipang SALTLIK Calgary River Cafe Rouge Subway Pho Hoai Vietnamese Noodle House SubwayARRAY COUNT IS: 40========================Prime Time Restaurant Diner Deluxe Boston Pizza Cibo 17th Ave Royal India Kensington Pub Teatro Ristorante Papa John's Pizza Mercato Buchanan's Pulcinella Chinese Village Restaurant Subway La Brezza Ristorante Ruth's Chris Steak House Medina Cafe & Grill Ceili's Subway Earls Kitchen + Bar Vin Room MissionARRAY COUNT IS: 60========================Subway The Unicorn Subway Rosso Coffee Roasters Escoba Bistro & Wine Bar The Living Room Restaurant Ceili's Modern Irish Pub CRAFT Beer Market - Calgary Downtown Embarcadero Wine & Oyster Bar Ranchmen's Club Bonterra Trattoria Silver Dragon Restaurant KOTO Sushi Domino's Pizza The Pink Pearl Restaurant Galaxie Diner Chicago Deep Dish Pizza Subway Brasserie Kensington Redheads Japan Cafe A while ago I found this great algorithm for finding centroids from a set of vectors.

We will use the general definition of the vector for the second article, but for this one we will use the Cartesian product (x, y) for the vector or (lat, long). Here’s the function declaration. func kMeans(numCenters: Int, convergeDistance: Double, points: [Vector]) -> [Vector] For this function you have to tell it how many centres you think there are. After a short internet research, I found a square root of the total data points can be good starting point for the number of centres. But if your data is already categorized like for us. Then we just need a single centroid. So numCenters is 1, for this case, convergenceDistance I tried 0.1 and 0.01, the latter giving a better centroid in practice, and all the data points into points(your places’ locations).

Before we move on if you just use the square root of the total number of data points. You get something like this.

Square root of 60 is 7.74… floored is 7, Int. Therefore 7 centroids and 7 colours. I found using yellow map marker somehow makes this really ugly poop colour. If anyone knows how to get a real yellow map marker, please let me know. Thanks.

The algorithm naturally finds clusters and puts a centroid in the middle. The transparent ones are actual places and opaque ones are the centroids. But this is just finding physically close restaurants.

We want to get even more useful information from this.

The location marker on the map. The blue circle is her work. Where the nearby search is biased around. Now there are the friends and gym. She has 3 close friends. So you create an array of 3 vectors. Her gym has a single location, which she tries to go at least 6 times a week. So that’s also pretty important.

First centroid is work, red.

Second is gym, green.

Third is social circle, purple. You will notice that her friends places are transparent and their centroid is the opaque purple point in the middle.

Friends, gym,  work.

And finally food. After being honest with herself she realized she really likes going to bars to meet colleagues and friends. It’s not so much she likes eating our just anywhere. She really prefers going out to bars.

These are the bars with the highest rating that are also close to her work.

Best bars

Putting it all together. We get the final average location. In purple below.

Blue is where her friends are and red is her work. Green is the gym and cyan is where the best bars are. And the answer to all of this is the purple in the middle.

So that’s it. If Margo wanted to optimize this even further, I would say the purple area covers about 5, 6 viable neighbourhoods. She could rank all of them using a priority queue. Which calculates priority on different weights of price, lease commitment and neighbourhood.

Here’s the summary for functions and classes I used:

I hope you took away some useful strategies for solving some problems. Stay tuned for the next article in the series, where I will integrate priority queues into the clustering. This will allow us to cluster places not just on location but also on different categories like: price, rating, location and type etc.

Continue Reading

#Mobile Development

Design an Error Handling System before you structure your App or Software Project (Part I)
Thinking about an Error Handling System in your app can improve the reliability and maintainability of your app exponentially.
May 8, 2019

#Mobile Development

What is the Discovery Phase of App Development?
The road to a successful app development is long and wearisome. It can take anywhere from 4 months to a year. Every step you take will influence the end-result of your product — so be careful which path you take.
May 8, 2019


8 User Experience Mistakes to Avoid For Product Design
There are 2 things in this world where if you mess up in certain social situations, the memory will haunt you for the rest of your life.
May 7, 2019