Comparing the neighborhoods of Hyderabad City, India

This is a learning project that I would like to share. It takes a business problem and works toward a solution that supports an important business decision. The post follows the project step by step so that it is easy to read and understand.

The Introduction:

As the title says, this project is about Hyderabad, India. Before going straight into the problem, a short introduction to the city is in order. So let’s get to know Hyderabad.

The Business Problem:

The business problem is this: a businessperson wants to open a restaurant in Hyderabad but is not sure where to start. Our main goal is to solve this problem so that he can decide where in the city to open his restaurant.

Data sources and data cleaning:

For this project we need the names of Hyderabad's neighborhoods, the latitude and longitude of each, and data on the venues near each neighborhood. The following are the data sources for the project.

Web-scraping the Wikipedia page
Using a BeautifulSoup object on the data
Scraping the required data from the page
Storing the data in a data frame
Response from the Nominatim API
Latitude and longitude coordinates
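
The scraping step above can be sketched roughly as follows. This is only an illustration: it uses Python's built-in html.parser instead of BeautifulSoup so that it runs without extra packages, and it assumes the neighborhood names sit in links inside list items. The sample HTML is made up and the real Wikipedia markup may differ.

```python
from html.parser import HTMLParser


class NeighborhoodParser(HTMLParser):
    """Collect the text of every <a> element found inside an <li>."""

    def __init__(self):
        super().__init__()
        self.in_li = False
        self.in_link = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_li = True
        elif tag == "a" and self.in_li:
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_li = False
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        text = data.strip()
        if self.in_link and text:
            self.names.append(text)


# Made-up stand-in for the downloaded page (assumption, not the real markup).
SAMPLE_HTML = """
<ul>
  <li><a href="/wiki/Ameerpet">Ameerpet</a></li>
  <li><a href="/wiki/Begumpet">Begumpet</a></li>
  <li><a href="/wiki/Madhapur">Madhapur</a></li>
</ul>
"""

parser = NeighborhoodParser()
parser.feed(SAMPLE_HTML)
neighborhoods = parser.names  # list of names, ready to load into a data frame
print(neighborhoods)
```

The resulting list of names is what would then be stored in a data frame and passed on to the geocoding step.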

Data Cleaning: The challenge with coordinate data:

The Nominatim API did not return coordinates for 14 neighborhoods. This is mainly because,

Neighborhoods data frame with coordinates

Geopy library to get coordinates of Hyderabad city:

Geopy is a Python library that can fetch the coordinates of an address. It was used here to get the coordinates of Hyderabad itself, which help plot the map of the city with Python’s folium visualization library. Below is an example.

using geopy library
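
To show what a geocoding lookup involves, here is a small sketch that does not use geopy itself. It builds a search URL for the Nominatim service and parses a JSON reply of the kind Nominatim returns (a list of matches with "lat" and "lon" as strings). The function names are my own, and the sample replies are made up with rounded coordinates, so treat the numbers as illustrative only.

```python
import json
from urllib.parse import urlencode

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(address):
    """Return a Nominatim search URL for a free-form address."""
    query = urlencode({"q": address, "format": "json", "limit": 1})
    return f"{NOMINATIM_SEARCH}?{query}"


def parse_coordinates(response_text):
    """Return (lat, lon) from a Nominatim JSON reply, or None if there is no match."""
    results = json.loads(response_text)
    if not results:
        return None
    first = results[0]
    return float(first["lat"]), float(first["lon"])


# Made-up replies: one match with rounded coordinates, and one empty result.
SAMPLE_REPLY = '[{"lat": "17.39", "lon": "78.47", "display_name": "Hyderabad, Telangana, India"}]'
EMPTY_REPLY = "[]"

url = build_search_url("Hyderabad, India")
city_coords = parse_coordinates(SAMPLE_REPLY)
missing = parse_coordinates(EMPTY_REPLY)
print(url, city_coords, missing)
```

Returning None for an empty reply makes it easy to spot the neighborhoods that have no coordinates and need to be handled during data cleaning.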

Neighborhood location data using Foursquare API:

To get the venues near each neighborhood, the Foursquare API was used. It provides location data for an address, along with diverse information about venues, users, photos, check-ins, geo-tags, etc.

response from Foursquare API
venues data using Foursquare API
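
A rough sketch of the venue lookup is given below. It assumes the legacy Foursquare v2 venues/explore endpoint and its usual reply layout (response, groups, items, venue); the post does not say which endpoint was actually called. The radius and limit values, the helper names, and the sample payload are all placeholders of my own.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lon, client_id, client_secret, version,
                      radius=500, limit=100):
    """Build an explore request for venues around one coordinate pair."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,              # API version date, e.g. "20180605"
        "ll": f"{lat},{lon}",
        "radius": radius,          # metres (assumed value)
        "limit": limit,            # max venues per call (assumed value)
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def extract_venues(neighborhood, payload):
    """Flatten one explore reply into (neighborhood, venue, category) rows."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            categories = venue.get("categories", [])
            category = categories[0]["name"] if categories else "Unknown"
            rows.append((neighborhood, venue["name"], category))
    return rows


# Made-up payload shaped like an explore reply (not real data).
SAMPLE_PAYLOAD = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"name": "Sample Cafe",
                               "categories": [{"name": "Cafe"}]}},
                    {"venue": {"name": "Sample Diner", "categories": []}},
                ]
            }
        ]
    }
}

explore_url = build_explore_url(17.39, 78.47, "MY_ID", "MY_SECRET", "20180605")
venue_rows = extract_venues("Ameerpet", SAMPLE_PAYLOAD)
print(venue_rows)
```

Running extract_venues once per neighborhood and stacking the rows gives a venues table with one row per venue.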

Methodology:

We now have the data required for the project, so let’s look at it. The final data frame, hyd_data, has 244 records in total: 244 neighborhoods and their coordinates.

venues data frame
Top 10 venues in each neighborhood
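
The "top 10 venues" table can be built from the share of each venue category within a neighborhood, which is the same number you get by one-hot encoding the categories and taking the mean per neighborhood. The sketch below does this in plain Python on a few made-up rows; the function names and the alphabetical tie-breaking rule are my own choices, not something taken from the project.

```python
from collections import Counter, defaultdict


def category_frequencies(rows):
    """Map each neighborhood to {category: share of its venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in rows:
        counts[neighborhood][category] += 1
    freqs = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        freqs[neighborhood] = {cat: n / total for cat, n in counter.items()}
    return freqs


def top_categories(freqs, n=10):
    """Return each neighborhood's n most frequent categories, most common first."""
    top = {}
    for neighborhood, shares in freqs.items():
        # Sort by share (descending); ties are broken alphabetically.
        ranked = sorted(shares.items(), key=lambda kv: (-kv[1], kv[0]))
        top[neighborhood] = [cat for cat, _ in ranked[:n]]
    return top


# Made-up (neighborhood, category) rows for illustration only.
SAMPLE_ROWS = [
    ("Ameerpet", "Indian Restaurant"),
    ("Ameerpet", "Indian Restaurant"),
    ("Ameerpet", "Cafe"),
    ("Ameerpet", "Bakery"),
    ("Begumpet", "Hotel"),
    ("Begumpet", "Cafe"),
]

freqs = category_frequencies(SAMPLE_ROWS)
top = top_categories(freqs, n=2)
print(top)
```

The per-neighborhood shares in freqs are also the feature vectors that a clustering step can work on.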

Machine Learning Algorithm:

Now that we have the sorted data frame, we will use the unsupervised machine learning algorithm “K Means Clustering” to cluster the neighborhoods. The algorithm groups together neighborhoods whose most common venues are similar, which gives us an idea of the most common venues in each cluster.

Elbow Plot
Hyderabad Map with neighborhoods
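
For readers who want to see the idea behind the clustering and the elbow plot, here is a minimal pure-Python sketch of K-Means (Lloyd's algorithm). It is a stand-in for illustration, not the implementation used in the project, which the post does not show. The points are made-up two-dimensional frequency vectors, and the seed and iteration limit are arbitrary.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    inertia = sum(squared_distance(p, centroids[lab])
                  for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Made-up frequency vectors forming three loose groups (illustration only).
points = [(0.9, 0.1), (0.8, 0.1), (0.1, 0.9), (0.1, 0.8), (0.1, 0.1), (0.2, 0.1)]

# Elbow data: total within-cluster squared distance for each k.
inertias = {k: kmeans(points, k)[2] for k in range(1, 6)}
labels3, centroids3, _ = kmeans(points, 3)
print(inertias)
print(labels3)
```

Plotting inertias against k gives the elbow plot: the value of k where the curve stops dropping sharply is a reasonable choice for the number of clusters.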

Results:

Finally, we have the following, which is everything required to provide a solution to our business problem.

The Final Data Frame

Plotting the Results using folium:

Plotting the results gives a better idea of the clusters and the neighborhoods. For plotting, we use the folium library, which is well suited to location data and creates beautiful map visualizations. It also supports zooming, so one can zoom in on the map and explore the areas within it. The clusters are shown as circular color markers: the three clusters are marked with red, blue, and green circles. The final plot is shown below.

Cluster Map
cluster 1 data frame
cluster 2 data frame
cluster 3 data frame

Discussion on results:

From the pictures of the three cluster data frames, we can make some observations and discuss them. In the first cluster data frame, restaurants are not the most common venue; they appear as the 3rd and 9th most common venues. From this we can say that either the competition among restaurants or the demand for them is low in that cluster. If we dig further and answer the question “why are restaurants not the most common venues in the first cluster?”, we may get a clear idea of whether or not to open a restaurant in this cluster.

Conclusion:

In this project, the business problem was choosing a neighborhood in which to open a restaurant in Hyderabad city, India. For this we first needed the neighborhoods of Hyderabad and their location data. The neighborhood names were acquired from Wikipedia and their coordinates from the Nominatim API. The venue data for each neighborhood was obtained using the Foursquare API. The K Means clustering algorithm was then applied to group the neighborhoods into 3 clusters, based on the mean frequency of occurrence of each venue category. The labeled clusters were plotted using the folium library, and the final data sets were analyzed. The observations and comments made on the data should help the stakeholders decide where to open a restaurant.
