Data Analysis
I've also posted this to my Github: https://github.com/panchitalopez/rebuildchinatown
I started by reading my CSV into a data frame and trimming it down to only the fields I needed, which also makes it easier to read. The code below produces the trimmed CSV file.
Code:
import pandas as pd

df = pd.read_csv(inputfile)  #inputfile holds the path to the original CSV
new = df[['Name', 'Cuisine', 'Cuisine_General', 'Type', 'Borough', 'ZIP', 'Lat', 'Long']]
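To write the trimmed frame back out as a CSV, pandas' to_csv does the trick. A minimal sketch with toy data (the rows, extra column, and output filename here are just for illustration):

```python
import pandas as pd

# toy rows standing in for the real CSV
df = pd.DataFrame({
    'Name': ['Restaurant A', 'Restaurant B'],
    'Cuisine': ['Chinese', 'Italian'],
    'Borough': ['Manhattan', 'Brooklyn'],
    'Phone': ['555-0100', '555-0101'],  # an extra field we want to drop
})

new = df[['Name', 'Cuisine', 'Borough']]  # keep only the fields we need
new.to_csv('trimmed.csv', index=False)    # index=False skips the extra index column
```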
My next step was to analyze how many restaurants there were per cuisine type. I first calculated counts by continental cuisine and then broke them down into specifics.
Code:
#Analysis 1: find the count and percentage of each cuisine (general)
x = df.Cuisine_General.value_counts(dropna=False)
y = df.Cuisine_General.value_counts(dropna=False, normalize=True)
z = pd.concat([x, y], axis=1, keys=['counts', '%'])
print(z)
#I selected the column I wanted and used the .value_counts() function, which counts the occurrences of each unique value.

#Analysis 2: find the count and percentage of each cuisine (specific)
x = df.Cuisine.value_counts(dropna=False)
y = df.Cuisine.value_counts(dropna=False, normalize=True)
z = pd.concat([x, y], axis=1, keys=['counts', '%'])
print(z)
#Here I do the same thing, but because I want specifics, I use the column labelled "Cuisine".
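The counts-plus-percentage pattern is easier to see on a tiny made-up frame (the cuisine values below are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Cuisine_General': ['Asian', 'Asian', 'American', 'European']})

x = df.Cuisine_General.value_counts(dropna=False)                  # raw counts
y = df.Cuisine_General.value_counts(dropna=False, normalize=True)  # fractions of the total
z = pd.concat([x, y], axis=1, keys=['counts', '%'])
print(z)  # Asian: 2 and 0.50, American: 1 and 0.25, European: 1 and 0.25
```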
Code for Pie Chart:
df.Cuisine_General.value_counts().plot(kind='pie',
        colors=['red', 'blue', 'green', 'orange', 'yellow'],
        figsize=(10, 15), autopct='%1.1f%%')
Code for Horizontal Bar Graph:
(continuing from #Analysis 2, where x holds the counts):
x.plot(kind='barh', color='lightseagreen')
plt.title('Closure of Restaurants During Pandemic by Cuisine')
plt.xlabel('Count')
plt.ylabel('Cuisine Type')
plt.show()
Breakdown:
I found that out of the closures analyzed, 38% were Asian cuisine, 36% American, 18% European, and 8% Latin. The more detailed breakdown shows that almost a fifth of the businesses were Chinese, the second most affected cuisine in the study.
Further Analysis:
I also wanted to see what share of all the restaurant closures in the data were Chinese restaurants. To do this, I made two data frames: one with just Chinese restaurants and one with all the restaurants from the original CSV file.
Code:
#Returns a dataframe of JUST Chinese restaurants
x = df.loc[(df.Cuisine == 'Chinese') & (df.Type == 'restaurant')]
x = len(x)  #counts how many rows (restaurants) there are

#Returns a dataframe of JUST restaurants
y = df.loc[df.Type == 'restaurant']
y = len(y)

j = x / y * 100
print(j)  #j = 22.297297: the % of closed restaurants that were Chinese
The .loc function lets you filter a data frame by specifying conditions: for the first frame, rows where the Cuisine column contained "Chinese" and the Type column contained "restaurant"; for the second frame, rows where Type contained "restaurant" without the Chinese condition.
To count the rows in each data frame, I used len(). There were 33 Chinese restaurants and 148 restaurants overall. I created a variable, j, to divide the number of Chinese restaurants by the overall restaurants and multiply by 100 to get a percentage.
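The arithmetic is easy to double-check with the two counts from above:

```python
chinese = 33   # closed Chinese restaurants in the dataset
overall = 148  # all closed restaurants in the dataset
j = chinese / overall * 100
print(round(j, 6))  # 22.297297
```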
Conclusion:
22% of restaurants that closed down during the pandemic were Chinese restaurants. This shows my hypothesis was correct.
Location Based Analysis
Lastly, I was curious to see how many businesses had closed per borough and which neighborhoods had the highest number of closures. Most of the data had restaurants that were located in Manhattan, Brooklyn, and Queens. I've included the top neighborhoods with the most closures.
Just a general tip: there are many restaurants in Manhattan, but the Brooklyn & Queens restaurants are where it's at for good food. Also, who invited Staten Island into the mix? We all know that SI isn't part of NYC. Kidding.
Code:
#Closures Frequency by ZIP Code
x = df.groupby(['ZIP']).size()
x = x.sort_values(ascending=False)
print(x)

#Closures Frequency by Borough
x = df.groupby(['Borough']).size()
print(x)
I used the groupby function to group rows by a common column, and then sorted the ZIP data from the highest number of closures to the lowest.
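Here is the same groupby-then-sort recipe on a toy frame (the borough values are made up), just to show the shape of the output:

```python
import pandas as pd

df = pd.DataFrame({'Borough': ['Manhattan', 'Brooklyn', 'Manhattan',
                               'Queens', 'Manhattan', 'Brooklyn']})

counts = df.groupby(['Borough']).size()       # one entry per borough with its frequency
counts = counts.sort_values(ascending=False)  # most closures first
print(counts)  # Manhattan 3, Brooklyn 2, Queens 1
```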
Analysis by Borough and Neighborhood:
Manhattan had a total shutdown of 147 businesses, with the most losses occurring in the following neighborhoods:
- Chinatown: 10013 & 10002
- Meatpacking District: 10014
- East Village: 10003
- NoHo/Soho: 10012
- Chelsea: 10011
Chinatown had 43 closures. That's almost 20% of the businesses that were analyzed.
Brooklyn had a total shutdown of 56 businesses, with the most losses occurring in the following neighborhood:
- Fort Greene: 11238, 11201, 11205, 11217
Queens had a total shutdown of 18 businesses, with the most losses occurring in the following neighborhood:
- Forest Hills: 11375
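As a quick sanity check on the "almost 20%" figure, and assuming the three borough totals above cover the whole dataset:

```python
chinatown = 43         # Chinatown closures
total = 147 + 56 + 18  # Manhattan + Brooklyn + Queens totals
print(round(chinatown / total * 100, 1))  # 19.5
```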
The maps show a zoomed-out perspective in which Brooklyn and Lower Manhattan are visible, with the most closures clustered in Chinatown and the Lower East Side, plus a closer view of the closures in Queens.
I also created a simple graphic to help visualize the data with the folium library and Open Street Maps.
Code:
import folium

df = pd.read_csv('ult.csv')  #reads the CSV file into a data frame
df = df[['Name', 'Lat', 'Long']]  #selecting the columns to keep for this visual
#find the mean of the latitude/longitude and use that as my starting location
map = folium.Map(location=[df.Lat.mean(), df.Long.mean()], zoom_start=10)
for i, j in df.iterrows():  #iterates through the rows in the data frame
    #puts a marker (the typical blue folium icon) at the coordinates,
    #with a popup showing the restaurant name when you click it
    folium.Marker([j['Lat'], j['Long']], popup=j['Name'],
                  icon=folium.Icon(color='blue')
                  ).add_to(map)
map
Problems Encountered and Conclusions
I ran into many problems in this project. I underestimated how hard it would be to find a database with the information my research required. Businesses, especially long-standing ones, would temporarily close and re-open after the months they needed to get back on their feet, so I removed some businesses from the original CSV and added new ones as well. A lot of time was spent updating the CSV file to reflect the current conditions of the hospitality industry. An estimated 1,000 restaurants alone closed during the pandemic in NYC, and I had only around a fifth of them in my dataset. In the future, I'd allocate more time to researching and revising my data.
This data, regardless, supports my hypothesis that Chinatown was significantly affected. Even though the city is on the mend from the pandemic, Chinatown is one of the places recovering most slowly. I hope this helps you see the impact that the last year and a half has had on one of the neighborhoods in New York City. Maybe next time, stop by for a Vietnamese coffee or a bowl of noodles in Chinatown. You'll be helping the community, and your stomach will thank you! Thanks for reading! Stay safe and healthy.