The file I received, Retail Food 1006 Licenses 2013.xlsx, contains the address for the facility, but not the latitude and longitude. I will need latitude and longitude for mapping purposes, so I needed to find a way to geocode them. Google offers a free service, but limits the number of requests to 2500 per day. The data set has 15176 facilities, so that would take a week to do. I decided to download the food inspection data from the data portal. This has the latitude and longitude that I need, along with the restaurant license number. I also have license number in the data set from above. Now clearly, not all of the restaurants listed in the first data set will have had an inspection, but many will. I used the Google API to geocode the remaining restaurants. With this process, the geocoding went from a week to two days.
The last step with this file was to take the newly generated latitudes and longitudes get the ward number for each restaurant. To do this, I used an API provided by the Chicago Works For You project.
Before I get too far, let me list the assumptions that I've made up to this point.
- The license number appearing in Retail Food 1006 Licenses 2013.xlsx corresponds to the same restaurant and license number in the food inspections data set on the data portal.
- The latitude and longitude that I receive from the Google API and the latitude and longitude from the city of Chicago's internal data are both interchangeable. That is, either source is valid for finding the location of a given restaurant.
- The restaurants that cannot be geocoded using either method or have a ward of zero after using the CWFY API are randomly distributed among all restaurants, and leaving them out of the analysis introduces no bias and is generally not harmful or wrong to do.
- Get Retail Food 1006 Licenses 2013.xlsx file from Chicago Department of Public Health
- Match file with food inspections data to get some latitude longitude pairs
- Use Google API to get remaining latitude longitude pairs
- Use CWFY API to get ward number from latitude longitude pairs
- End with geocoded_risk_data_w_wards.csv with some intermediate files created along the way
Now that the data is ready, let's take a look at it. First, I want to plot the number of restaurants per ward on a map. Here's the result...
It looks like there's definitely a concentration of restaurants around the loop. Ward 42 in particular has nearly 1600 restaurants in it, where the average across all wards is about 300. This makes the difference between the other wards a little difficult to see, since the color scale is dominated by ward 42's extreme number of restaurants. To get around this, let's ignore ward 42 and redraw the map...
Now it's more clear that there's an abundance of restaurants in the wards around the loop, specifically to the north and west.
Another way to look at this would be to run a 2d density on the latitude longitude pairs. Here's that graph...
The preceding graphs were restaurant counts across all risk levels. Let's split out the risk levels and see if there's a pattern. Specifically, let's look at the proportion of a given risk level to all of the restaurants for each ward. For example. If a ward has 24 High risk restaurants out of 154 total, let's look at that proportion... 15.6%.
And low risk level...