Tuesday, April 29, 2014

NYC Taxi Data - 2013

I never should have asked for this data. I have plenty of projects to work on. Still, I couldn't resist the opportunity to look at an entire year's worth of NYC taxi data. It became a challenge to see if I could actually manage to work with 50GB of data and over 170 million records. So after spending an inordinate amount of time over a few weekends on this, here are my results.

First, I have to thank/blame Chris Whong for this adventure. Please check out his blog: http://chriswhong.com/open-data/foil_nyc_taxi/

All my work can be found on GitHub - https://github.com/tswanson/TaxiNYC2013

Initially, I tried to normalize and compress the data enough to make it searchable in some form. For about four hours I had a 5GB zipped file of the data on my Dropbox. I tweeted that the data was available to download, and soon I got an email saying my Dropbox account was suspended. The lesson learned is that a free Dropbox account allows only 20GB of downloads per day.

Next, I decided to create a table of total trip counts by neighborhood, for both origin (pickup) and destination (drop-off). Long story short, the output ended up looking like this:



Get your own copy here.
The corresponding shapefile.
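
For anyone curious how a neighborhood tally like this can be built, here is a minimal sketch in Python. It is not necessarily how I did it (see the GitHub repo for the real work). It assumes the trip rows carry pickup_longitude, pickup_latitude, dropoff_longitude and dropoff_latitude columns, and that the neighborhood boundaries have already been read out of the shapefile into a hypothetical `neighborhoods` dictionary of name to a simple polygon ring (no holes). It uses a brute-force ray-casting test; over 170 million records you would probably want a spatial index or a GIS database instead.

```python
from collections import Counter


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon ring."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Only edges that straddle the point's latitude can be crossed.
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def find_neighborhood(lon, lat, neighborhoods):
    """Return the first neighborhood containing the point, or None."""
    for name, polygon in neighborhoods.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


def count_by_neighborhood(rows, neighborhoods):
    """Tally pickups and drop-offs per neighborhood, one row at a time.

    `rows` is any iterable of dicts (for example a csv.DictReader), so the
    full file never has to fit in memory. Column names are assumptions.
    """
    pickups, dropoffs = Counter(), Counter()
    for row in rows:
        try:
            p_lon = float(row["pickup_longitude"])
            p_lat = float(row["pickup_latitude"])
            d_lon = float(row["dropoff_longitude"])
            d_lat = float(row["dropoff_latitude"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or malformed coordinates
        origin = find_neighborhood(p_lon, p_lat, neighborhoods)
        destination = find_neighborhood(d_lon, d_lat, neighborhoods)
        if origin is not None:
            pickups[origin] += 1
        if destination is not None:
            dropoffs[destination] += 1
    return pickups, dropoffs
```

Feeding it `csv.DictReader(open(path, newline=""))` for each monthly file and adding up the resulting counters gives one pickup total and one drop-off total per neighborhood. Points that fall outside every polygon are simply left out of the counts.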
A map version looks like this:




I've also started experimenting with making some maps with the data.  Here are the first two:





A sample of the processed data for one day, April 30th, 2013, can be found here.  If you are interested in more of the data or a specific slice of it, message me @mrsp105 and I will try to assist.