Friday, May 30, 2014

LaGuardia Airport Taxi Origin and Destinations, 2013

I've always been curious what the taxi travel patterns for LaGuardia Airport (LGA) look like, since a taxi is the only practical way to get to and from the airport for many travelers. Using the data from my previous post on processing all taxi trips for 2013, I did some analysis of just the LGA taxi trips.

I think this side-by-side map viewer shows the patterns best. Click on the map to open.

Of course, there is significant activity near locations you would expect, such as Penn Station and Grand Central Terminal. The next level of features is dominated by large hotels. I added a number of hotel locations in Lower Manhattan to highlight this pattern; zoom in there to see them.

I would have liked to do a JFK comparison as well, but most JFK trips are to and from Manhattan and are charged a flat rate. As a result, meters are often started and stopped during the trip rather than when the trip starts and finishes, so the data has thousands of recorded locations along the Van Wyck.

Passenger data for JFK and LGA, 2013

Taxi Trip totals for Airports, 2013

Time of Day, Day of Week Heatmap for LGA Drop-offs, 2013

Avg. number of drop-offs per hr

credit -

Thursday, May 8, 2014

NYPD Motor Vehicle Collisions

After a year of working with NYPD Collision data that was aggregated to the intersection by month, it was exciting to see this data release by individual collision with a date and time.

Congratulations and kudos to all that worked to make this happen!

This is also personally very satisfying. I became involved in helping geocode the crash data, which at the time was in PDF format, as a direct result of this article.

The highlight was -

Council Member Jessica Lappin got into an animated discussion with Petito over traffic crash data. When Lappin asked why NYPD is releasing data in PDF form — and only after the council adopted legislation forcing the department to do so — Petito replied that the department is “concerned with the integrity of the data itself.” Petito said NYPD believes data released on a spreadsheet could be manipulated by people who want “to make a point of some sort.” An incredulous Lappin assured Petito that the public only wants to analyze the data to improve safety, not use it for “evil.”

I think everyone involved showed that while not every representation of the data is going to be perfect, providing this data in a publicly available usable format is valuable for everyone.

That said, some caution must be used in working with this new data. Over 13% of the records do not contain coordinates, borough, or ZIP code.

The large majority of the intersections missing coordinates are easily geocoded. The caveat is that you need to make sure the intersection name doesn't exist in more than one borough.

Here are the top intersections by count in the NYPD Motor Vehicle Collisions data that do not have coordinates.

From my previous work getting this data geocoded, I have a list of NYC intersections and their coordinates. It would be great if the data could be augmented with this and made available -

Street,street2,Zipcode,boro code,PolicePrecinct,Lon,Lat
1 AVENUE,39 STREET,11232,3,72,-74.01273692,40.65662545
1 AVENUE,40 STREET,11232,3,72,-74.01327021,40.65605997
1 AVENUE,44 STREET,11232,3,72,-74.01563035,40.6537815
1 AVENUE,47 STREET,11232,3,72,-74.01738143,40.65210968
1 AVENUE,48 STREET,11232,3,72,-74.01796511,40.65154416
1 AVENUE,53 STREET,11220,3,72,-74.02087613,40.64874399
1 AVENUE,53 STREET,11232,3,72,-74.02087613,40.64874399
1 AVENUE,54 STREET,11220,3,72,-74.02145974,40.6481812
1 AVENUE,56 STREET,11220,3,72,-74.02262334,40.64706385
1 AVENUE,ALLEN STREET,10003,1,9,-73.98863937,40.72293374
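
As a rough sketch of how a table like this could be used to fill in missing coordinates, the Python below builds a lookup keyed on the unordered street pair and only returns a match when the borough is known or the intersection name is unique. The function names and the matching rules are my own assumptions for illustration, and the sample rows are copied from the list above; a real run would read the full file.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the sample above, using its column names.
SAMPLE = """Street,street2,Zipcode,boro code,PolicePrecinct,Lon,Lat
1 AVENUE,39 STREET,11232,3,72,-74.01273692,40.65662545
1 AVENUE,53 STREET,11220,3,72,-74.02087613,40.64874399
1 AVENUE,53 STREET,11232,3,72,-74.02087613,40.64874399
1 AVENUE,ALLEN STREET,10003,1,9,-73.98863937,40.72293374
"""


def _key(street_a, street_b):
    # Order of the two street names should not matter.
    return frozenset((street_a.strip().upper(), street_b.strip().upper()))


def build_lookup(csv_text):
    """Map each street pair to {boro code: (lon, lat)}."""
    lookup = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        coords = (float(row["Lon"]), float(row["Lat"]))
        lookup[_key(row["Street"], row["street2"])][row["boro code"]] = coords
    return lookup


def geocode(lookup, on_street, cross_street, boro=None):
    """Return (lon, lat), or None when there is no match or it is ambiguous."""
    candidates = lookup.get(_key(on_street, cross_street), {})
    if boro is not None:
        return candidates.get(str(boro))
    if len(candidates) == 1:
        # The name exists in only one borough, so the match is safe.
        return next(iter(candidates.values()))
    return None
```

Calling `geocode(build_lookup(SAMPLE), "39 street", "1 Avenue")` returns the Brooklyn coordinates because that pair appears in only one borough in the sample. An intersection name found in several boroughs would return `None` unless a borough code is passed in.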

Otherwise, the data looks pretty good and consistent for the dates available. I'm very curious to see if it actually updates every day. So far today, it hasn't. Still, awesome to see this data released this way.

The spikes seem to correlate with snow storms.  The 'down' spikes seem to be holidays.

Lastly, the older NYPD data documents the vehicles involved.  This allows for some interesting insights on some specific crashes.  I was very surprised at how many fire trucks and ambulances are involved in collisions.

Tuesday, April 29, 2014

NYC Taxi Data - 2013

I should never have asked for this data. I have plenty of projects to work on. Still, I couldn't resist the opportunity to have a look at an entire year's worth of NYC taxi data. It became a challenge to see if I could actually manage to work with 50GB of data and over 170 million records. So after spending an inordinate amount of time over a few weekends on this, here are my results.

First, I have to thank/blame Chris Whong for this adventure.  Please check out his blog.

All my work can be found on GitHub -

Initially, I tried to normalize and compress the data down enough to make it searchable in some form. For about 4 hours I had a 5GB zipped file of the data on my Dropbox. I tweeted that the data was available to download, and soon I got an email saying my Dropbox account was suspended. The lesson learned is that the free Dropbox only allows for 20GB of downloads per day.

I next decided to try to create a table of total counts by neighborhood for origins (pickups) and destinations (drop-offs). Long story short, the output ended up looking like this:

Get your own copy here.
The corresponding shapefile.
A map version looks like this:


I've also started experimenting with making some maps with the data.  Here are the first two:

A sample of the processed data for one day, April 30th, 2013, can be found here.  If you are interested in more of the data or a specific slice of it, message me @mrsp105 and I will try to assist.
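
For anyone wanting to try the neighborhood pickup and drop-off totals on their own, here is a minimal Python sketch of the general idea: read the trips one row at a time so the full file never has to fit in memory, and keep running counts. This is not the code from my repository. The column names are assumptions about the trip file layout, and `toy_lookup` is a made-up stand-in with two invented boxes; a real run would need a proper point-in-polygon lookup against the neighborhood shapefile.

```python
import csv
from collections import Counter


def count_by_neighborhood(lines, neighborhood_of):
    """Tally pickups and drop-offs per neighborhood, one row at a time.

    `lines` is any iterable of CSV lines (an open file works).
    `neighborhood_of(lon, lat)` is a caller-supplied lookup that returns
    a neighborhood name, or None when the point falls outside all of them.
    """
    pickups, dropoffs = Counter(), Counter()
    for row in csv.DictReader(lines):
        try:
            origin = neighborhood_of(float(row["pickup_longitude"]),
                                     float(row["pickup_latitude"]))
            dest = neighborhood_of(float(row["dropoff_longitude"]),
                                   float(row["dropoff_latitude"]))
        except (KeyError, TypeError, ValueError):
            continue  # skip the whole row if any coordinate is missing or malformed
        if origin:
            pickups[origin] += 1
        if dest:
            dropoffs[dest] += 1
    return pickups, dropoffs


def toy_lookup(lon, lat):
    # Hypothetical stand-in: two invented boxes, not real neighborhood boundaries.
    if -73.89 <= lon <= -73.85 and 40.76 <= lat <= 40.79:
        return "Airport box"
    if -74.02 <= lon <= -73.93 and 40.70 <= lat <= 40.80:
        return "Manhattan box"
    return None
```

Something like `with open("trips.csv", newline="") as f: pickups, dropoffs = count_by_neighborhood(f, toy_lookup)` would then stream through a file of any size (the file name here is just a placeholder).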

Sunday, February 23, 2014

NYC Bike Lane Violation Parking Citations

Here's a quick post for a map of bike lane violation parking citations using the ArcGIS Online Storytelling Text and Legend web application template.

Date range for the citations: 7/30/2013 - 10/29/2013.

For the smaller scales, the citations are represented as heat/density maps.  My preference was to not use a heat map but it was the only way to represent the citations and the bike lanes at the same time.

Zoom in to see the actual citation locations as blue circles.


Monday, January 20, 2014

Mapping NYC Parking Tickets

Over the last 2-3 months I have been working with the NYC Parking Violations Issued data released on NYC Open Data. Basically, I wanted to geocode the records, enabling spatial visualization and analysis. I'm still thinking about and working on ways to map this data, but in the meantime, I thought I would share the data.

There were a number of challenges working with this data.

The first challenge was to see if the data was really valid. Creating a histogram of the total records by date reveals that most dates only have what appears to be a small sample of records, or the records are encoded with an incorrect date (there are a number of records dated in the future). I was able to settle on a date range (07/29/2013 - 10/28/2013) that seemed to be consistent and reasonable in terms of having complete data.
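
A check like the one described above can be sketched in a few lines of Python: count records per day, then look for the longest run of consecutive days that each clear some minimum count. The `MM/DD/YYYY` date format, the function names, and the threshold are all assumptions for illustration; the right threshold depends on the data, and days with naturally low volume would break a run.

```python
from collections import Counter
from datetime import datetime, timedelta


def daily_counts(date_strings, fmt="%m/%d/%Y"):
    """Count records per calendar day, ignoring dates that fail to parse."""
    counts = Counter()
    for text in date_strings:
        try:
            counts[datetime.strptime(text, fmt).date()] += 1
        except ValueError:
            continue
    return counts


def longest_dense_run(counts, min_per_day):
    """Return (start, end) of the longest run of consecutive days that
    each have at least `min_per_day` records, or None if no day qualifies."""
    good_days = sorted(day for day, n in counts.items() if n >= min_per_day)
    best = None
    run_start = previous = None
    for day in good_days:
        if previous is None or day - previous != timedelta(days=1):
            run_start = day  # gap found, so a new run starts here
        previous = day
        if best is None or (day - run_start) > (best[1] - best[0]):
            best = (run_start, day)
    return best
```

Stray records with dates far in the future show up as isolated days with tiny counts, so they fall below the threshold and never join a run.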

Second, I needed to geocode the records before I could map them. For this I decided to make use of the recently released NYC GeoClient API. Here is the code I used. It ran at 1,500 records per minute on an Amazon EC2 server. The code is quite sloppy: I just kept adding more code as I found ways to geocode more addresses. Some records were intersections, others street addresses. I also had to determine borough codes from the precincts.
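
The precinct-to-borough step can be illustrated with a small Python sketch. This is not the code linked above. It assumes the usual NYPD numbering (Manhattan 1-34, Bronx 40-52, Brooklyn 60-94, Queens 100-115, Staten Island 120-123) and the standard borough codes 1-5, and it returns `None` for anything outside those ranges, such as a blank or zero precinct. Treat the ranges as something to verify against the current precinct list.

```python
# (first precinct, last precinct, borough code), assuming standard NYPD numbering
PRECINCT_RANGES = (
    (1, 34, 1),      # Manhattan
    (40, 52, 2),     # Bronx
    (60, 94, 3),     # Brooklyn
    (100, 115, 4),   # Queens
    (120, 123, 5),   # Staten Island
)


def borough_code_from_precinct(precinct):
    """Return the borough code (1-5) for a precinct number, or None."""
    try:
        number = int(precinct)
    except (TypeError, ValueError):
        return None  # blank or non-numeric precinct field
    for first, last, boro in PRECINCT_RANGES:
        if first <= number <= last:
            return boro
    return None
```

As a sanity check, precinct 72 maps to borough code 3 and precinct 9 to borough code 1, which matches the intersection table in my collision data post.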

Here is a simple visualization of all tickets in a heat map.

Full map

Monday, November 25, 2013

October update to bike collision data

A quick update to the bike collision charts to reflect the October data.  Big thanks again to John Krauss and OpenScrape for processing the reports into an easy format to work with.

Monday, November 11, 2013

Has the number of bike collisions increased in NYC since the CitiBike launch?

There was a fair amount of concern when CitiBike was launched (May 27th, 2013) that there would be significant public safety issues with thousands of extra bikes riding around in the most congested areas of Manhattan and Brooklyn. Now that the program has been in place for a few months, I thought it would be worth looking at the NYPD's collision data to see what the impacts are.

Over the last year or so, I've been working with the NYPD Motor Collision Data Reports. I thought that they would be very useful for mapping, provided they could be formatted properly. The problem is that the reports are released as PDF documents, which cannot be readily edited or loaded into other tools. The data is also aggregated by month to the closest intersection. This limits what can be done with the data because one does not know where the actual incident occurred.

For this analysis I looked at the "Bicycle" count in the Vehicle Type column. This way, I was looking at total bicycles involved in collisions (as reported) instead of just cyclist injuries.

The NYPD Crash Band-Aid project has been working hard on developing Python scripts that extract the PDF documents into a usable and open format: .csv. I was able to contribute to the project by providing the initial seeding of latitude and longitude coordinates for most of the intersections. The table of intersections was created using the Department of City Planning LION street file. Additional intersections were geocoded using the Department of City Planning GeoSupport desktop geocoder. Currently, just over 99% of the intersections have coordinates.

Now that the data was in a 'mappable' format, I went to work making a number of maps of bike collisions, as well as other vehicle types.  In the absence of actual traffic counts by vehicle types, this helps create a picture of traffic patterns by various vehicles.  This assumes that there is a strong correlation between collisions and miles driven in those areas, which may not always be the case.

Livery cab collision density map.  August 2011 - September 2013


Comparison of bike collision density with total vehicle collision density.

I was curious enough to go see where the areas with the highest density of collisions were, so I hopped on a CitiBike. Three hours and five different CitiBikes later, I had collected photo documentation of those locations.

And finally, I created a 3D heat map of the bike collision density. Warning: you will need the Firefox, Chrome, or Safari browser. The file first has to download and unpack locally. Look for a future blog post on how this was created.

Ok ok... For the CitiBike analysis.

One of the first things I tried was to compare the density of bike collisions for June - Sept. 2013 to the same months in 2012. I wanted to see if there were any visible patterns when comparing the months in 2013 when CitiBike was available vs. the previous year. It really didn't show much in terms of patterns. The data is also sparse for this type of use, and the validity of any conclusion drawn from it would be questionable. Still, I spent a lot of time creating these maps, so I've left them in here.

Lastly, I created a polygon area that outlines the CitiBike docking stations (shown in blue below). I used this boundary to compare total bike collisions inside of this area to the total outside.
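
I did the inside/outside comparison in GIS software, but the underlying test is simple enough to sketch in Python. The version below uses the standard ray-casting method and treats longitude and latitude as flat x/y coordinates, which is a reasonable approximation at city scale. The function names are mine, and any polygon passed in would be a hand-drawn boundary like the one described above, not an official one.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test. `polygon` is a list of (lon, lat) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # one more crossing to the right of the point
    return inside


def split_counts(points, polygon):
    """Return (number of points inside the polygon, number outside)."""
    inside = sum(1 for lon, lat in points if point_in_polygon(lon, lat, polygon))
    return inside, len(points) - inside
```

Since the collision data is aggregated to intersections, each intersection would be tested once and its monthly bicycle count added to the inside or outside total.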


Below are two charts that graph the NYC bike collision data by month.
The first chart shows two bands of data. The top dark blue columns are the total number of bicycles involved in collisions outside of the CitiBike Area, and the lower light blue columns are the number of bicycles involved in collisions within the CitiBike Area. I added orange to the bottom columns so that the summer months of 2013 can be easily compared with those of 2012.

This second chart shows the percentage of bike collisions that occurred in just the CitiBike Area.

There has been an increase in the total bike collisions for the CitiBike area since the bike share program started at the end of May. However, the increase is small, and total bike collisions citywide have also increased. More importantly, the percent of bike collisions occurring in the CitiBike area in those four months (June - Sept 2013) has stayed consistent with the historical percentages.
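
The reason for looking at the percentage rather than the raw count can be shown with a tiny Python sketch. The function names are hypothetical and the example numbers in the note after the code are invented placeholders, not values from the NYPD data; the actual monthly figures are in the spreadsheet linked at the end of this post.

```python
def area_share(inside, outside):
    """Percent of all bike collisions that fall inside the CitiBike area."""
    total = inside + outside
    return 100.0 * inside / total if total else None


def monthly_shares(counts):
    """`counts` maps a month label to (inside, outside) collision counts."""
    return {month: area_share(inside, outside)
            for month, (inside, outside) in counts.items()}
```

With invented counts such as `{"2012-06": (30, 70), "2013-06": (36, 84)}`, the inside count rises from 30 to 36, but the share is 30% in both months because the outside count rose in proportion. That is the kind of pattern the second chart is meant to reveal.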

Based on the collision data published by the NYPD, and more importantly on the last 4 months of data since the launch of the bike share program by CitiBike, my analysis shows that there is no significant increase in bike collisions in the areas with CitiBike docking stations compared to all bike collisions in the city.

Link to spreadsheet data by month. 

Please stay tuned as I update the results when new data becomes available. I will also be posting more details on how some of the maps in this blog were created.