Over the last 2-3 months I have been working with the NYC Parking Violations Issued data released on NYC Open Data. Basically, I wanted to geocode the records to enable spatial visualization and analysis. I'm still thinking about and working on ways to map this data, but in the meantime I thought I would share the data.
There were a number of challenges working with this data.
The first challenge was to check whether the data was really valid. A histogram of the number of records per date reveals that most dates have only what appears to be a small sample of records, or that records are encoded with the wrong date (a number of records are dated in the future). I was able to settle on a date range (07/29/2013 - 10/28/2013) that seemed consistent and reasonable in terms of having complete data.
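To make that check concrete, here is a minimal sketch of counting records per date and keeping only the range I settled on. It assumes the issue dates arrive as MM/DD/YYYY strings; the sample values and the function names are made up for illustration and are not taken from my actual scripts.

```python
from collections import Counter
from datetime import date, datetime

# Made-up sample values for illustration only; not real ticket data.
sample_issue_dates = [
    "07/29/2013", "07/29/2013", "08/15/2013", "10/28/2013",
    "01/01/2031",  # dated in the future, so clearly mis-encoded
    "06/30/2013",  # outside the range that looked complete
]

# The date range from the post that appeared to have complete data.
START, END = date(2013, 7, 29), date(2013, 10, 28)


def parse_issue_date(text):
    """Parse an MM/DD/YYYY string; return None if it does not parse."""
    try:
        return datetime.strptime(text, "%m/%d/%Y").date()
    except ValueError:
        return None


def tickets_per_day(issue_dates):
    """Count records per date: the data behind a per-date histogram."""
    parsed = (parse_issue_date(t) for t in issue_dates)
    return Counter(d for d in parsed if d is not None)


def within_range(counts, start=START, end=END):
    """Keep only the dates that fall inside the inclusive range."""
    return {d: n for d, n in counts.items() if start <= d <= end}


counts = tickets_per_day(sample_issue_dates)
kept = within_range(counts)
for day in sorted(kept):
    print(day.isoformat(), kept[day])
```

Plotting the per-day counts for the full file is what makes the sparse and future dates stand out.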
Second, I needed to geocode the records before I could map them. For this I decided to use the recently released NYC GeoClient API. Here is the code I used: https://github.com/tswanson/NYCParkingGeocode. It ran at 1,500 records per minute on an Amazon EC2 server. The code is quite sloppy; I just kept adding more code as I found ways to geocode more addresses. Some records are intersections, others are street addresses. I also had to determine borough codes from the precincts.
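As a rough illustration of the borough and address-versus-intersection steps, here is a small sketch. It is not the code from the repository above. The precinct ranges reflect the usual NYPD numbering (Manhattan 1-34, Bronx 40-52, Brooklyn 60-94, Queens 100-115, Staten Island 120-123), the record keys are hypothetical, and the request parameter names (houseNumber, street, crossStreetOne, crossStreetTwo, borough) are my assumption about what the GeoClient API expects, so check them against the current documentation.

```python
# (lowest precinct, highest precinct, borough code, borough name)
PRECINCT_RANGES = [
    (1, 34, 1, "manhattan"),
    (40, 52, 2, "bronx"),
    (60, 94, 3, "brooklyn"),
    (100, 115, 4, "queens"),
    (120, 123, 5, "staten island"),
]


def borough_for_precinct(precinct):
    """Return (code, name) for a precinct number, or None if unknown."""
    for low, high, code, name in PRECINCT_RANGES:
        if low <= precinct <= high:
            return code, name
    return None


def geocode_request(record):
    """Decide which kind of lookup a record needs and build its parameters.

    `record` is a dict with hypothetical keys: precinct, house_number,
    street_name, intersecting_street. Returns (kind, params) or None when
    the record cannot be geocoded from these fields.
    """
    borough = borough_for_precinct(record.get("precinct", 0))
    street = (record.get("street_name") or "").strip()
    if borough is None or not street:
        return None
    _, borough_name = borough

    house = (record.get("house_number") or "").strip()
    cross = (record.get("intersecting_street") or "").strip()
    if house:
        # Parameter names are assumptions; verify against the API docs.
        return "address", {
            "houseNumber": house, "street": street, "borough": borough_name,
        }
    if cross:
        return "intersection", {
            "crossStreetOne": street, "crossStreetTwo": cross,
            "borough": borough_name,
        }
    return None


example = {"precinct": 19, "house_number": "", "street_name": "E 86 ST",
           "intersecting_street": "LEXINGTON AVE"}
print(geocode_request(example))
```

Records that come back as None here are the ones that needed the extra ad hoc handling I mentioned.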
Here is a simple visualization of all tickets in a heat map.