List of Data sources

  • DSNY Litter Basket Inventory
  • 311 Service Requests from 2010 to Present
  • DSNY Sections GeoJSON
  • NYC Population by Borough

    Explanation of Data

    NYC Population by Borough: A simple dataset that lists the five boroughs (Manhattan, Queens, Brooklyn, the Bronx, and Staten Island) and the population of each. The data is based on New York City Population By Neighborhood Tabulation Areas.

    DSNY Litter Basket Inventory: Locations of litter baskets tracked by DSNY. The main parts I used are the points, the coordinates of each basket, and the DSNY sections.

    311 Service Requests from 2010 to Present: All 311 Service Requests from 2010 to present. 311 is a service that residents of New York City can call to lodge a complaint. There are many different complaint types, but I focused on the ones about litter. Before even downloading the dataset, I filtered it to include only the rows that had 'litter' as the complaint type.

    DSNY Sections GeoJSON: A GeoJSON file that breaks the DSNY sections into polygons that are easy to map onto a folium map. Each polygon has a section property that serves as the key for mapping data onto it.

    List of libraries

    Pandas: imports CSV files into DataFrames to store and manipulate the datasets
    NumPy: number manipulation
    Seaborn: visualizes data as charts
    matplotlib.pyplot: visualizes data as charts
    Folium: creates interactive, embedded maps
    Folium MarkerCluster: a folium plugin that groups nearby markers together to reduce lag

    Data Manipulation and Cleanup

    Before even downloading the data, I had to filter the 311 complaints because the full dataset was too large to work with. I only downloaded the rows that had 'litter' in them and discarded the rest. That still left me with over 39,000 complaints.

    To clean up the datasets for the graphs, I had to create columns to join the datasets together. The litter basket inventory dataset does not list the borough each basket is in, but it does list the DSNY section. The section code includes an abbreviation of the borough it is located in, so I made a function that takes the section and returns the borough. This gave both datasets a shared borough column to join on.
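
    A minimal sketch of such a helper is shown here. The prefix table (MN, BX, BK, Q, SI) and the sample codes are assumptions about how DSNY section codes abbreviate the boroughs, and section_to_borough is a hypothetical name, not necessarily the function used in the project.

```python
# Assumed prefixes of DSNY section codes; adjust to match the real dataset.
BOROUGH_PREFIXES = {
    "MN": "Manhattan",
    "BX": "Bronx",
    "BK": "Brooklyn",
    "Q": "Queens",
    "SI": "Staten Island",
}


def section_to_borough(section):
    """Return the borough name for a DSNY section code, or None if unknown."""
    code = str(section).strip().upper()
    # Try longer prefixes first so a short prefix never shadows a longer one.
    for prefix in sorted(BOROUGH_PREFIXES, key=len, reverse=True):
        if code.startswith(prefix):
            return BOROUGH_PREFIXES[prefix]
    return None


# Sample codes are illustrative, not taken from the dataset.
print(section_to_borough("MN011"))
print(section_to_borough("BKS061"))
```

    With pandas, the helper can be applied to a whole column, for example baskets["borough"] = baskets["section"].map(section_to_borough), where baskets and section are placeholder names.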

    I also had to clean the complaint types for the 311 complaints. For a litter basket request, the city had two different options listed: 'Litter Basket / Request' and 'Litter Basket Request'. I made a function that turned 'Litter Basket / Request' into 'Litter Basket Request'.
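
    The cleanup step above could look roughly like this. The function name clean_complaint_type and the sample values are made up for illustration; only the two complaint labels come from the dataset description.

```python
from collections import Counter


def clean_complaint_type(complaint_type):
    """Merge the two spellings of the litter basket request into one label."""
    if complaint_type == "Litter Basket / Request":
        return "Litter Basket Request"
    return complaint_type


# Made-up sample values, for illustration only.
raw_types = [
    "Litter Basket / Request",
    "Litter Basket Request",
    "Litter Basket Complaint",
]

cleaned = [clean_complaint_type(t) for t in raw_types]
counts = Counter(cleaned)
print(counts)
```

    After cleaning, both spellings are counted under the single 'Litter Basket Request' label, so they show up as one bar in a chart of complaint types.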

    The complaints table already had the borough listed, but I had to drop the rows that had 'Litter Basket Complaint' as their complaint type because they were not closely related to my project. I did include them in the charts of complaint types because I felt it was important to show all the types of litter complaints.
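
    One way to keep every complaint type for the complaint-type charts while dropping 'Litter Basket Complaint' rows from the rest of the analysis is sketched here. The rows and the column names ("Complaint Type", "Borough") are assumptions about the 311 export, not the project's actual data.

```python
import pandas as pd

# Made-up rows; the column names are assumptions about the 311 export.
complaints = pd.DataFrame(
    {
        "Complaint Type": [
            "Litter Basket Request",
            "Litter Basket Complaint",
            "Litter Basket Request",
        ],
        "Borough": ["BROOKLYN", "QUEENS", "BRONX"],
    }
)

# Count every complaint type first, for the chart of complaint types.
type_counts = complaints["Complaint Type"].value_counts()

# Then drop the 'Litter Basket Complaint' rows for the rest of the analysis.
keep = complaints["Complaint Type"] != "Litter Basket Complaint"
requests = complaints[keep].reset_index(drop=True)

print(type_counts)
print(requests)
```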

    All of the graphs are bar charts that show either the counts of the different values that appear in a dataset or the total number of rows grouped by borough. After joining the datasets, I renamed the columns to make them easier for the reader to understand and plotted the results as bar graphs. I used the seaborn library to make these graphs. Some of the borough names were too long for the x-axis, so I tilted the labels at a 30 degree angle instead of making them vertical.
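
    A rough sketch of one such chart follows. The sample rows, the column names, and the axis labels are invented for illustration, and the off-screen "Agg" backend is only there so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows standing in for the joined dataset.
baskets = pd.DataFrame(
    {
        "borough": [
            "Manhattan",
            "Brooklyn",
            "Brooklyn",
            "Staten Island",
            "Queens",
            "Queens",
            "Queens",
        ]
    }
)

# Count rows per borough and rename the columns for the reader.
counts = (
    baskets.groupby("borough")
    .size()
    .reset_index(name="Number of Litter Baskets")
    .rename(columns={"borough": "Borough"})
)

fig, ax = plt.subplots(figsize=(8, 5))
sns.barplot(data=counts, x="Borough", y="Number of Litter Baskets", ax=ax)

# Tilt the borough names by 30 degrees instead of turning them vertical.
ax.tick_params(axis="x", labelrotation=30)
fig.tight_layout()
```

    Calling fig.savefig with a file name of your choice writes the chart to disk.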

    For the interactive maps, I used folium and some of its plugins. Showing every point at once would be too much for one map and would slow down the website. Folium supports layers for different pieces of information, and I took advantage of that to separate the bin locations from the complaint locations. To reduce the load further, I used a folium plugin called MarkerCluster, which groups points that are close together into a cluster.

    Mayor De Blasio starts City Cleanup Corps