Better Programming

Visualizing Street Tree Population Variance in NYC Using GeoPandas, Plotly, and JavaScript

Jerry Clayton
Published in Better Programming
17 min read · Jan 27, 2023

Photo by Maria Teneva on Unsplash

In this article, I will:

  • Use data from the US Census Bureau and the NYC Parks Department to map the change in street tree cover in NYC from 1995 to 2015.
  • Walk you through cleaning, aggregating, analyzing, and visualizing the data using Pandas/GeoPandas and Plotly and then presenting the data interactively using a web app.
  • Discuss how tree and population density varied spatially from 1995 to 2020 and the social implications of these changes.
  • Find that the median number of trees per block increased substantially citywide over this period, but that the magnitude of change varied greatly from borough to borough and neighborhood to neighborhood, with some neighborhoods losing trees.
  • Find that targeted tree planting efforts from 1995 onwards ameliorated urban heat island effects in under-forested, historically redlined areas, but also contributed to gentrification.

Finally, I’ll compare the performance of server-side and client-side rendering (using Plotly Dash and JavaScript) for this web app and find that for simple interactive data visualizations, client-side rendering performs best.

Motivation

Last fall, a friend sent me an article in the New York Times about the disparity in tree cover in different neighborhoods in New York City, where I have lived since 2016.

This article was particularly salient for me at the time, as I had just spent the summer in Seattle, during which the entire Pacific coast experienced a record-breaking heat wave so extreme that it was deemed a ‘1-in-1000 year event’.

It introduced me to the Urban Heat Island (UHI) effect, a phenomenon where “temperatures are higher in urban [areas] compared with surrounding rural environments”. Furthermore, it outlined how pockets of intense, deadly summer heat are spatially reproduced according to historical patterns of discrimination, like redlining, in NYC and beyond.

The key physical variable in the intensity of hyper-local urban heat islands is the presence of trees. I have always been interested in environmental justice, and given the tree-planting efforts and an uptick in housing development under the last two mayoral administrations, I wanted to understand how NYC’s UHIs have changed in recent times and gain some insight into the changing geography of climate resilience throughout the five boroughs. Specifically, I wanted to understand:

  • How tree and human populations have trended generally
  • Whether or not changes in these populations over time are associated with gentrification
  • Whether or not the areas with the fewest trees relative to the city in 1995 were still relatively underforested today
  • Whether or not historically redlined neighborhoods had fewer trees relative to the city in 1995 and today

First, I needed some data.

Data Sourcing and Cleaning

I searched NYC Open Data and found the Parks Department’s Street Tree Census project, which took a manual inventory of all street trees in 1995, 2005, and 2015.

I downloaded all of the data and began exploring. After reading the data dictionaries, it became apparent that comparing street trees at the census-tract level made the most sense for the data because census tracts were provided for 97% of trees and would be easily merged with census data. In order to understand how UHIs have shifted spatially and determine their change in magnitude and number, I decided to compare the spatial distribution of trees, the relative abundance of trees, and the total growth of the street tree population over time.

Further, I wanted to identify where urban heat islands have been in the past and where they are now. In order to track heat islands and measure their changing severity across time, it was necessary to understand how the population of residents had shifted in distribution alongside the tree population. Unfortunately, the street tree census occurred in the exact middle of the decennial census cycle, so I had to decide whether to use population data from the census before or after each tree census.

I chose to use population data from the census after each tree count, as it allowed me to use the data from the recent 2020 census, and given that the number of trees in the city has been trending upward, it meant that I would be under-estimating the number of trees as opposed to under-estimating the number of people. This seemed more appropriate, as I wanted to highlight the human element of climate change. I downloaded 2000, 2010, and 2020 census information from IPUMS NHGIS.

The first step was to aggregate trees by census tract. This turned out to be more complicated than it seemed because although census tract codes were included for most trees, these codes are only unique at the county level. Since each borough of NYC is its own county, this meant that codes were not unique within the city.

The folks at the parks department were nice enough to include a solution: a column called boroct or “borough census tract”, which converted variable-length census tract codes to six-digit numbers and prefixed them with a 1, 2, 3, 4, or 5 indicating which borough the tree was in. This resulted in a 7-digit geographic identifier that was unique at the city level.

While boroct was provided for nearly every tree in the 2015 census, the 2005 and 1995 censuses had large numbers of trees with a census tract code and no boroct entry — almost 20% of all trees in 2005. I wrote a short function to encode this number:

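import numpy as np

# Return a boro_ct given the borocode (bc), the length of the census
# tract code (l), and the census tract code as a string (sct).
# bc and sct may be strings or pandas Series of strings.
def encode_boroct(bc, l, sct):
    # Codes of 1-4 digits are left-padded to four digits and given a "00" suffix;
    # codes of 5-6 digits already include the suffix and are only left-padded.
    bct = np.where(l == 1, bc + "000" + sct + "00",
          np.where(l == 2, bc + "00" + sct + "00",
          np.where(l == 3, bc + "0" + sct + "00",
          np.where(l == 4, bc + sct + "00",
          np.where(l == 5, bc + "0" + sct,
          np.where(l == 6, bc + sct,
          "NaN"))))))  # any other length yields the string "NaN"
    return bct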
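
As a usage sketch, the same padding rule can also be written with `str.zfill` and applied to whole columns at once. The function name `encode_boroct_zfill` and the sample rows are invented for illustration, and the sketch assumes every tract code is a string of one to six digits (rows with missing codes would need to be dropped first).

```python
import pandas as pd


def encode_boroct_zfill(borocode: pd.Series, tract: pd.Series) -> pd.Series:
    """Build 7-digit borough census tract IDs from borough and tract codes."""
    borocode = borocode.astype(str)
    tract = tract.astype(str)
    # Codes of 1-4 digits hold only the base tract number:
    # left-pad to four digits and append the "00" suffix.
    base_only = tract.str.zfill(4) + "00"
    # Codes of 5-6 digits already include the suffix: left-pad to six digits.
    with_suffix = tract.str.zfill(6)
    six_digit = base_only.where(tract.str.len() <= 4, with_suffix)
    return borocode + six_digit


# Hypothetical sample rows, not taken from the tree census.
sample = pd.DataFrame({"borocode": ["1", "3", "4"], "tract": ["7", "1234", "12345"]})
sample["boroct"] = encode_boroct_zfill(sample["borocode"], sample["tract"])
```

For these three rows the result matches what the nested `np.where` version would produce for code lengths 1, 4, and 5.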

Some trees had a borough listed but not a census tract; those were dropped, and I saved the number of trees in each census tract for each census year to three DataFrames. In all, 97% of trees across 1995, 2005, and 2015 data had a census tract designated, and all of them were included in the final DataFrame.

The 2010 Census Tract designations were used across all three census years, so I imported the shapefile of these tracts as a GeoPandas GeoDataFrame, subsetted it for NYC, and joined it to the 2015 tree count data using boroct.

I then used successive left outer joins on boroct to make a final DataFrame which contained the 2015, 2005, and 1995 tree counts, as well as tract names, land area, and geometries for each census tract.
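
The counting and joining steps might look roughly like the sketch below. The column names (`trees_2015` and so on) and the toy rows are assumptions, and a plain DataFrame stands in for the tract GeoDataFrame.

```python
import pandas as pd

# Hypothetical per-tree records: one row per tree, keyed by boroct.
trees_by_year = {
    2015: pd.DataFrame({"boroct": ["1000100", "1000100", "2000200", None]}),
    2005: pd.DataFrame({"boroct": ["1000100", "2000200"]}),
    1995: pd.DataFrame({"boroct": ["1000100"]}),
}

# Stand-in for the tract table (a GeoDataFrame in the real project).
tracts = pd.DataFrame({"boroct": ["1000100", "2000200", "3000300"]})

merged = tracts
for year, trees in trees_by_year.items():
    counts = (
        trees.dropna(subset=["boroct"])  # drop trees without a tract
        .groupby("boroct")
        .size()
        .rename(f"trees_{year}")
        .reset_index()
    )
    # A left join keeps every tract, even those with no trees that year.
    merged = merged.merge(counts, on="boroct", how="left")
```

Tracts with no recorded trees in a given year end up with NaN in that year's column, which makes gaps easy to spot later.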

Next, I cleaned the population census data. Since I was doing a fairly naive analysis, this was simple: I kept only the population, aggregated by census tract, and once again used successive left joins to produce a single DataFrame with columns for each census year.

Finally, I imported the shapefile of the 2020 census tracts, as I wanted to make full use of the updated census data. I aggregated by census tract and checked to make sure there was minimal difference in 2020 and 2010 tract designations. Satisfied with the minimal data lost — just under 5% of census tracts containing less than 1% of total trees — I joined the complete tree GeoDataFrame to the complete population GeoDataFrame, to produce a final GeoDataFrame for graphing.
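
One way to run this kind of overlap check is an outer merge with `indicator=True`, sketched below on invented tract IDs. The column names are assumptions, and the shares it prints are for the toy data only, not the figures quoted above.

```python
import pandas as pd

# Hypothetical tract tables; the real ones come from census shapefiles.
tracts_2010 = pd.DataFrame(
    {"boroct": ["1000100", "1000200", "1000300"], "trees_2015": [120, 80, 5]}
)
tracts_2020 = pd.DataFrame({"boroct": ["1000100", "1000200", "1000401", "1000402"]})

# indicator=True adds a "_merge" column recording where each key was found.
check = tracts_2020.merge(tracts_2010, on="boroct", how="outer", indicator=True)

# Share of 2020 tracts with no 2010 counterpart.
unmatched_2020 = (check["_merge"] == "left_only").sum() / len(tracts_2020)

# Share of trees sitting in 2010 tracts that no longer exist in 2020.
lost_trees = check.loc[check["_merge"] == "right_only", "trees_2015"].sum()
share_trees_lost = lost_trees / tracts_2010["trees_2015"].sum()
```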

Columns in this final GeoDataFrame included tree counts for 1995, 2005, and 2015, as well as population statistics for 2000, 2010, and 2020. These were aggregated by census tract, and the final columns stored the land area and neighborhood names of each tract.

Final GeoDataFrame structure

Finally, I needed a few indices. In order to meaningfully compare trees per land area across time, it was necessary to come up with a land area unit that was intuitive for city residents.

The census reports land area measurements in square meters, but New Yorkers think in terms of city blocks, so I decided to define a standard unit block and report Trees Per Block.

After some research and help from StreetEasy, I came up with an average block size that worked as a reasonable divisor for my dataset, with most blocks having between 1 and 40 trees. I converted the tract land area to blocks and divided the tree count by the number of blocks in each tree census year.

In order to relate tree populations in an anthropocentric context, I computed a “People Per Tree” index for all three census years. While urban heat islands in industrial but nonresidential locales are still ecologically relevant, the magnitude of their effect on human health is substantially diminished by the decreased population in these spaces.

The focus of my project was to specifically look at densely populated urban areas, and how human health might be impacted by increased tree canopy.
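
Both indices reduce to simple column arithmetic. The sketch below uses a placeholder block size, since the article does not state the value that was actually used, and the column names and numbers are likewise invented.

```python
import numpy as np
import pandas as pd

# Placeholder block size in square meters. This is NOT the author's value,
# which the article does not state; substitute your own estimate.
BLOCK_AREA_M2 = 20_000

# Hypothetical tract rows with made-up numbers.
df = pd.DataFrame(
    {
        "land_area_m2": [200_000, 100_000],
        "trees_2015": [150, 0],
        "pop_2020": [3_000, 500],
    }
)

# Convert land area to "blocks", then divide trees by blocks.
blocks = df["land_area_m2"] / BLOCK_AREA_M2
df["trees_per_block_2015"] = df["trees_2015"] / blocks

# Treat tracts with zero trees as undefined rather than dividing by zero.
df["people_per_tree_2020"] = df["pop_2020"] / df["trees_2015"].replace(0, np.nan)
```

Pairing the 2015 tree count with the 2020 population follows the choice described earlier of using the census after each tree count.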

Sample of computed indices

After computing these indices, I was ready to dive into the data.

Exploratory Data Analysis

In total, NYC added over 157k trees and 587k people over the surveyed duration. At the tract level, 1604 of 2022 tracts (79%) added trees, and 1505 of 2022 tracts (74%) added people. This data represents 95% of all 2020 census year tracts and 96% of trees.

Moreover, the tree planting efforts from 1995 to 2015 sufficed to increase the median number of trees per block in each census tract from 15 to 22 citywide and reduced the median number of people per tree from 19 to 15.

While these trends remain true at the borough level, there was a meaningful variation in the number of trees and persons added between boroughs. Statistics for the median census tract are shown in the following table:

Characteristics of the median census tract, 1995–2020

As can be seen in the table, The Bronx experienced the most rapid growth over the period, adding the most trees per block and nearly as many people per tract as Staten Island, whose median census tract is almost 5 times larger than that of The Bronx. Brooklyn, whose 720 census tracts are more than double the number of The Bronx, added the greatest total number of trees and people. Queens had the lowest median increase, which prompted me to dive deeper into the borough-specific stats. Focusing on the minority of tracts that experienced a decline in the tree and/or human populations provided some additional context.

% of census tracts in each borough that lost trees or population, 1995–2020

Amongst the five boroughs, Queens had the highest percentage of its census tracts experiencing a decline in the total number of trees, while the Bronx had the lowest, at 35% and 10% respectively. Interestingly, in all boroughs except the Bronx, most of the tracts that lost trees also experienced a total gain in population.

Manhattan, in which 34% of census tracts lost population, added trees to almost 80% of those tracts. Taken together, a few trends emerge: while Manhattan’s urban forest grew substantially, the coincidental population reshuffling means that although there was a material reduction in UHI risks, this benefit was not equitably distributed across space or demographics.

In Queens, the high share (21%) of census tracts that added population and lost trees could be a result of new housing development, but further study is required to confirm this hypothesis.
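
Borough-level shares like these can be computed by flagging each tract and averaging the flags per borough. The following is a minimal sketch on invented numbers; the column names are assumptions.

```python
import pandas as pd

# Hypothetical tract-level changes; the numbers are invented.
changes = pd.DataFrame(
    {
        "borough": ["Queens", "Queens", "Queens", "Queens", "Bronx", "Bronx"],
        "tree_change": [-12, 30, -4, 8, 25, 40],
        "pop_change": [150, 200, -60, 90, 300, -20],
    }
)

changes["lost_trees"] = changes["tree_change"] < 0
changes["lost_trees_gained_pop"] = changes["lost_trees"] & (changes["pop_change"] > 0)
changes["gained_trees_lost_pop"] = (changes["tree_change"] > 0) & (
    changes["pop_change"] < 0
)

# The mean of a boolean column is the share of rows where it is True.
flags = ["lost_trees", "lost_trees_gained_pop", "gained_trees_lost_pop"]
summary = changes.groupby("borough")[flags].mean().mul(100)
```

The same `gained_trees_lost_pop` flag can be used as a row filter to isolate tracts that added trees while losing residents.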

Average Trees Per Block Difference, 1995–2015

Looking into the spatial distribution of these changes confirms the inconsistency of tree planting efforts citywide, with Manhattan, Queens, and Brooklyn exhibiting the greatest intra-borough variance. In Manhattan, tracts north of Central Park planted the majority of the trees in the borough, while those south and east of the park largely maintained or suffered net losses of street trees.

In Brooklyn, although trees were planted throughout the borough, most trees were planted in tracts north and east of Prospect Park, while the minority of tracts that lost trees are nearly universally located south of the park.

In Queens, changes in tree population were hyper-localized. Certain neighborhoods, like Jackson Heights and Astoria in the north, and Glendale, Richmond Hill, and Ozone Park in the south, experienced dramatic losses in street tree population, while borough-wide, most tracts saw their street tree numbers stagnate or increase slightly. Staten Island added trees throughout, although at a lower rate than the exploding Bronx, which greatly increased tree populations in the majority of its census tracts, with the notable exception of Riverdale.

A look at the 1995 choropleth reveals that by and large, tree planting efforts were concentrated in areas of the city with a dearth of street trees, most notably in the south Bronx and northern Manhattan. Additionally, the aforementioned neighborhoods in Queens which experienced the most extreme deforestation were also those tracts with the largest relative surplus of street trees in 1995.

Average Trees Per Block, 1995

In aggregate, tree planting and removal from 1995–2015 equalized the abundance of street trees citywide. This is confirmed by comparing the histograms of trees per block from both years:

Distribution of Average Trees Per Block Amongst Census Tracts, 1995
Distribution of Average Trees Per Block Amongst Census Tracts, 2015

From an environmental justice standpoint, tree-planting efforts in the last two decades reduced the visible legacy of redlining citywide. Comparing the University of Richmond’s map of historical redlining to the distribution of street trees in 1995 shows clearly how decisions made decades earlier have persistently influenced the geography of the city:

Redlined Map of NYC from 1931. Areas in Green and Blue received preference for home loans, while areas in Yellow and Red were systematically excluded from lending.
Average Trees Per Block per Census Tract, 1995

While not an absolute proxy for street tree abundance, there was an obvious relationship between an area’s status as redlined or not redlined and the number of trees it contained. By 2015, however, two decades of street tree planting efforts primarily targeted towards under-forested neighborhoods had successfully reduced this correlation, as can be seen below:

Average Trees Per Block per Census Tract, 2015

While newly planted trees have successfully remedied historical wrongs in at least one dimension, the social impact of these initiatives is not exclusively good. The relatively rapid urban greening has undoubtedly accelerated gentrification citywide, especially in Manhattan and Brooklyn.

Multiple studies have found a causal link between increased urban green space and gentrification, and further examination of the data elucidates this linkage in NYC. The Urban Displacement Project’s map of gentrification and displacement in NYC reveals that the tracts which planted the most trees from 1995–2015 are also those which are actively undergoing gentrification and displacing low-income households.

Mapping only those tracts which lost population while adding trees highlights bastions of gentrification like Bushwick, Crown Heights, Williamsburg, Astoria, East Harlem, Washington Heights, and the South Bronx.

Change in People Per Tree Amongst Census Tracts Which Lost Population and Gained Trees, 1995–2020

Further, the citywide two-dimensional histogram of the change in Trees Per Block and Population Per Block shows a fairly normal distribution of census tracts, with most tracts slightly increasing both street tree and population density.

Distribution of Change in Population and Street Tree Density Amongst Census Tracts in NYC, 1995–2020
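
The counts behind a two-dimensional histogram like this one can be computed with NumPy before any plotting. The values and bin edges below are invented for illustration and are not the ones used for the figure.

```python
import numpy as np

# Hypothetical per-tract changes (trees per block, people per block).
tree_density_change = np.array([-3.0, 0.5, 1.0, 2.5, 4.0, 6.5])
pop_density_change = np.array([12.0, -8.0, 3.0, 20.0, -15.0, 5.0])

# Bin edges chosen for illustration only.
tree_edges = np.array([-5.0, 0.0, 5.0, 10.0])
pop_edges = np.array([-20.0, 0.0, 25.0])

counts, _, _ = np.histogram2d(
    tree_density_change, pop_density_change, bins=[tree_edges, pop_edges]
)
# counts[i, j] is the number of tracts in tree bin i and population bin j.
```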

When this distribution is broken out by borough, however, vast differences can be observed.

Distribution of Change in Population and Street Tree Density Amongst Census Tracts in NYC by Borough, 1995–2020

As is shown above, Queens and Brooklyn had a much larger number of tracts decreasing their population density in the sampled period than did Staten Island, Manhattan, and the Bronx.

This change was independent of the change in tree density in Queens but largely accompanied by an increase in street trees in Brooklyn.

While decreasing population density is not a one-to-one map of gentrification, research indicates that declining population density is a result of inner-city gentrification, stemming from a reduction in household size. It is likely that, especially in Brooklyn, the increase in street trees further contributed to gentrification in certain neighborhoods.

Making the Graphs

From the beginning of the project, I knew that I wanted the end result to be a set of choropleths, because I felt that this was the best way to clearly communicate the change in trees over time.

A choropleth is a map of a geographical region with subdivisions of the region colored according to some measure. Plotly was an easy choice for this project: it supports multiple types of choropleths, is interactive by default, and has a Python interface that works with GeoPandas.

Before making my maps, I had a decision to make: whether or not to split the data into quantiles. By default, Plotly interprets numerical values as continuous data. This works for many datasets, but it can mean that outliers dramatically skew the color scale in some cases, minimizing meaningful differences in the data. To make this decision, I looked at the histograms of the Trees Per Block index, which I was planning to use as the primary graphed variable.

Histogram of trees per block in each census tract, 1995

As we can see in the above graph, the vast majority of the census tracts had between 3 and 25 trees per block in 1995. The distribution appears to be log-normal, with some outlying tracts on both ends of the range having as many as 72 and as few as 0.5 trees per block. In order to more meaningfully represent the majority of the data, I decided to split the data into discrete quantiles for graphing.
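
One common way to discretize a skewed column is `pandas.qcut`, which cuts at quantiles so each bin holds roughly the same number of tracts. The sketch below uses invented values and five bins; the article does not say how its own quantiles were chosen.

```python
import pandas as pd

# Hypothetical trees-per-block values with one extreme outlier.
trees_per_block = pd.Series([0.5, 3, 5, 8, 10, 12, 15, 20, 25, 72])

# Five equal-count bins; labels=False returns the bin number (0-4) per tract.
bins = pd.qcut(trees_per_block, q=5, labels=False)
```

Here the outlier of 72 simply shares the top bin with 25, rather than stretching a continuous color scale and flattening the differences among the remaining tracts.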

I imported the DataFrames which I had generated in the cleaning step and started with px.choropleth, which draws outline-based maps (i.e., without a tiled background map). I centered my map on NYC and rendered a first draft:

2015 data version 1, zoomed

As you can see, the majority of NYC’s land area is represented above. Nonetheless, I felt that adding a base layer to the choropleth would improve the ability of viewers to compare and contrast different census tracts in the city and importantly show the location of various parks.

The improved map:

2015 data version 2, zoomed

This looks better, helps the user understand where gaps in the data are located, and provides context for them to judge the importance of these gaps.
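
The article does not show which call produced the base layer. One option in Plotly Express is `px.choropleth_mapbox`, sketched below under several assumptions: the tract shapes are two made-up squares, the column names `boroct` and `quantile` are invented, and the helper names are mine. With GeoPandas, the GeoJSON would more likely come from the GeoDataFrame itself (for example via its `__geo_interface__`) than from a hand-built dictionary.

```python
def tracts_to_geojson(tracts):
    """Wrap {boroct: polygon ring} pairs in a GeoJSON FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {"boroct": boroct},
                "geometry": {"type": "Polygon", "coordinates": [ring]},
            }
            for boroct, ring in tracts.items()
        ],
    }


def build_tile_map(data, geojson):
    """Draw tracts over a street basemap; requires plotly to be installed."""
    import plotly.express as px  # lazy import: the helper above needs no dependencies

    return px.choropleth_mapbox(
        data,
        geojson=geojson,
        locations="boroct",                # column holding the tract ID
        featureidkey="properties.boroct",  # where that ID lives in each feature
        color="quantile",
        mapbox_style="carto-positron",
        center={"lat": 40.7, "lon": -73.95},
        zoom=9,
        opacity=0.6,
    )


# Two made-up square "tracts" as (longitude, latitude) rings; rings must close.
toy_tracts = {
    "1000100": [[-74.00, 40.70], [-73.99, 40.70], [-73.99, 40.71],
                [-74.00, 40.71], [-74.00, 40.70]],
    "1000200": [[-73.99, 40.70], [-73.98, 40.70], [-73.98, 40.71],
                [-73.99, 40.71], [-73.99, 40.70]],
}
geojson = tracts_to_geojson(toy_tracts)
data = {"boroct": list(toy_tracts), "quantile": ["3-8", "9-14"]}
# fig = build_tile_map(data, geojson)  # uncomment when plotly is available
```

Passing string labels in the `color` column makes Plotly Express treat the bins as discrete categories instead of a continuous scale.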

The tracts on the map are colored according to the “Trees Per Block” index. But there were still several metrics that I wanted to present alongside this index, so I included them in a mouseover pop-up.

I made three figures, one for each tree census year. I wanted to make one additional figure to display the cumulative change in all statistics since 1995. I computed the change in trees in each census tract from 1995 to 2015 and the corresponding change in population from 2000 to 2020, and used these figures to make ‘Trees Per Block Change’ and ‘People Per Tree Change’ columns.

In the same fashion as before, I examined the distribution of the ‘Trees Per Block Change’, decided to discretize it, and came up with reasonable quantiles — this time there were 9. I fit my color scale to my quantiles and modified my hover template so that it could accommodate more information. The new template:

Census Tract Name

Population Change: 1000 People

Tree Change: -2 Trees

14 People per Tree, 2000

15 People per Tree, 2020

10 Trees per Block, 1995

10 Trees per Block, 2015
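
In Plotly, a template like this is a single string in which `%{customdata[i]}` placeholders pull values from extra columns attached to each tract. The sketch below assembles such a string; the order of the `customdata` slots is an assumption for illustration.

```python
# Each line pairs a customdata slot with the text shown around it.
# The slot order is an assumption, not the article's actual column order.
HOVER_LINES = [
    "<b>%{customdata[0]}</b>",
    "Population Change: %{customdata[1]} People",
    "Tree Change: %{customdata[2]} Trees",
    "%{customdata[3]} People per Tree, 2000",
    "%{customdata[4]} People per Tree, 2020",
    "%{customdata[5]} Trees per Block, 1995",
    "%{customdata[6]} Trees per Block, 2015",
]

# "<br>" breaks lines; "<extra></extra>" hides the secondary trace-name box.
hovertemplate = "<br>".join(HOVER_LINES) + "<extra></extra>"
```

With Plotly Express, the matching columns would be passed through the `custom_data` argument and the string applied with `fig.update_traces(hovertemplate=hovertemplate)`.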

Lastly, I made a fifth DataFrame with NaNs replaced with “-” to pass to the hover template. This was necessary because replacing NaNs with a more legible character in the same DataFrame that is passed to the Plotly graphing engine broke the discrete coloring configuration I had established to this point.
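
Keeping a separate display copy can be done in one line. This is a sketch on invented rows, with `plot_df` and `hover_df` as hypothetical names for the numeric frame and the hover frame.

```python
import numpy as np
import pandas as pd

# Hypothetical tract rows; NaN marks a tract with no data for that year.
plot_df = pd.DataFrame(
    {"boroct": ["1000100", "1000200"], "trees_per_block_1995": [10.0, np.nan]}
)

# Build an object-typed copy so numbers and "-" can share a column,
# leaving the numeric frame used for coloring untouched.
hover_df = plot_df.astype(object).where(plot_df.notna(), "-")
```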

With that, the graphs were complete!

Putting It All Online: Dash vs JavaScript Performance

I was ready to put my project online. I decided to spin up a small Dash app for users to browse the choropleths. I spent a bit of time adding a slider and a loading spinner, and after I was satisfied with the results on my local machine, I set up a Heroku Dyno to test the app out. Others have detailed that process here.

Almost immediately, I was disappointed by the server-side response time. My final choropleths were between 14 and 40 MB each (relatively small), but my Dyno was taking up to 25 seconds to receive and display each individual graph after the slider was moved. This broke down into a consistent 3–4 seconds for the server response and a highly variable 12–22 seconds for data transfer. This was in addition to a 5–10 second initial load time.

I considered my options: I could pay Heroku more money for better service, but that seemed to me unnecessary and lazy. I could add client-side caching, but this wouldn’t solve the initial slow data transfer. I could, and did, add memoization (a type of caching that saves the results of function calls on the server so that they can be retrieved when the function is called again with the same inputs), but this only improved the server response time, not the throughput issues. I could reduce the size of the choropleths themselves, which would likely have meant substantially reducing the information in the hover boxes, and this seemed to me to defeat the point of my project entirely.
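
The article does not show its memoization code (Dash apps often use Flask-Caching for this). The standard-library sketch below only illustrates the idea, with `build_figure_json` as a hypothetical stand-in for the expensive figure-building step.

```python
from functools import lru_cache

build_calls = 0  # counts how often the expensive step actually runs


@lru_cache(maxsize=None)
def build_figure_json(year: int) -> str:
    """Stand-in for an expensive figure build; returns a JSON string."""
    global build_calls
    build_calls += 1
    return f'{{"year": {year}}}'


build_figure_json(2015)  # computed
build_figure_json(2015)  # served from the cache
build_figure_json(1995)  # computed
```

The cache removes repeated computation on the server, but the full payload still has to travel to the browser on every request, which is why it did nothing for the transfer time.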

In the end, I decided to give up on using Dash entirely. I realized that I was over-engineering: as the user does not have the ability to modify the dataset being visualized, there was no need for a server response at all. Instead, I took advantage of the fact that Plotly.py is merely a Python interface for the JavaScript library Plotly.js.

Instead of boxing myself into a Python-only implementation, I exported my finished figures to JSON using orjson and duplicated the same bare-bones webpage I had built in Dash using basic HTML, CSS, and JS.
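
The export step might look something like the sketch below. It relies only on the `to_json()` method that Plotly figures provide; the function name, file layout, and `StubFigure` class are assumptions made so the example runs without Plotly installed. As far as I know, Plotly can use orjson as its JSON engine when that package is available.

```python
import json
import tempfile
from pathlib import Path


def export_figures(figures: dict, out_dir: str) -> list:
    """Write each figure's JSON to <out_dir>/<name>.json and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, fig in figures.items():
        path = out / f"{name}.json"
        # Plotly figures expose to_json(); any object with that method works here.
        path.write_text(fig.to_json(), encoding="utf-8")
        paths.append(path)
    return paths


class StubFigure:
    """Minimal stand-in so the sketch runs without Plotly installed."""

    def to_json(self) -> str:
        return '{"data": [], "layout": {}}'


with tempfile.TemporaryDirectory() as tmp:
    written = export_figures({"trees_2015": StubFigure()}, tmp)
    written_names = [p.name for p in written]
    loaded = json.loads(written[0].read_text(encoding="utf-8"))
```

On the page, each file can then be fetched once and handed to Plotly.js, for example with `Plotly.newPlot(div, fig.data, fig.layout)`, so switching views never touches a server.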

The resulting webpage features a short up-front load time but has no lag when switching views between the various graphs. From the user’s perspective, this is a much more seamless experience.

You can view the Heroku implementation site here and the JavaScript implementation here.

Conclusion

In summary, I found that:

  • both street tree and human populations generally increased citywide from 1995–2020, but that there were large spatial variations in the magnitude of increase both between and amongst boroughs, with some neighborhoods suffering heavy losses in one or both categories.
  • targeted tree planting efforts ameliorated UHI effects in under-forested, historically redlined areas, but that these new trees had the secondary effect of contributing to gentrification.

In terms of performance, I found that processing the data “offline” and storing the results worked better than redoing the processing “online” on every request. For data visualizations that feature a static dataset, an offline implementation is likely to be the better choice.

This analysis could be further expanded by including additional population variables like race, age, income, and new housing construction to more holistically understand the social implications and patterns of tree planting.

Interpolating population and tree data would allow for a more holistic temporal understanding of these changes, and could especially be used to obtain a more accurate population estimate for each of the tree census years.

A more precise analysis could be conducted by increasing the geographic precision of tree locations, either through latitude and longitude coordinates or LIDAR point cloud models, and it would likely be informative to statistically model the correlation between street tree population variance and the temporal fluctuation in the aforementioned social variables and/or housing prices.

This research and analysis should be useful for urban planners, who should be aware that changes to the physical environment will contribute to the evolving social character of that neighborhood.

You can find the code for this project and post on GitHub and view the final visualizations here.

You can find me on LinkedIn.

Written by Jerry Clayton

A data scientist with a passion for social change.