EDGI’s Environmental Enforcement Watch (EEW) has created a new data science tool for analyzing federal environmental data at the watershed level. Above is an example of output from EEW’s Jupyter Notebook: a map of facilities regulated under the Clean Water Act (CWA) in a Houston, Texas, area watershed. Facilities that have not been inspected for compliance with the CWA are represented by black dots and those that have are represented by orange circles (the size of which corresponds to the number of inspections).
By Eric Nost, Megan Raisle, Steve Hansen, Kelsey Breseman, Lourdes Vera, Sara Wylie, and EDGI
- We developed a data science tool for educators, journalists, and civic organizations interested in understanding water pollution and polluters in their communities
- We describe the rationale behind the tool and provide one example use of it
- We provide a shortened version of our hands-on guide to using our Watershed Notebook; to access the full guide, including two additional sample uses, please visit this link
Why We Created a Watershed Notebook
Who’s responsible for this mess?
As the intertwined climate-public health crisis grows, more and more people are confronting the harms wrought by fossil fuels and toxics industries in their neighborhoods. As hurricanes slam chlorine production plants, fires engulf chemical factories, and COVID kills a refinery’s neighbors, communities have asked: What is going on? Who’s responsible? What has the government done—or not done—to improve environmental health? How can they be held accountable?
Data is a central—if not always reliable—player in these questions. Tech industry hype focuses on how “big data” can reveal new insights and create new value for businesses or how “mining” it can produce “better” decision-making. The premise is that we already have the data we need—we just need to analyze it more incisively. At the same time, it is often suggested that this analysis should be done with the help of advanced and yet incredibly opaque artificial intelligence systems. Fanfare for big data also tends to ignore how existing data gaps, errors, and ambiguities continue to be reproduced and how it is often in some entities’ best interests for data to remain hidden or unmanageable.
In this context, the public—from educators and journalists, to civic organizations such as waterkeepers and environmental justice groups—often do their own accounting because they can’t trust what polluters themselves report or what the state says. Oftentimes, official data is incomplete or is measured in a way that overlooks community concerns and favors industry or government perspectives. While many such groups do incredible monitoring work, others don’t have this capacity. Their questions go unanswered and concerns unproven. And while monitoring overall air or water quality, or even the releases of specific polluters, is enormously valuable, it can’t tell us the whole story; it lacks important context: How much is this facility actually allowed to put into the environment? Has it ever gotten in trouble for adding too much? Is the government letting them off the hook?
Getting what data’s out there
Notwithstanding key gaps, errors, and ambiguities, data on government enforcement of and industry compliance with environmental protection laws exists and is publicly available in the US—in contrast to some countries, like Canada, where such information is even more haphazard. The challenge is to access and make sense of that data. Whether a community group wants to develop a pressure campaign or lawsuit against a facility, work collaboratively with polluters, or adopt some other strategy, the Environmental Data and Governance Initiative’s Environmental Enforcement Watch (EEW) can help.
Imagine this not-uncommon scenario: A waterkeeper group finds high concentrations of benzene in an area stream. Who might be the culprit? One place to start with this question is to determine which industrial facilities are actually permitted to release benzene.
EPA’s Enforcement and Compliance History Online (ECHO) compiles this information, but it doesn’t necessarily make it particularly easy to access and interpret. ECHO will display all the facilities releasing benzene in the zip code where the group made the measurements, but that’s not necessarily helpful when the actual source may be further upstream, outside of the zip code but still within the relevant watershed. EPA’s How’s My Waterway? more directly addresses the question by illustrating watershed health. It does not tell us much about whether polluting facilities are in compliance with the law, nor what pollutants they release into the water. Apart from telling us the overall health of our lakes, rivers, and streams, it does not synthesize an overall picture of enforcement and compliance. Perhaps by putting ECHO and How’s My Waterway together one could come close to figuring out what they want to know, but that’s inconvenient and does not provide a single authoritative answer that can be used in further engagement with industry or the EPA—instead, it creates potential uncertainties and gaps.
A Watershed Notebook
EEW’s new Watershed Notebook can help: A Notebook is shorthand for a Jupyter Notebook, which is a way to write and share Python computer programming code. Our Notebooks are accessible on a platform known as Google Colab, which allows you to access and visualize data without actually needing to do any coding!
Before you work through the Notebook on your own to discover trends in compliance with and the enforcement of environmental protection laws in a place you care about, you might want to watch the full demo of the tool we put together:
What follows below is a fuller description of how to use the tool towards accountability in watershed relations. We hope this tool helps educators, journalists, waterkeeper groups, and others who care for the waters of an area demonstrate, on state and industry’s own terms and with their own data, how environmental protection laws are not being complied with or enforced and are, possibly, insufficient to protect human health. You can follow along here in a pre-run Notebook or run a blank one yourself.
How Do I Use the Watershed Notebook?
To start: HUC it
Ok, you’ve got the Notebook open. The first piece of information you’ll need to enter is your zip code. (To be clear, our Notebook does not collect or store any kind of data you enter!) Not everyone knows what USGS (United States Geological Survey) Hydrologic Unit Code (HUC) they live in, but they probably know their five digit zip code. We can match zip codes, which are arbitrarily-defined administrative boundaries, with watershed boundaries—naturally defined areas linking together places where water drains to the same place. The Notebook returns all the watersheds that at least partially traverse your zip code.
Let’s look closely at the Buffalo region in New York. We’ll enter 14303 as our zip code, which covers part of the Niagara Falls region:
But watersheds are curious beasts. They come nested together in different sizes, a bit like Matryoshka dolls. HUCs with eight digits, like 18090203, cover more area than 10 digit ones. Each eight digit HUC will have several 10 digit ones nested within it. In the figure below, the Death Valley-Lower Amargosa HUC 8 watershed is broken into more than a dozen 10 digit watersheds, including 1809020303, Marble Canyon (yes, even deserts have watersheds! These describe how water moves across the landscape when it does appear.):
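For readers curious about how this nesting shows up in the codes themselves, here is a minimal Python sketch (our illustration, not code taken from the Notebook). It assumes only that a longer HUC begins with the code of the watershed that contains it, as the two Death Valley codes above do. The function names are made up for this example.

```python
def parent_huc(huc: str, digits: int = 8) -> str:
    """Return the enclosing HUC of the requested length (codes nest by prefix)."""
    if not huc.isdigit() or len(huc) % 2 != 0:
        raise ValueError(f"not a valid HUC code: {huc!r}")
    if digits > len(huc):
        raise ValueError("the parent code cannot be longer than the child code")
    return huc[:digits]


def nested_in(child: str, parent: str) -> bool:
    """True when `child` sits inside `parent`, judged purely by code prefix."""
    return len(child) >= len(parent) and child.startswith(parent)


# Codes are kept as strings so that any leading zero is preserved.
marble_canyon = "1809020303"  # 10 digit HUC named in the text
death_valley = "18090203"     # 8 digit HUC named in the text

print(parent_huc(marble_canyon))               # 18090203
print(nested_in(marble_canyon, death_valley))  # True
```

Choosing an eight digit HUC as the unit of analysis is, in effect, choosing every 10 digit HUC that shares those first eight digits.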
If you want to search for as wide a range of facilities as possible, you might choose an eight digit HUC as your unit of analysis. If you need to narrow in on an area, you might start with the 10 digit HUC. You can always go back and change your selection, re-running the cells of code. For zip code 14303, let’s show facilities in the 10 digit HUCs that intersect with our zip code:
Get the Data
Now we are going to go get data about industrial facilities in the area. We use records from the EPA’s Enforcement and Compliance History Online, or ECHO, database. EDGI keeps our own copy of this database that is updated every weekend by our partners at Stony Brook University.
Where exactly are these facilities though? How many fall within the zip code itself and how many are further upstream in the watershed? We can map them:
As you can see, the HUC 10 watershed that covers the 14303 zip code is pretty big—it encompasses both the US and Canadian side of the Niagara River, though because we are pulling data from the US EPA, only US facilities are shown. The number of facilities in each general area is shown in the circles. As we zoom in on the map, we start to see the actual locations of specific facilities:
Clicking on each orange circle will pull up the name of the facility in that location. Here, we see that the City of Niagara Falls has a wastewater treatment plant right next to American Falls, where the Niagara River goes over its escarpment:
After we get a sense of the kinds of facilities in the area, we probably then want to get more specific information about them—what are they permitted to put into streams and ultimately the Niagara River? Have they ever been fined for doing so? Is the US EPA taking any kind of enforcement actions against them? The next cell of code shows us the facilities that are most non-compliant in the area over the past 13 quarters, or a little over three years:
There are several reasons why a facility like the City of Niagara Falls’s wastewater treatment plant (as returned in the above results) can be non-compliant with the Clean Water Act, including having reported releasing too much of a regulated pollutant, which might include everything from E. coli to carcinogenic benzene and even temperature. (Pro-tip: if you want to see more than just 20 facilities ranked in terms of non-compliance, you can click in the cell and change 20 to some other number and then re-run it!)
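To give a feel for what a "top 20" ranking like this involves, here is a minimal Python sketch. It is not the Notebook's actual code. The facility names are placeholders, and we assume, purely for illustration, that each facility's record is a 13-character string with one character per quarter, where "V" marks a quarter in non-compliance.

```python
from typing import Dict, List, Tuple


def rank_noncompliance(history: Dict[str, str], top_n: int = 20) -> List[Tuple[str, int]]:
    """Rank facilities by how many of their quarters are marked non-compliant."""
    counts = [(name, quarters.count("V")) for name, quarters in history.items()]
    # Most non-compliant quarters first; ties are broken alphabetically by name.
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return counts[:top_n]


# Hypothetical records: 13 characters, one per quarter ("V" = non-compliant).
sample = {
    "Facility A": "VVVVVVVVVVVVV",
    "Facility B": "__V__V_______",
    "Facility C": "_____________",
}

for name, n in rank_noncompliance(sample, top_n=2):
    print(f"{name}: non-compliant in {n} of 13 quarters")
```

Changing `top_n` here plays the same role as changing 20 to some other number in the Notebook cell.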
Getting a sense of the worst-offending entities in the area could help us narrow in on who might be responsible for an unregulated release of benzene. After all, if these facilities are routinely non-compliant with the Clean Water Act, spending each of the past 13 quarters failing to live up to its requirements, then they are operating with impunity and are more likely to be doing things they shouldn’t.
So, let’s get more specific data from ECHO. We have a lot of options at our disposal:
We could look at whether regulators are inspecting facilities or whether they are issuing penalties. Given our focus on industry’s non-compliance with water quality protection rules, let’s look at effluent violations data. Effluent violations describe scenarios where a facility has reported discharging more effluent than it is legally permitted to discharge. It’s important to keep in mind that the Clean Water Act, and the National Pollutant Discharge Elimination System (NPDES) in particular, does not prohibit discharges into our lakes, rivers, and streams (despite the use of “elimination” in the name of the program!). Instead, CWA NPDES permits these discharges at certain levels.
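The logic of an effluent violation can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the record layout, facility names, and numbers below are invented, and real permits involve units, averaging periods, and other details we leave out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Report:
    facility: str    # placeholder name
    parameter: str   # the pollutant being reported
    limit: float     # permitted level (hypothetical units)
    reported: float  # level the facility says it discharged


def exceedances(reports: List[Report]) -> List[Tuple[Report, float]]:
    """Keep only reports above the permit limit, with percent over the limit."""
    found = []
    for r in reports:
        # Discharging at or below the limit is permitted, so it is not a violation.
        if r.reported > r.limit:
            percent_over = (r.reported - r.limit) / r.limit * 100
            found.append((r, percent_over))
    return found


sample = [
    Report("Facility A", "Benzene", limit=5.0, reported=7.5),
    Report("Facility B", "Benzene", limit=5.0, reported=4.0),
]

for report, pct in exceedances(sample):
    print(f"{report.facility}: {report.parameter} {pct:.0f}% over its limit")
```

Facility B still discharges benzene in this made-up example. It simply does so within what its permit allows, which is why it never appears in the violations list.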
Maps, charts, and tables—oh my!
We’ll start by showing a table of the effluent violations data for this watershed. We make our selection in this interface, which appears after we run the cell of code:
The above cell of code just produces the interface by which we can choose the data we want to look at and how we want to look at it—whether as a table, a chart, or on a map. The next cell of code is what actually goes to our database and gets the data, and then displays it. This can take some time (possibly several minutes) if we are trying to get some kinds of data, like reported effluent violations, for a large area like a HUC 8 or even a HUC 10.
This data gives us some clues for sure:
We can see a lot of “Non-receipt Violations,” which means facilities are getting in trouble because they aren’t submitting their required water quality monitoring reports. Let’s save the data for download later, just to be safe. Now we can work with this table in Excel, which may be a bit more intuitive.
Whether in the Notebook or in Excel, it can be difficult to make much sense of the table. The table is quite useful if we want to look at the specific records of specific facilities, but it doesn’t really help us see patterns and trends, which might clue us in to which facility or facilities are likely to blame for the benzene release.
Let’s take a look then at some more easily digestible overviews of the data. If we go back to the previous cell (6a) and choose Chart (keeping the “Effluent Violations” data toggled), and then jump down again and re-run (6b), this time we’ll get a chart of violations over time, broken down by chemical:
As we see above, there are a lot of different chemicals, or “parameters”, that are released into this watershed, which makes the chart wonky. In fact, this bad data visualization design proves a point: The EPA permits the dumping of so many different kinds of chemicals. Either way, we can see that violations have dropped off over the past few years—this may be due either to less reporting or to better industrial practices (we could try to get a better sense of which it is by looking at other data sets).
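Behind a chart like this one is a simple tally: count the violations for each year and each parameter. Here is a minimal Python sketch of that tally, using invented rows rather than real ECHO records. The function name is ours, for illustration only.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical (year, parameter) pairs, one per reported violation.
violations = [
    (2018, "Benzene"), (2018, "Benzene"), (2018, "E. coli"),
    (2019, "Benzene"), (2019, "Temperature"),
    (2020, "E. coli"),
]


def count_by_year_and_parameter(rows: Iterable[Tuple[int, str]]) -> Dict[int, Counter]:
    """Tally violations per year, broken down by parameter."""
    table: Dict[int, Counter] = defaultdict(Counter)
    for year, parameter in rows:
        table[year][parameter] += 1
    return table


table = count_by_year_and_parameter(violations)
for year in sorted(table):
    total = sum(table[year].values())
    print(year, "total:", total, dict(table[year]))
```

With only three parameters the breakdown is easy to read. With dozens of them, as in this watershed, each year splits into so many slices that the chart becomes hard to follow.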
The above chart doesn’t give us much sense of where in the watershed effluent violations are occurring—near where we sampled benzene? Upstream from it? If we return to 6a and choose Map, then re-run 6b, we can see facilities that have reported effluent violations. These are orange circles, sized to the number of violations reported. Those that have not reported violations are represented by small black circles:
Now that we have some sense of trends in how the Clean Water Act is complied with in this area, we can access the full record of polluters’ discharge monitoring reports and narrow in on our pollutant of interest, benzene:
After running cell 7a, we don’t need to select facilities, just pollutants/parameters. We want to find the ones that have reported discharging benzene. Just as before, with the effluent violations, we can show maps and charts to get the overview of the situation. This time though we are filtering our results to facilities discharging benzene, and returning all records of its release, not just releases that are above the legal limit:
What we see is that not every facility that submits discharge monitoring reports has reported releasing benzene; there are two or three clusters in the Buffalo-Niagara region where these discharges are reported—north of downtown Buffalo and east of downtown Niagara Falls:
Towards Other Meaningful Geographies
At EEW, we are continually asking ourselves and the organizations and people we work with: “What scale matters?” So far, this has led our team to summarize environmental enforcement at state, congressional district, zip code, and now watershed levels. But there are many other ways we could examine pollution and polluters, including in user-defined areas of interest. What if a zip code or watershed doesn’t adequately describe the area you’re concerned about? What if you wanted to draw the irregular boundaries of your neighborhood as you define it, to understand environmental health specifically in that place? We are continuing to work on supporting such a tool, so please stay tuned! In the meantime, run our Watershed Notebook, and be in touch with what you find, what doesn’t work, and if you would like to collaborate: firstname.lastname@example.org.