How EDGI’s Website Monitoring Team Uses an Open Source Tool to Identify Website Changes—and How You Can, Too

By Alejandro Paz and Gretchen Gehrke

Website monitoring is critical for raising public awareness about how the federal government manages public information. In 2017, EDGI’s website monitoring software team built a suite of tools to identify and visualize changes between different versions of a single webpage. In 2019, our partner the Internet Archive integrated EDGI’s open source comparison software into the Wayback Machine as a publicly available feature called “Changes.” This feature lets anyone see the differences between any two stored versions of a webpage, across the 525 billion webpages the Wayback Machine has stored.

The easiest way to find the Changes feature is to go to the Wayback Machine homepage, https://archive.org/web/. Type the URL you are interested in into the text field at the top center of the page and click “Browse History.” This brings up a page showing all of the snapshots of that webpage that the Wayback Machine has stored. On that page, click the “Changes” link, which sits between “Collections” and “Summary,” below the text field where you typed the URL and above the calendar of snapshots. You can also reach the Changes feature from a specific memento, that is, a specific snapshot of a webpage. On the memento page, click the link near the upper left corner that displays the number of captures the Wayback Machine has. This takes you to the page showing all the snapshot dates for that webpage, where you can click the “Changes” link as described above.
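If you would rather jump straight to these pages than click through, the navigation above can be approximated by building the addresses directly. The sketch below is a minimal illustration, not an official API: the `/web/*/` and `/web/changes/` URL patterns are assumptions based on the addresses that appear in the browser when following the steps above, the helper names are hypothetical, and the Department of Transportation address is only a placeholder.

```python
# Assumed base address for Wayback Machine pages (taken from the browser's address bar).
WAYBACK = "https://web.archive.org/web"


def normalize(url: str) -> str:
    """Prepend a scheme when given a bare host such as 'www.transportation.gov'."""
    return url if "://" in url else "http://" + url


def calendar_url(url: str) -> str:
    """Address of the page listing every snapshot of `url` (assumed pattern)."""
    return f"{WAYBACK}/*/{normalize(url)}"


def changes_url(url: str) -> str:
    """Address of the "Changes" page for `url` (assumed pattern)."""
    return f"{WAYBACK}/changes/{normalize(url)}"


if __name__ == "__main__":
    page = "https://www.transportation.gov/"  # placeholder page of interest
    print(calendar_url(page))
    print(changes_url(page))
```

Pasting either printed address into a browser should land on the same pages reached by the clicks described above, provided the Wayback Machine still uses these URL patterns.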

The “Changes” page contains one or more calendar grids in which each date is represented by a square. Shaded squares mark dates when the Wayback Machine took a snapshot of the webpage. The shading of each square indicates how much that snapshot differs from a reference snapshot. When the page first loads, the reference is the most recent snapshot, so the shading shows the degree of difference between each older snapshot and the most recent one. To visualize specific differences between two versions of the same webpage, first click the date of the older snapshot of interest. Once you select this date, the shading of the rest of the calendar grids changes to show the relative difference between the snapshots on those dates and the snapshot you selected. Next, click the date of the newer snapshot of interest. If you don’t have specific dates in mind, try more darkly shaded dates first. Once you have both snapshots selected (and to reiterate, it is very helpful to click the older snapshot first and the newer snapshot second), click “Compare” at the top center.
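The older-first rule can also be enforced in a few lines of code when you already know which two snapshots you want. The sketch below sorts two Wayback Machine timestamps (14 digits, in the form YYYYMMDDhhmmss) and builds a comparison address. The `/web/diff/` pattern is an assumption based on the address shown in the browser after clicking “Compare,” and `compare_url` is a hypothetical helper name, not part of any Wayback Machine library.

```python
import re

# Wayback Machine snapshot timestamps are 14 digits: YYYYMMDDhhmmss.
_TIMESTAMP = re.compile(r"[0-9]{14}")


def compare_url(url: str, ts_a: str, ts_b: str) -> str:
    """Build a side-by-side comparison address with the older snapshot first.

    The URL pattern is an assumption taken from the browser's address bar.
    """
    for ts in (ts_a, ts_b):
        if not _TIMESTAMP.fullmatch(ts):
            raise ValueError(f"expected a 14-digit timestamp, got {ts!r}")
    # Fixed-width digit strings sort chronologically, so plain sorting
    # puts the older snapshot on the left and the newer one on the right.
    older, newer = sorted((ts_a, ts_b))
    return f"https://web.archive.org/web/diff/{older}/{newer}/{url}"


if __name__ == "__main__":
    # Timestamps given in the "wrong" order are swapped automatically.
    print(compare_url("https://example.gov/", "20200101000000", "20170101000000"))
```

Sorting the timestamps up front avoids the mix-up that occurs when the newer snapshot is selected first.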

Clicking “Compare” brings up a side-by-side rendering of the two snapshots of the webpage. If the versions differ, the differences are highlighted. The left-hand view shows “deletions” highlighted in yellow; the right-hand view shows “additions” highlighted in blue. Note that the yellow highlights represent true deletions, and the blue highlights true additions, only if the left-hand snapshot is the older version and the right-hand snapshot is the newer version. This is why it is important to select the older version first. If you mistakenly select a newer version first, you can either go back using your browser’s back arrow or keep selecting dates until the older version appears on the left side and the newer version on the right side.
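To see why the order matters, here is a small sketch that compares two versions of a sentence word by word. It uses Python’s standard `difflib` module as a stand-in; it is not the comparison software behind the “Changes” feature, and the sample sentences are invented for illustration.

```python
import difflib


def word_changes(older: str, newer: str) -> tuple[list[str], list[str]]:
    """Return (deletions, additions) between two texts, compared word by word."""
    old_words, new_words = older.split(), newer.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    deletions, additions = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Words present only in the first text count as deletions.
        if tag in ("delete", "replace"):
            deletions.extend(old_words[i1:i2])
        # Words present only in the second text count as additions.
        if tag in ("insert", "replace"):
            additions.extend(new_words[j1:j2])
    return deletions, additions


if __name__ == "__main__":
    older = "Learn about climate change impacts"      # invented sample text
    newer = "Learn about sustainability impacts"      # invented sample text
    print(word_changes(older, newer))  # older version passed first
    print(word_changes(newer, older))  # order reversed: the labels swap
```

With the versions reversed, the words that were actually removed from the page are reported as additions and vice versa, which mirrors what happens to the yellow and blue highlights when the newer snapshot is selected first.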

Using the “Changes” feature, you can see the changes federal agencies make to their webpages. It shows when pertinent links were added or removed and when descriptive language was changed. For example, take a look at this webpage from the Department of Transportation website. Notice how information about climate change was removed and replaced with overtures about sustainability. “Changes” is a powerful tool that reveals the ways in which webpages, and the information they deliver, are changing.