Google Summer of Code – Environmental Data and Governance Initiative

Data Visualization of Archived Content – Harsh Baid

Harsh Baid is a junior at the University of Maryland, College Park, studying Computer Science with a minor in math. He started programming during his freshman year of high school, learning front-end and back-end web development, but is currently interested in low-level architecture and embedded systems. In his free time, aside from his engineering-related hobbies, he loves to play the ukulele, read, and sleep.

harshbaid.com
Blackglade

The purpose of this project is to develop a set of visually impactful and meaningful models that allow users to easily visualize the changes archived by EDGI.

Read the final project update from Harsh:

My project focused on data visualization. The data being archived, while vital for preservation, was hard for the average person to interpret and could be overwhelming. So, over the summer, I worked on interactive graphs and models that give users a general overview of the data. Using D3, I created a Coverage Map and a DataRescue Map.
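A map like the DataRescue Map can be drawn as a choropleth, where each state is shaded by a value such as the number of events held there. The sketch below illustrates only the binning step and is written in Python for readability. It splits the range of per-state counts into equal-width classes, which is similar in spirit to D3's `d3.scaleQuantize`. The state counts, colors, and function names are hypothetical placeholders, not EDGI data or the project's actual code.

```python
# Hypothetical sketch: equal-width binning of per-state counts for a choropleth.
# The counts and colors below are placeholders, not real EDGI/DataRescue data.

def quantize(value: float, lo: float, hi: float, n_classes: int) -> int:
    """Return a class index in [0, n_classes - 1] for value.

    [lo, hi] is split into n_classes equal-width intervals; values outside
    the range are clamped into the first or last class.
    """
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_classes)
    return min(max(idx, 0), n_classes - 1)


def color_states(counts: dict[str, int], colors: list[str]) -> dict[str, str]:
    """Map each state code to a fill color based on its count."""
    lo, hi = min(counts.values()), max(counts.values())
    return {state: colors[quantize(c, lo, hi, len(colors))] for state, c in counts.items()}


# Light-to-dark blues; any sequential palette would do.
COLORS = ["#eff3ff", "#bdd7e7", "#6baed6", "#2171b5"]

if __name__ == "__main__":
    example_counts = {"MD": 3, "NY": 8, "CA": 12, "TX": 0}  # made-up numbers
    for state, color in color_states(example_counts, COLORS).items():
        print(state, color)
```

Equal-width classes are easy to read. However, when counts are skewed, most states can end up in a single class. Quantile or threshold scales are common alternatives.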

Mid-project update from Harsh:

I have been focused on creating and refining several visually impactful models for EDGI and its related organizations. The two visualizations I've primarily focused on are 1) a sunburst sequence that accurately conveys the archival rate of coverage data, and 2) an interactive choropleth map of the United States that showcases DataRescue and other related EDGI events.
I am currently refining both of these models and integrating them into their respective websites based on community feedback.
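A D3 sunburst is drawn from hierarchical data. `d3.hierarchy` takes nested objects with `children`, and `d3.partition` lays out the rings. The minimal sketch below shows how flat rows can be folded into that nested shape. It assumes that coverage data arrives as hyphen-separated paths with counts; that format, along with the agency and status names, is hypothetical. It is an illustration, not the project's implementation.

```python
import json
from typing import Any


def build_hierarchy(rows: list[tuple[str, int]], sep: str = "-") -> dict[str, Any]:
    """Fold ("a-b-c", count) rows into nested {"name", "children" | "value"} nodes.

    Repeated paths have their counts summed at the leaf.
    """
    root: dict[str, Any] = {"name": "root", "children": []}
    for path, count in rows:
        node = root
        parts = path.split(sep)
        for depth, part in enumerate(parts):
            children = node.setdefault("children", [])
            # Reuse an existing child with this name, or create one.
            child = next((c for c in children if c["name"] == part), None)
            if child is None:
                child = {"name": part}
                children.append(child)
            if depth == len(parts) - 1:
                # Leaf node: accumulate the count.
                child["value"] = child.get("value", 0) + count
            node = child
    return root


def total(node: dict[str, Any]) -> int:
    """Sum the leaf values beneath a node, including its own value if any."""
    return node.get("value", 0) + sum(total(c) for c in node.get("children", []))


if __name__ == "__main__":
    # Hypothetical coverage rows: "agency-status", number of pages.
    rows = [("EPA-archived", 120), ("EPA-pending", 30), ("NOAA-archived", 80), ("EPA-archived", 5)]
    print(json.dumps(build_hierarchy(rows), indent=2))
```

In the browser, the resulting JSON could be passed through `d3.hierarchy(data).sum(d => d.value)` before `d3.partition()` computes each arc. Inner nodes carry no `value`, so D3 treats them as zero and sums their descendants.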

Identifying and Prioritizing Important Website Changes – Janak Raj Chadha

The goal of this project is to use machine learning algorithms to identify changes on government agency websites that are worth reviewing.

Read the final project update from Janak:

The website monitoring project involves working with data from multiple sources. The three primary sources are Versionista, PageFreezer, and the Internet Archive. An important initial task was researching the differences among their data formats and documenting them clearly.
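One way to reconcile several archives is to convert each source's records into a single common record type. The sketch below assumes a hypothetical `Version` schema and covers only the Internet Archive side. It parses one line of Wayback Machine CDX server text output, using what the CDX server documents as its default field order (`urlkey timestamp original mimetype statuscode digest length`). The sample line and digest are invented. Versionista and PageFreezer adapters are omitted because their formats are not described here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Version:
    """Hypothetical source-independent record for one captured page version."""
    source: str
    url: str
    captured_at: datetime
    status: str
    digest: str


# Default field order of Wayback CDX server text output.
CDX_FIELDS = ("urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length")


def from_wayback_cdx(line: str) -> Version:
    """Convert one space-separated CDX line into a Version."""
    row = dict(zip(CDX_FIELDS, line.split()))
    # Wayback timestamps are 14-digit YYYYMMDDhhmmss strings, treated as UTC.
    captured = datetime.strptime(row["timestamp"], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return Version(
        source="internet_archive",
        url=row["original"],
        captured_at=captured,
        status=row["statuscode"],
        digest=row["digest"],
    )


if __name__ == "__main__":
    # Invented sample line for illustration only.
    sample = "gov,epa)/ 20170115093000 https://www.epa.gov/ text/html 200 EXAMPLEDIGEST 5123"
    print(from_wayback_cdx(sample))
```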

Mid-project update from Janak:

I’m delighted to share my experience so far. It has been an exciting and challenging ride for me, and I’ve enjoyed it thoroughly.

I took some time to understand and document the variety of data sources that we’re using. I’m excited to have made progress on filters for insignificant changes. Building them required me to go through the changes myself, and I can really appreciate the effort the analyst team puts in. I’m currently adding and improving functionality for computing changes between versions. Going forward, I’m working alongside the analyst team to identify and prioritize important changes. It has been an amazing experience so far, and I’m extremely thankful to everyone who has helped me along the way.
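The sketch below illustrates two ideas: filtering out insignificant changes and computing changes between versions. It treats a change as insignificant when two versions differ only in whitespace. Otherwise, it lists the added and removed lines using Python's standard `difflib`. The whitespace rule and the function names are hypothetical simplifications, not the project's actual filters.

```python
import difflib
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace to one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def is_insignificant(old: str, new: str) -> bool:
    """Hypothetical rule: versions that differ only in whitespace are not worth review."""
    return normalize(old) == normalize(new)


def changed_lines(old: str, new: str) -> list[str]:
    """Return removed ("-") and added ("+") lines between two versions."""
    diff = list(difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="", n=0))
    # diff[0:2] are the "---"/"+++" file headers; "@@" hunk headers are filtered out too.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


def review_candidates(old: str, new: str) -> list[str]:
    """Return an empty list for insignificant changes, otherwise the changed lines."""
    if is_insignificant(old, new):
        return []
    return changed_lines(old, new)


if __name__ == "__main__":
    print(review_candidates("Air quality data\nContact us", "Air  quality data \nContact us"))
    print(review_candidates("Air quality data\nClimate change", "Air quality data\nClimate"))
```

A real filter would likely also need to ignore markup and boilerplate such as timestamps or navigation. Whitespace is used here only because it is the simplest case to show.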

Identifying and Prioritizing Important Website Changes – Janak Raj Chadha