Google Summer of Code

EDGI is grateful to have participated in Google Summer of Code (GSoC) for the first time in 2017. GSoC provides an opportunity for students to contribute to open source projects while getting paid! We had a mentoring team of six volunteer EDGI members who worked with two students who contributed to our Archiving and Website Monitoring projects over the course of the summer.

Check below for recaps of our students' projects!

Data Visualization of Archived Content – Harsh Baid

Harsh Baid is a junior at the University of Maryland, College Park, studying Computer Science with a minor in math. He started programming during his freshman year of high school, learning front-end and back-end web development, but he is now interested in low-level architecture and embedded systems. In his free time, aside from his engineering-related hobbies, he loves to play the ukulele, read, and sleep.

The purpose of this project is to develop a set of visually impactful and meaningful models that allow users to easily visualize the changes archived by EDGI.

Read the final project update from Harsh:

My project focused on data visualization. The data being archived, while vital for preservation, was hard for the average person to interpret and a bit overwhelming. So, over the summer, I worked on interactive graphs and models that give users a general overview of the data. Using D3, I created a Coverage Map and a DataRescue Map.

Mid-project update from Harsh:

I have been focused on creating and refining several visually impactful models for EDGI and its related organizations. The two visualizations I've focused on most are 1) accurately conveying the archival rate of coverage data using a sunburst sequence, and 2) creating an interactive choropleth map of the United States to showcase DataRescue and other related EDGI events.
I am currently refining both of these models and integrating them into their respective websites based on community feedback.


Identifying and Prioritizing Important Website Changes – Janak Raj Chadha

Janak Raj Chadha is an undergraduate student at a National Institute of Technology in India, where he studies Electrical Engineering. He is interested in Artificial Intelligence, Space Exploration, Electric Vehicles, and Renewable Energy Science. He has recently started contributing to open-source organisations and would like to contribute to libraries like TensorFlow in the future.


The goal of this project is to use machine learning algorithms to identify changes on government agency websites that are worth reviewing.

Read the final project update from Janak:

The website monitoring project involves working with data from multiple sources. The three primary sources are Versionista, PageFreezer, and Internet Archive. An important initial task was researching and creating clear documentation of the differences in data formats.

Mid-project update from Janak:

I’m delighted to share my experience so far. It has been an exciting and challenging ride for me and I’ve enjoyed it thoroughly.

I took some time to understand and document the variety of data sources that we’re using. I’m excited to have made progress on filters for insignificant changes. Building them required me to go through the changes myself, and it gave me a real appreciation for the effort the analyst team puts in. I’m currently working on adding and improving functionality for computing changes between versions. As we move ahead, I’m working alongside the analyst team to identify and prioritize important changes. It has been an amazing experience so far, and I’m extremely thankful to everyone who helped me through this journey.