Google Summer of Code Ideas

This is the ideas page for EDGI’s Google Summer of Code (GSoC) program!

All newcomers to the project should review our GitHub project introduction to learn more about where and how we work. Our repositories can be found on our GitHub organization page.

Important note: to learn how to contribute to EDGI software, see our Contributing Guidelines.

Contact Us

Most development communication happens on archivers.slack.com, where anyone can request an invite. Please introduce yourself in #gsoc rather than in #general, as we have a lot going on in the Slack team! Matt Price (@titaniumbones | @mattprice) is the technical lead for the project.

For project-related topics, reach out on Slack, either in #gsoc or in the appropriate development channel, or file an issue in the appropriate GitHub repo.

For GSoC application inquiries, use the #gsoc channel.

Please note that our project has a weekly rhythm. The first half of the working week (Monday-Wednesday) is usually the best time for our core volunteers to think about new feature development and onboard new participants. On Thursdays and Fridays we are often busy preparing for weekly events, and on weekends we are either supporting events or recovering from them. Please be patient and/or self-directed during our busy periods!

Contribute Improvements

We have preferred guidelines for submitting changes. Please read them over!

We also love it when students show that they are excited to work with us by looking at some of our first-timer issues in our web-monitoring or guides repos and even submitting a pull request! This gives us confidence that you’ve read our contribution guidelines and would be ready to jump into a project.

Submit a Proposal!

If you are interested in being a GSoC student with us, first read through our GSoC organization profile, join the #gsoc channel in our Slack, and tell us what you’re interested in working on and a bit about your experience.

It’s also great to hear if you’ve forked one of our codebases and set up a development environment. Done with that? How about running tests? Tell us how far you’ve gotten!

The Student Application period opens March 20, 2017 and runs until April 3, 2017 12:00 (EDT). We have an application template which asks for details about you and your experience as well as a description of your proposed project.

A good project description:

  • has a clear summary of the goals your project is seeking to accomplish
  • indicates what need your project fills
  • describes how your project will do so
  • identifies milestones and a timeline to get there

Initial ideas can be discussed in #gsoc. Once your idea is more developed, please fork our overview GitHub repo and submit a pull request (PR) to add your proposal to overview/gsoc. Please use the naming format proposal-<proposal-name>.md and use our application template for proposals.

We will provide feedback and comments on the proposal pull request, and you can make changes up until the April 3, 2017 12:00 (EDT) deadline. We suggest you submit your ideas early! Any questions on the process will be answered in #gsoc.

Potential Ideas

Apply Machine Learning to Monitoring Website Changes

  • Part of: web-monitoring
  • Description: Help us improve our government agency website monitoring through the use of machine learning in order to reduce the amount of unnecessary review our analysts perform during computer-assisted identification of important changes.
  • Contact: #dev-webmonitoring on archivers.slack.com
  • Keywords: new features, machine learning, data analysis, visualization
  • Ideal Experience and Interest: Python, Ruby on Rails, Node.js, Machine Learning (in particular scikit-learn).

The web-monitoring project has had 4 months of development and interacts with the API from our partner organization to pull down and compare changes in versions of webpages on the domains we monitor. We are looking to move into supporting the analyst’s task of comparing and determining important changes through the application of machine learning (ML) models. This is a very new project which will evolve rapidly in March and April. Right now, the best way to begin is by:

Refine Event-based Preservation Application

  • Part of: archivers.space app and harvesting-tools
  • Description: We have a backlog of feature requests and a growing application to handle event-based preservation and data downloading. Features we’re keen on include improving our leaderboard so we can see how volunteers are doing at events, making our interface cleaner and easier to understand, and more.
  • Contact: #dev-archivers-space on archivers.slack.com
  • Keywords: new features, optimization, security, web application, visualization
  • Ideal Experience and Interest: JavaScript (Meteor, Node.js), Heroku, AWS, MongoDB

Our pipeline app and harvesting-tools are the most mature of our projects, but they need a fair bit of refinement before they are really stable. We also have ongoing research and open questions around best practices in web security. We work in short cycles of development, use, feedback, and refinement, so if you are interested in learning more about web application security and development in an agile-inspired process, this might be for you!

Propose your own!