EDGI at Web Archiving Week!

Three members of EDGI attended Web Archiving Week, #waweek2017, held in London from June 12–16, 2017. Hosted at Senate House, University of London, and the British Library Knowledge Centre, the week consisted of three events: the Archives Unleashed 4.0 Datathon, the International Internet Preservation Consortium (IIPC) Web Archiving Conference (WAC) 2017, and RESAW (a Research Infrastructure for the Study of Archived Web Materials).

Archives Unleashed

Maya Anjur-Dietrich and Dawn Walker participated in the Archives Unleashed (AU) datathon, the fourth in a series of events that aim to explore web archiving research tools and foster collaboration around future directions in web archive analysis. Over the two days, participants worked on datathon projects, which they presented at the end of AU and during the first day of WAC!

If that wasn’t enough, lightning talks were interspersed throughout! Maya and Dawn’s talk highlighted the connection between the Web Monitoring working group’s work and traditional archives.

Web Archiving Conference and RESAW

Brendan O’Brien joined for the remaining events, and some collective highlights are documented below:

  • Leah A. Lievrouw’s (@Leah53) keynote covered Internet and Web history, identifying a shift in the objects of interest for archival studies in the current big data era
  • Lozana Rossenova (@LozanaRossenova) and Ilya Kreymer (@IlyaKreymer) from Rhizome talked about the value of Web Recorder and remote browsers for archivists and artists. They also focused on their recent experiments "patching" across multiple sources (WARCs) and the live web to pull in missing assets
  • Anastasia Aizman & Matt Phillips from Harvard’s Library Innovation Lab discussed instruments for web archive comparison in Perma.cc, focusing on image and HTML comparison, as well as which similarity measures to use. Their research included investigations into the applicability of SimHash & MinHash for comparing WARCs
  • Nicholas Taylor gave an update on the status of the LOCKSS (Lots Of Copies Keep Stuff Safe) re-architecture, which is converting a monolithic project into a collection of microservices
  • Mat Kelly (@machawk1), PhD Candidate at Old Dominion University, and David Dias (@daviddias), Software Engineer at Protocol Labs, presented updates to InterPlanetary Wayback (ipwb), a personal web archive system that relies on the decentralized file system IPFS and that started as a project by Sawood Alam (@ibnesayeed) and Mat at a previous Archives Unleashed! The session included an excellent discussion on content-addressing & its potential benefits for web archives
  • Jefferson Bailey (@jefferson_bail) at Internet Archive (IA) discussed current holdings, projects focused on facilitating better access (e.g., through WASAPI, the Web Archiving Systems API), and future initiatives

Altogether, it was a great opportunity to connect with the web archiving community of practitioners and researchers! We identified potential future collaborations and many parallel projects around issues that our Archiving work has been tackling since December.