Countless historical settings have demanded archiving as a politically urgent tactic. In Nazi Germany, individuals risked their lives smuggling suitcases of scholarly documents out of the country and into archives elsewhere, which is why we can now access the collected works of Edmund Husserl and Walter Benjamin. Archiving can also be a communal act of political visibility. The Mazer Lesbian Archive accumulated during the 1980s in a house in Altadena, a neighborhood of Los Angeles, through the work of volunteers dedicated to documenting a largely unseen, unheard lesbian society. All of these elusive collections wound their way into institutions – Husserl’s at the University of Louvain, Benjamin’s at the Bibliothèque Nationale, and Mazer’s at UCLA starting in 2009.
With DataRefuge, we see a different trajectory. Institutionally centralized scientific data, documents, and webpages are currently being dispersed into an international patchwork of nonprofit and for-profit repositories, using a coordinated array of largely in-situ archival tactics: web scraping, mirroring, and data harvesting. Certain information on federal websites such as the EPA’s has already vanished from public view – though copies remain on the Internet Archive – and while we wait to see whether anything else will disappear, this preemptive, federated guerrilla archiving work continues in order to outpace any further changes.
At UCLA, we became one prong of this ongoing effort. With a crew of four core organizers, a modest budget, and about a dozen other volunteers, we scheduled a panel and set of workshops for January 20th, 2017. To get oriented, we reached out to Michelle Murphy at the University of Toronto, then Laurie Allen and Bethany Wiggin at the University of Pennsylvania, who generously shared the lessons of the work they and their affiliates had already done to launch EDGI and DataRefuge. Mike Hucka, a researcher at Caltech concerned about the long-term vulnerabilities of digital content, attended DataRescue Philly, held at U Penn on January 13th, and came back with a well-rounded understanding of the logistical and technical steps involved. We would spend Donald Trump’s inauguration day ‘rescuing’ scientific information the new president flatly denies.
The 20th brought hard rain (by LA standards) and over 60 participants to UCLA’s Information School for the DataRescue. The morning began with panelists making an appeal to think longitudinally and save everything as a default mode (the operational mantra of the Internet Archive). Hard-won data gathered during storms from the sides of a doused, pitching boat – as shown in vivid photos by panelist Steve Diggs from the Scripps Institution of Oceanography at UC San Diego – need long-term care so that they, in Diggs’s words, “outlive opinions and hypotheses.” Professor Christine Borgman, of UCLA Information Studies, warned that “If we don’t have data, we don’t have a problem,” since part of the significance of research data lies in its ability to serve as evidence of complex, sometimes intangible dilemmas. Jason Scott of the Internet Archive described the neglect of archiving and preserving information as a barrier to adequately processing the present time – he advocated for the save-everything method as a key expression of activist preservation.
Speaking last, Joan Donovan, from UCLA’s Center of Society and Genetics, introduced the idea of decentralization of data as an activist practice and protest tactic, a specific form of scientific activism. Though she pointed out potential pitfalls of data decentralization done without proper accountability and constant vigilance, Donovan argued that it could provide an opportunity to organize science more effectively. Activist tenets should remain a guiding force in data decentralization efforts such as DataRefuge, in order to keep up the fight against post-truth agendas and the deconstruction of ‘facts’.
For the rest of the day participants settled into one of four workshops. Most – around forty or so – attended the Archive-a-thon group, where facilitators introduced the Internet Archive nomination tool. The group focused on nominating websites from the Department of Energy, using documents provided through DataRefuge and EDGI resources. Lead facilitator Mike Hucka estimated that the participants of the workshop nominated about four pages per minute, ultimately nominating – or ‘seeding’ – 568 unique URLs. A number of people also experimented with scraping and were able to upload an uncrawlable dataset to the CKAN repository.
Other workshops were more conversational. “Best Practices for Archiving Scientific Data” discussed the current archiving practices and affordances of the Internet Archive, pinpointing issues in the accessibility and usability of the data stored there. The group analyzed specific search functions and metadata structures and identified opportunities for planning a more effective metadata structure for the archive. Another workshop, “Protecting Climate Data Over the Long Haul,” focused on producing a research agenda for future climate data protection efforts. The group also began a bibliography of relevant texts and resources to be used in future research on climate change data infrastructures.
“Citizen Data Advocacy, or An Intervention Toolkit for Those Who Care About Facts and Data” planned the components of a toolkit that interested citizens could use to get involved in the data protection movement in various ways, such as applying administrative pressure through letter-writing and phone campaigns, and generally communicating concerns about environmental issues to local and federal governmental staff and institutions. The toolkit is now available on our website, and we’ll be using it to design pamphlets to hand out at future events.
The guerrilla archiving tactics offered to us by the folks at U Penn, U Toronto, and elsewhere continue to ignite events around the country (just as Trump’s latest executive order rapidly converted airports into protest zones) – as of today there are seven upcoming DataRescues listed, and another at Berkeley in the works. Our own event generated ample publicity and momentum from participants and others eager to help. Currently, we’re in talks to orchestrate another DataRescue at the UCLA Community Schools and have plans with the UCLA Library to conduct half-day workshops on the basics of web archiving, still focusing on climate change data. We hope to work with EDGI to craft a longer-term agenda that can carry on over the next four years of what many expect to be an ongoing struggle between ‘alternative facts’ and the established rigors of scientific work, paired with the long-term preservation required to maintain the social value of research on important environmental questions that affect us all.
We thank everyone who helped us with our UCLA branch of this federated effort and look forward to more coordinated, decentralized, civic-minded, activist-spirited archival work as we move forward.