toad.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
Mastodon server operated by David Troy, a tech pioneer and investigative journalist addressing threats to democracy. Thoughtful participation and discussion welcome.

#webarchives

British Library: RESAW 2025: Report from UK Web Archive Colleagues. “The RESAW (Research Infrastructure for the Study of Archived Web) 2025 conference took place at the University of Siegen in Germany. It was organized by the Collaborative Research Centre 1187 ‘Media of Cooperation’ at the University of Siegen in cooperation with the Centre for Contemporary and Digital History (C²DH) at the […]

https://rbfirehose.com/2025/06/24/resaw-2025-report-from-uk-web-archive-colleagues-british-library/

NEW Journal Article: "Retrieval-Augmented Generation of Event Collections from Web Archives and the Live Web" #AI #RAG #webarchives @internetarchive link.springer.com/article/10.1

SpringerLink: Retrieval-Augmented Generation of Event Collections from Web Archives and the Live Web - International Journal on Digital Libraries
Creating collections of societally impactful events is a challenging task given the sheer amount of information about such events covering a large variety of aspects and perspectives in web archives and the live web. The automatic creation of such collections from web archives typically does not live up to the high standards of web archivists, who put lots of manual effort into carefully curating collections. Furthermore, the lack of engaging presentation methods sets up a burden for any users aiming to interact effectively with event collections in order to explore an event in its entirety. Therefore, we (i) conduct expert interviews to determine the requirements for building and utilising event collections from the perspectives of web archivists, (ii) introduce EventExplorer – a retrieval-augmented generation (RAG) approach to create event collections through efficient retrieval and diversified ranking – and make it available in an interactive web system, (iii) apply EventExplorer on different sources including a web archive and the live web, (iv) discuss which requirements are met by EventExplorer as well as the challenges that remain for future work, with a specific emphasis on the distinctive characteristics of both archived web and the live web environments. We demonstrate the effectiveness of EventExplorer applied on web archives through a user study of our interactive system. Then, we transfer our lessons learned to the live web by creating event collections of 166 elections in Europe. Our evaluation results show the effectiveness of EventExplorer in addressing the requirements identified in our expert interviews. Further, we derive a set of challenges and potential future steps for bringing together the automatic creation of web archive collections and manual curation. Finally, we discuss how to make web archives ready for their use in RAG systems.

Library of Congress: Preserving a History of Digital Mapmaking: Inside the Geospatial Software and File Formats Documentation Web Archive. “In this interview, Tim St. Onge and Meagan Snow explain how web archiving is preserving documentation essential to understanding the evolution of modern cartography. They outline the motivations behind the Geospatial Software and File Formats […]

https://rbfirehose.com/2025/06/20/preserving-a-history-of-digital-mapmaking-inside-the-geospatial-software-and-file-formats-documentation-web-archive-library-of-congress/

Alex Chan: Building a personal archive of the web, the slow way. “I’ve worked on web archives in a professional setting, but this one is strictly personal. This gives me more freedom to make different decisions and trade-offs. I can focus on the pages I care about, spend more time on quality control, and delete parts of a page I don’t need – without worrying about institutional […]

https://rbfirehose.com/2025/05/21/alex-chan-building-a-personal-archive-of-the-web-the-slow-way/

British Library UK Web Archive Blog: IIPC Web Archiving Conference 2025: Report from UK Web Archive Colleagues. “This year’s IIPC General Assembly and Web Archiving Conference took place at the National Library of Norway in Oslo. Many UK Web Archive colleagues from Bodleian Libraries, the British Library, Cambridge University Library and National Library of Scotland attended the Web […]

https://rbfirehose.com/2025/05/16/iipc-web-archiving-conference-2025-report-from-uk-web-archive-colleagues-british-library-uk-web-archive-blog/

Colleagues from the Danish Royal Library showcasing the strengths of #SolrWayback, a powerful web application for searching and exploring data in #WebArchives, by using memes about Trump and Greenland in their #BDCAM25 presentation.
SolrWayback offers network analysis of domains, powerful visualization tools like N-gram, search by location on a map using EXIF metadata information in images, and, not least, the possibility to export data and derivatives from search results. github.com/netarchivesuite/sol

So I used to use archive .is or .ph to capture archives of various news sites and stuff faster and more reliably than the Internet Archive tends to operate (it can take them days to archive a site and they seem to miss quite a few things on some news sites entirely.) But it seems to have either been hacked or just generally turned out to be bad because it's now redirecting to RT.

Is there anything else I can use?

Not sure how to tag this.
#WebArchive #SiteArchives #WebArchives #NewsSites

🚨New article out in Archival Science!🚨

"Conceptualizing aggregate-level description in web archives" looks at the multiple systems (from database entities to domain names) that categorize and structure born-networked records, and how archival theory can/should consider these as representational architectures that warrant their own kinds of description. #webarchiving #webarchives

Read here! Open access 🔓 doi.org/10.1007/s10502-025-094

SpringerLink: Conceptualizing aggregate-level description in web archives - Archival Science
Web archives collections are often excluded from archival science discussions, and their description instead focuses on bibliographic approaches to item-level metadata. This article argues that web archives are best understood using approaches of archival description, focusing on a case study of the Danish Netarchive, a long-running national web archive. By capturing and preserving web sites for the purposes of legal deposit, the Netarchive creates and maintains historical records of the web. Examining the Netarchive’s systems and activities through the lens of archival representation, this article develops a typology of representational artifacts that support this work, including the use of database entities, wiki documentation, classification and management via Jira issues, and codes, identifiers, and structures embedded in network protocols themselves. The analysis considers how meaningful aggregations can be understood via these representational schemes, systems and architectures, and how the nature of born-networked records challenges concepts of singular, hierarchical orderings of records aggregations. The closing discussion proposes new modes of description that address these multiple interconnected systems, and raises questions about what this might mean for aggregate-level description in the context of digital and born-networked records more broadly.