Suddenly, @textfiles was in my ears!
(Yes, I'm listening to five-year-old podcast episodes.)
The Lost Cities of Geo - 99% Invisible
https://99percentinvisible.org/episode/the-lost-cities-of-geo/
#webArchive #culturalheritage #webHeritage

Can't we agree to always link to media outlets via, for example, https://archive.ph/, so that everyone can read the article?
#BDCAM25 Ian Milligan presented several examples of #WebArchive analysis (when working with archives as data):
* Gain Insight into Recent History
* Expand Questions to a Global Scale
* Track Changes to Phenomena over Time
* Map a Network of Actors
as well as #DigitalHumanities methods & tools:
* Natural language processing (NLP)
* Binary analysis
* Network analysis
* AI analysis
So I used to use archive .is or .ph to capture archives of various news sites and the like, faster and more reliably than the Internet Archive tends to operate (it can take them days to archive a site, and they seem to miss quite a few things on some news sites entirely). But it seems to have either been hacked or just generally gone bad, because it's now redirecting to RT.
Is there anything else I can use?
Not sure how to tag this.
#WebArchive #SiteArchives #WebArchives #NewsSites
The Internet Archive at https://web.archive.org is not working properly for me right now. Is this a local glitch? Can anyone confirm the issue?
All ex - Fed.Gov
Meidas_Charise Lee
@charise_lee
This is important. Not all heroes wear capes.
The #DigitalPurge is happening.
We have to start making backups of the US infrastructure and the US-based sources of information, like #webarchive. We better do that fast.
I am creating #PriorArt in technology here. What's happening in the US right now under #Trump #Musk #Spacekaren is nothing less than
#DigitalPurge,
#DigitalBookBurning,
#DigitalIconoclasm #DigitalBurningOfBooks #DigitalDestructionOfEvidence
@S4F
#FLOGW #flgw
I'm wondering: my blog about poetry and translation is bilingual. I made a home page instead of a sub-domain for the language and library resource part, but I think it might be the wrong choice. Would it be better to make it a semi-separate website with a sub-domain, or to integrate it into the blog concept? It's hard to connect the two despite the link between them. Help
#boostswelcome #websiteoptimization #blog #website #websitetips
#webarchiving #Webarchive
#websitebuildingtips #websitetips
Of the institutions and individuals leaving X now, have any declared an archive strategy for their posts?
Web archiving by @internetarchive does not work so well for these commercial and dynamic interfaces. So self-archiving would be essential (due to GDPR it's easy)
I hear almost nothing about this.
Open Library
A tool to search for free books in the Internet Archive.
Search by title, author, subject, place, publisher and full text.
openlibrary.org
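A minimal sketch of how such a fielded search could be expressed as a URL, assuming Open Library's public search.json endpoint accepts field names matching the list above (title, author, subject, place, publisher). The helper name build_search_url is hypothetical, and full-text search is left out because it may live on a different endpoint.

```python
from urllib.parse import urlencode

# Assumed endpoint; check the Open Library developer docs before relying on it.
SEARCH_ENDPOINT = "https://openlibrary.org/search.json"


def build_search_url(**fields: str) -> str:
    """Build a search URL from keyword fields, skipping empty values.

    Hypothetical helper: it only assembles the query string and does
    not contact the server.
    """
    params = {name: value for name, value in fields.items() if value}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url(title="moby dick", author="melville"))
```

Fetching the resulting URL should return JSON; the exact response fields are not assumed here.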
What's happening to archive.org? Is it the end of an era? Or, a sign that someone should put up an archive of Web Archive somewhere?
#Business #Announcements
Google Search introduces access to archived webpages · The search results now include links to the Wayback Machine https://ilo.im/1602ru
_____
#WebArchive #InternetArchive #WaybackMachine #Google #SearchEngine #Webpage #Website #DigitalPreservation
Of related interest for all the data, media, infrastructures, publics, and history people at #EASST4S2024: Please consider contributing to next year's THE DATAFIED WEB at Siegen University. Our call for contributions is open until October 15: https://easychair.org/cfp/RESAW2025. Boosts are highly welcome.
Check out this new resource from Library of Congress - includes #webarchive datasets! https://data.labs.loc.gov/ we hope to publish more via this site so bookmark it!
https://cohost.org/arborelia/post/4968198-the-software-heritag
This is a serious discussion about digital archiving and data immutability.
How should we deal with a #webarchive where immutability is inherent in the technological design?
(I am not able to speak on the subject of transphobia)
The recommendations in the "For you" section of the #AuroraStore app definitely do not reflect my preferences. They are all #Apps from centralized #Platforms and providers whose content and services I can live well without.
I hope that one day these #ProductNames will only be found in the #WebArchive, and that people will have learned to use and further develop #FreeSoftware and #DecentralizedSocialNetworks and #OpenStandards.
@internetarchive is introducing #ARCHWay, a free tier available to individuals: "With ARCHWay, any individual can begin exploring #WebArchive collections computationally using ARCH – analyzing, generating, and publishing research data from digital collections" https://archive-it.org/post/introducing-archway/
Monolith; Archivematica
Just two resources today, as $WORK and #2.1 have encroached on the cycles usually reserved for personal research this week.
I spend an inordinate amount of time archiving and preserving content from the web and other places. Not just for these Drops, but also for the work I do fighting the good fight in cyber and against those who seek to dismantle liberal democracy and harm others. The two tools in today’s drop make that work a bit easier (though there is a bit of tedium in the second tool as it is a more “official” hoarding platform).
If you need/want to preserve content from the web, or archive other digital assets, read on!
TL;DR
This is an AI-generated summary of today’s Drop.
Monolith
One of the more compelling features of both R Markdown and Quarto HTML documents is the ability to create an entirely self-contained HTML file. While I wouldn’t use said file in a production hosting capacity (they can be yuge), they are super handy for shipping interactive reports around.
What if you could do that for any HTML page from the command line?
We’re not talking about generating a WARC archive, or a (now deprecated) WebKit .webarchive. This is a fully standalone and functional single HTML file.
Well, we can, with monolith (GH), a simple and efficient Rust-based CLI tool for embedding all the things necessary to reproduce a web page into a single HTML file (i.e., give it a URL as an input, and it will output a single HTML file that faithfully reproduces the original web page, including all its assets like CSS, JavaScript, and images). This means you get a fully interactive page, not just a static screenshot or janky PDF. It’s like having the entire web page in your pocket, available anytime, anywhere, even without an internet connection.
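To make the "single HTML file" idea concrete, here is a toy sketch of the general technique of inlining assets as base64 data: URIs. This is not monolith's code or API. The function names (to_data_uri, inline_images) and the in-memory assets mapping are made up for illustration, and it only handles double-quoted img src attributes.

```python
import base64
import mimetypes
import re


def to_data_uri(data: bytes, mime: str) -> str:
    """Encode raw bytes as a base64 data: URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def inline_images(html: str, assets: dict) -> str:
    """Replace <img src="..."> values found in `assets` with data URIs.

    `assets` maps a src string to that file's bytes (a stand-in for
    fetching each resource). Unknown srcs are left untouched.
    """
    def replace(match):
        src = match.group(2)
        if src not in assets:
            return match.group(0)
        # Guess the MIME type from the file extension, with a generic fallback.
        mime = mimetypes.guess_type(src)[0] or "application/octet-stream"
        return match.group(1) + to_data_uri(assets[src], mime) + match.group(3)

    return re.sub(r'(<img\b[^>]*\bsrc=")([^"]+)(")', replace, html)


if __name__ == "__main__":
    page = '<p><img src="dot.png" alt="dot"></p>'
    print(inline_images(page, {"dot.png": b"\x89PNG"}))
```

A real tool also has to deal with CSS, scripts, fonts, relative URLs, and character sets, which is where most of the work is.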
Monolith is not just a simple web scraper. It’s a sophisticated tool that has evolved over time, with features added in response to community input. For instance, it supports a wide range of charsets aside from UTF-8, and it has an option for saving a document using custom encoding. It also can process and embed the contents of <noscript> tags, and it can enforce the saved document’s charset to always be set to UTF-8.
You can even use it with Chromium/Thorium to capture the state and resources of a dynamically loaded page.
The repo has extensive installation and usage examples, so I’ll leave you in their hands for that. And, the section header is a partial capture of the HTML generated by Monolith archiving rud.is/b.
Archivematica
Archivematica (GH) is an open-source digital preservation system that’s a bit like a sophisticated time capsule for all sorts of digital content: documents, photos, videos, etc. This tool takes these files and processes them so that they’re preserved in a way that meets international standards.
Along with faithfully storing the original content, Archivematica transforms files into formats that are less likely to become obsolete, making sure that future generations can still access them. This process involves creating Archival Information Packages (AIPs) and Dissemination Information Packages (DIPs) from Submission Information Packages (SIPs). These packages are like the DNA of digital preservation, ensuring that all the necessary information for future access and understanding is bundled up neatly.
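As a rough mental model of that SIP to AIP and DIP flow, here is a toy sketch. It is not Archivematica's API or its real package layout: the Package class, the ingest and normalize functions, and the original/, preservation/, and access/ prefixes are all invented for illustration, and re-encoding Latin-1 text as UTF-8 merely stands in for real format migration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Package:
    kind: str                                      # "SIP", "AIP", or "DIP"
    files: dict                                    # file name -> bytes
    manifest: dict = field(default_factory=dict)   # file name -> SHA-256 hex


def normalize(data: bytes) -> bytes:
    """Stand-in for format migration: re-encode Latin-1 text as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


def ingest(sip: Package):
    """Toy ingest: derive an archival package and an access package."""
    aip_files = {}
    dip_files = {}
    for name, data in sip.files.items():
        aip_files[f"original/{name}"] = data                  # kept byte-for-byte
        aip_files[f"preservation/{name}"] = normalize(data)   # migrated copy
        dip_files[f"access/{name}"] = normalize(data)         # copy for users
    # Checksums let a later audit detect silent corruption in the AIP.
    manifest = {n: hashlib.sha256(d).hexdigest() for n, d in aip_files.items()}
    return Package("AIP", aip_files, manifest), Package("DIP", dip_files)


if __name__ == "__main__":
    sip = Package("SIP", {"note.txt": "café".encode("latin-1")})
    aip, dip = ingest(sip)
    print(sorted(aip.files), sorted(dip.files))
```

The point is the separation of roles: the AIP is what gets stored and verified over time, while the DIP is what gets handed to readers.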
The utility of Archivematica can’t be overstated. Technology changes at breakneck speed, and the risk of digital files becoming unreadable is very, very real. Archivematica mitigates this risk by adhering to the Open Archival Information System (OAIS) reference model, which is the preeminent standard for preserving digital information. By adhering to this model, this tool ensures that the digital content remains accessible, no matter what new technology comes along.
For those who work in libraries, archives, or any institution with a digital collection, Archivematica is a game-changer. I’d argue it’s also a great tool for those of us who are trying to salvage the last vestiges of liberal democracy, as it enables us to precisely and accurately preserve history. Plus, being open-source means that the code is freely available for anyone to study, modify, and improve. This transparency is crucial for institutions that want to show stakeholders exactly how they’re preserving cultural heritage materials.
The target environment is Linux, but it is container-friendly.
FIN
Remember, you can follow and interact with the full text of The Daily Drop’s free posts on Mastodon via @dailydrop.hrbrmstr.dev@dailydrop.hrbrmstr.dev
https://dailydrop.hrbrmstr.dev/2024/01/10/drop-402-2024-01-10-hoarding-can-be-a-good-thing/
This week in #DigitalHistoryOFK, together with Annabel Walz (Friedrich-Ebert-Stiftung), we turn to the complex topic of web archiving. From the perspective of a memory institution, she will discuss the characteristics of #borndigital & #reborndigital sources, as well as best practices for archiving them, which rely on #WebCrawling as a practice & #WARC as a storage format.
Wed, Nov. 29, 4-6 pm, via Zoom