toad.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
Mastodon server operated by David Troy, a tech pioneer and investigative journalist addressing threats to democracy. Thoughtful participation and discussion welcome.

Server stats: 273 active users

#warc

Neustradamus :xmpp: :linux:
#libarchive 3.7.9 has been released (#MultiFormatArchive / #CompressionLibrary / #FileArchiver / #DataCompression / #7Zip / #7z / #RAR / #ZIP / #GZip / #TAR / #XAR / #WARC / #BZIP2 / #XZ) https://www.libarchive.org/

infoDOCKET
Preprint: “Web Archives #Metadata Generation with GPT-4o: Challenges and Insights” https://www.infodocket.com/2024/11/11/preprint-web-archives-metadata-generation-with-gpt-4o-challenges-and-insights/ #ai #warc #webarchives

Peter Binkley
I'm seeing the trade-off between citability and comprehensiveness in this approach. #WARC-GPT builds a bunch of embeddings ("3624 embeddings from 1296 HTML/PDF records" in this case) and uses them (have I got this right?) to do a first pass, figuring out which files are good matches to a query. It then packages my query and the best file as a query to an #LLM (Mistral:latest running in #Ollama in this case), and shows me the result. So the answer is quite good for info in the file it selected.

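The two-step flow described in the post above (a first pass over embeddings to pick the best-matching record, then packaging the query and that record as a prompt for an LLM) can be sketched roughly as follows. This is a minimal illustration and not WARC-GPT's actual code: the bag-of-words `embed` function stands in for a real embedding model, the record names and texts are made up, and the helpers `best_record` and `build_prompt` are hypothetical. The final call to an LLM is left out.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding (assumption): lowercase bag-of-words counts.
    A real system would call an embedding model here."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_record(query, records):
    """First pass: return the (name, text) pair most similar to the query."""
    q = embed(query)
    return max(records.items(), key=lambda item: cosine(q, embed(item[1])))


def build_prompt(query, records):
    """Second step: package the query with the single best record.
    The returned prompt is what would be sent to the LLM."""
    name, text = best_record(query, records)
    prompt = (
        f"Answer using only this source ({name}):\n{text}\n\n"
        f"Question: {query}"
    )
    return name, prompt


# Hypothetical records standing in for text extracted from WARC files.
records = {
    "page-a.html": "the library opens at nine and closes at five",
    "page-b.html": "the archive stores web crawls as warc files",
}
name, prompt = build_prompt("how are web crawls stored", records)
print(name)
print(prompt)
```

Because only the single best record reaches the prompt, the answer is easy to cite but cannot draw on anything in the other records, which is the citability versus comprehensiveness trade-off the post mentions.
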
Kiwix
Quicker, better, robuster... this is ZIMit 2.0! Our scraper, which can make an offline version of any website, is only a few days away from release! Stay tuned! https://github.com/openzim/zimit #webscraping #webarchiving #zim #offline #kiwix #warc

me·ta·phil, der
Wow! #TIL about #ArchiveBox, your #selfhosted #alternativeTo @internetarchive!

Runs on #Python (OS-packaged or #dockered) and saves either single pages or whole website crawls in every format you could wish for:

✅ self-contained single-page HTML
✅ PDF
✅ PNG screenshot
✅ plaintext
✅ DOM-dump
✅ priv./publ. #archive
✅ media audio/video included (+yt-dlp)
✅ #WARC compat.

🌐 https://archivebox.io
📜 https://github.com/ArchiveBox/ArchiveBox
▶ https://demo.archivebox.io

#WebArchiving #WebCrawling #DigitalPreservation

B2C
#PatNum_CH

Web archiving is child's play with https://webrecorder.net/

#WARC-compatible format

#ArchivCH

Digital History Berlin
This week in the #DigitalHistoryOFK, together with Annabel Walz (Friedrich-Ebert-Stiftung), we turn to the complex topic of web archiving. From the perspective of a memory institution, she will discuss the characteristics of #borndigital & #reborndigital sources, as well as best practices for archiving them that rely on #WebCrawling as a practice & #WARC as a storage format.

🔜 Wed, Nov. 29, 4-6 pm - via Zoom

ℹ️ Info: https://dhistory.hypotheses.org/6411

___
#DigitalHistory #WebArchive @histodons

DigitalPebble Ltd
Call to all #StormCrawler users: we will release a new version shortly so that people can benefit from the latest additions (#Opensearch) and improvements (#WARC). Any chance you could test some crawls with the latest code in the main branch and report any issues? Thanks

DigitalPebble Ltd
A very nice contribution to #StormCrawler improving the generation of #WARC files

https://github.com/DigitalPebble/storm-crawler/pull/1010

#webarchiving