toad.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
Mastodon server operated by David Troy, a tech pioneer and investigative journalist addressing threats to democracy. Thoughtful participation and discussion welcome.

#FileFormats


New blog post about the Library of Congress 2025-2026 updates to the Recommended Formats Statement (RFS). A few changes to note related to Design and 3D formats (mostly just rescoping), updates to email-related metadata to better align with EA-PDF and continued work to document digital accessibility support in file formats listed as "acceptable" in the RFS. See the Change Log for all updates. Comments welcome! #fileformats #digipres blogs.loc.gov/thesignal/2025/0

The Library of Congress · Recommended Formats Statement: Updates for 2025-2026 | The Signal · The Library of Congress has published the updated 2025-2026 Recommended Formats Statement. Updates have been made to preferred and accepted formats across multiple content categories, and are captured in a Change Log.

The @w3c recently released version 3 of the #PNG spec, more than 2 decades after version 2 was released in 2003. The new spec mostly makes official various extensions already in use, like animated PNGs, HDR support, and EXIF metadata, so many browsers and graphics apps already support it.

Here’s an article by Chris Blume, who is the chair of the W3C PNG Working Group: programmax.net/articles/png-is

1/2

www.programmax.net · PNG is back! · After 20 years, PNG is back with renewed vigor! A new PNG spec was just released.

How-To Geek: Snipping Tool Is Getting a Big GIF Upgrade. “GIFs remain one of the most popular image formats on the internet, despite their age. They’re even natively integrated into most messaging apps. Now, Microsoft is testing the ability to create and export them using Windows’s native screenshot and screen recording tool.”

https://rbfirehose.com/2025/06/21/how-to-geek-snipping-tool-is-getting-a-big-gif-upgrade/


Library of Congress: Preserving a History of Digital Mapmaking: Inside the Geospatial Software and File Formats Documentation Web Archive. “In this interview, Tim St. Onge and Meagan Snow explain how web archiving is preserving documentation essential to understanding the evolution of modern cartography. They outline the motivations behind the Geospatial Software and File Formats […]

https://rbfirehose.com/2025/06/20/preserving-a-history-of-digital-mapmaking-inside-the-geospatial-software-and-file-formats-documentation-web-archive-library-of-congress/


University of Michigan: U-M develops free tool to empower municipalities, modernize financial reporting. “When Congress passed the Financial Data and Transparency Act in 2022, it required most municipalities in the U.S. to modernize and digitize their financial reports. This is a heavy lift for small towns and school districts, most of which still report their financial information in PDF […]

https://rbfirehose.com/2025/05/25/university-of-michigan-u-m-develops-free-tool-to-empower-municipalities-modernize-financial-reporting/

Looking for some more advanced techies to help me out here. I was browsing the files of old abandonware (as one does) and came across the .zym file format in a game called Gubble 2. Does anyone have any idea what this file format is? Is it something proprietary by the Gubble devs? Something that just isn't used anymore? The only thing Google brought up regarding .zym was some mods for Quake.

Our semi-annual blog post is out to recap recent work with #fileformats and #digitalpreservation at the Library of Congress. Highlights incl new format research on #finale music notation formats, participation in #iPres2024 and @anj's Registries of Practice project.

w/ Elizabeth M. Caringola, Genevieve Havemeyer-King, Liz Holdzkom and Marcus Nappier

blogs.loc.gov/thesignal/2024/1

The Library of Congress · File Format Research Roundup | The Signal · This post is the most recent in a series about file format research for the Sustainability of Digital Formats site at the Library of Congress, including several new format descriptions as well as community collaborations.

literally who hurt genomics to make you all encode one specific kind of number as the ASCII characters from ! to ~ as an integer input to some logarithm function, but then others of you changed the function but kept encoding it as a single ASCII character ranging from -5 to 62 (???), and then later they decided that -5 to 62 was silly and so they changed that to 0-62, throwing away half the original range for no reason, except actually it's 0-40 by convention.

did anyone consider "encoding it as a number"

doi.org/10.1093/nar/gkp1137
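
For anyone who wants to see what the post is complaining about, the three encodings can be decoded in a few lines. This is a minimal Python sketch based on the scheme summarised in the linked paper (Sanger: Phred scores at ASCII offset 33; Solexa: Solexa scores at offset 64; Illumina 1.3+: Phred scores at offset 64). The function names and the variant labels are made up for illustration and do not come from any particular library.

```python
# ASCII offset for each FASTQ variant (labels are illustrative only)
OFFSETS = {"sanger": 33, "solexa": 64, "illumina-1.3": 64}

def decode_quality(qual: str, variant: str = "sanger") -> list:
    """Turn a FASTQ quality string into a list of integer scores."""
    offset = OFFSETS[variant]
    return [ord(ch) - offset for ch in qual]

def phred_to_error(q: float) -> float:
    """Phred: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

def solexa_to_error(q: float) -> float:
    """Solexa: Q = -10 * log10(p / (1 - p)), so p = 1 / (1 + 10 ** (Q / 10))."""
    return 1 / (1 + 10 ** (q / 10))
```

With these definitions, decode_quality("!~") covers the full Sanger range, and decode_quality(";", "solexa") gives the -5 that the post mentions.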

simpledroid: completing the circle

It’s nearing the end of 2024 and that must mean a PRONOM hackathon as part of the World Digital Preservation Day (#WDPD2024).

My contribution is a follow-up on my work earlier in the year to produce a valid DROID signature file from Wikidata in wddroidy.

simpledroid is available on GitHub and creates a simple DROID signature file from PRONOM itself, creating a scripted pathway to create a signature file using official PRONOM data that doesn’t require the current PRONOM database and its legacy stored procedures.

It also does away with a lot of the excess data in the current DROID signature file, which was previously an optimization for its Boyer-Moore-Horspool search algorithm, as described by Matthew Palmer.

The primary reason for simpledroid was to complete the circle on my previous efforts and to prove that it was possible to create a simplified signature file and for it to work with DROID. The result is about 80-90% there, with only a few skeleton files that remain unidentified – it should only require a small amount of forensic research to determine the reason.

The output provides a way of simplifying the signature file generation process, offering new opportunities to create alternative versions or to filter what’s already there. In a digitization workflow, for example, you could filter out any signatures that aren’t explicitly for image identification.

It may provide another way into PRONOM data for those who might look at DROID first as well as opening up different ways to modify and test signatures.

The reference output shows that the signatures are much easier to understand in this simplified DROID file.

simpledroid outputs a file with a smaller footprint than the current file:

1.2M DROID_SignatureFile_Simple_2024-11-11T12-29-22Z.xml
3.4M DROID_SignatureFile_V118.xml

It also contains all of the file classification data e.g. FormatType="Video" from PRONOM that will be added into DROID in a future release (and is already available in Siegfried).
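
To illustrate the kind of filtering that the classification data makes possible, here is a rough Python sketch that keeps only formats whose FormatType mentions a given word. The XML fragment is a made-up stand-in, not real simpledroid output: the element names follow the general shape of a DROID signature file, but the exact layout, the absence of a namespace, and the sample values are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the real file's structure and values may differ.
SAMPLE = """\
<FFSignatureFile>
  <FileFormatCollection>
    <FileFormat ID="1" Name="Example image format" FormatType="Image (Raster)"/>
    <FileFormat ID="2" Name="Example video format" FormatType="Video"/>
    <FileFormat ID="3" Name="Example untyped format"/>
  </FileFormatCollection>
</FFSignatureFile>
"""

def keep_format_type(xml_text: str, wanted: str) -> str:
    """Return the XML with only formats whose FormatType contains `wanted`."""
    root = ET.fromstring(xml_text)
    collection = root.find("FileFormatCollection")
    # Iterate over a copy because we remove children as we go.
    for fmt in list(collection):
        if wanted.lower() not in fmt.get("FormatType", "").lower():
            collection.remove(fmt)
    return ET.tostring(root, encoding="unicode")

print(keep_format_type(SAMPLE, "image"))
```

A real filter would also need to drop the internal signatures that are only referenced by the removed formats; that step is left out here.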

Unlike the wddroidy work, priorities have also been added to the signature file, so the mechanics of the signature file are pretty close to the official version. (DROID uses the signature sequence and offsets to identify a file, but it then uses a priority to determine which results to display to the user where there may otherwise be positive matches for formats that provide the foundation for another, e.g. how XML forms the basis of SVG or XHTML.)
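
The idea behind priorities can be sketched in a few lines of Python. This is only an illustration of the concept described above, not DROID's actual implementation; the function name, the use of format names rather than identifiers, and the priority table are all hypothetical.

```python
def apply_priorities(matches: set, has_priority_over: dict) -> set:
    """Drop any matched format that another matched format takes priority over."""
    suppressed = set()
    for fmt in matches:
        # Only suppress formats that actually matched.
        suppressed |= has_priority_over.get(fmt, set()) & matches
    return matches - suppressed

# Hypothetical priority table: SVG and XHTML both build on XML.
PRIORITIES = {"SVG": {"XML"}, "XHTML": {"XML"}}

print(apply_priorities({"XML", "SVG"}, PRIORITIES))
```

A file that matches both the XML and SVG signatures is reported as SVG only, while a plain XML file is still reported as XML.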

It might be possible to remove some data around minimum and maximum offsets in the new file: I discovered that the simplified DROID syntax requires curly-bracket syntax at the beginning and end of sequences to mimic the same behavior, e.g.

With a BOFoffset, min_offset = 2, and signature = BADF00D1, the signature needs to become {2}BADF00D1 to work.
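
That rewrite can be sketched as a small Python helper. Only the BOF case with a minimum offset comes from the example above; placing the brackets after the sequence for EOF offsets and using a {min-max} range when a maximum offset is set are my assumptions about how the rest would work, and the function name is hypothetical.

```python
from typing import Optional

def with_offset(signature: str, anchor: str, min_offset: int = 0,
                max_offset: Optional[int] = None) -> str:
    """Fold an offset into the sequence using curly-bracket syntax."""
    if max_offset is None or max_offset == min_offset:
        # Fixed gap, e.g. {2}; nothing is added for a zero offset.
        gap = f"{{{min_offset}}}" if min_offset else ""
    else:
        # Assumed range form, e.g. {2-5}
        gap = f"{{{min_offset}-{max_offset}}}"
    if anchor == "BOF":
        return gap + signature
    if anchor == "EOF":
        # Assumption: EOF offsets go after the sequence.
        return signature + gap
    raise ValueError(f"unknown anchor: {anchor}")

print(with_offset("BADF00D1", "BOF", min_offset=2))  # {2}BADF00D1
```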

The code is pretty straightforward and uses a few tricks to output XML sensibly without having to build the document’s tree (DOM) in a more verbose way. There are probably a few other shortcuts I’d fix with time if the code was ever useful, including improving variable naming and adding tests.

I’m not sure this code will ever be needed, or used by anyone, but for a quick hack and a quick proof of concept, it felt good to put it out there. Maybe someone will look at this or the wddroidy work and see there may be a way to federate different sources of signature information together into something DROID can use. Or it might be a useful demonstration to the DROID team that allows them to simplify PRONOM’s database and output mechanisms in a way that remains compatible with existing tools.

Previous research week work

My previous work for PRONOM research week includes a dashboard and API for getting more information out of PRONOM, including listings of those records still requiring descriptions or signatures. You may find that work interesting and it is available at https://pronom.ffdev.info and https://api.pronom.ffdev.info.

And if you want to get in on the signature development work, signature development utility 2.0 (https://ffdev.info) was also a previous effort of mine for research week 2020 and will hopefully also benefit from outputting DROID’s simplified syntax.

A week of file formats

Of course with World Digital Preservation Day, file formats were pretty popular.

Andrew Jackson attempted to calculate how many distinct formats might be out there using methods used to calculate ecological diversity.

Amanda Tome described the scope of their work and shared a number of useful resources, including links to the PRONOM starter pack and to the PRONOM drop-in sessions.

You might also find out a bit more about yourself by playing this File Format Dating Game from Lotte Wijsman and colleagues: Susanne van den Eijkel, Anton van Es, Elaine Murray, Francesca Mackenzie, Ellie O’Leary, and Sharon McMeekin. (I ended up on a date with FASTA (FDD000622) in my first play-through!)

Not specifically for WDPD, but in the same week I also enjoyed this presentation from Ange Albertini looking at different ways of identifying file formats. One big takeaway for me was thinking about how to get more forensic information out of a file format identification. DROID doesn’t tell us a lot, but is there a world in which one day it could?

Let me know if you find any of this work useful at all; and good luck on your file format endeavors this week.

"Basically, if a file on your computer can only be opened by a specific piece of software, and that software is controlled by a single company, you should probably export it to an open format. It's the only way to future-proof it."

You don't say 🤷

Some time ago, I was able to restore about 3 years (2004 onwards) worth of personal notes from a backup on CD-ROM, because the notes were stored as plain text files by my personal wiki back then. They now reside in my #pkm tool.

wired.com/story/how-to-properl