#ner

🔠 Panel: More than Chatbots: Multimodal Large Language Models in Humanities Workflows

At #DHd2025, Nina Rastinger explores how well #AI handles abbreviations & NER:

✅ NER works well, even with small, low-cost models
❌ Abbreviations are tricky—costs & resource demands skyrocket
🚀 GPT o1 improves performance, even on abbreviations, but remains resource-intensive
Balancing accuracy & efficiency in text processing remains a challenge! ⚖️

🥁 We are happy to announce that we just published our first preprint on arXiv: "NER4all or Context is All You Need: Using LLMs for low-effort, high-performance NER on historical texts. A humanities informed approach".🎉

👉 arxiv.org/abs/2502.04351 👈

It is also our first endeavour in working with such a large number of collaborators & contributors from the Chair of Digital History, NFDI4Memory's Methods Innovation Lab, & AI-Skills.

arXiv.org
NER4all or Context is All You Need: Using LLMs for low-effort, high-performance NER on historical texts. A humanities informed approach
Named entity recognition (NER) is a core task for historical research in automatically establishing all references to people, places, events and the like. Yet, due to the high linguistic and genre diversity of sources, only limited canonisation of spellings, the level of required historical domain knowledge, and the scarcity of annotated training data, established approaches to natural language processing (NLP) have been both extremely expensive and yielded only unsatisfactory results in terms of recall and precision. Our paper introduces a new approach. We demonstrate how readily-available, state-of-the-art LLMs significantly outperform two leading NLP frameworks, spaCy and flair, for NER in historical documents by seven to twenty-two percent higher F1-Scores. Our ablation study shows how providing historical context to the task and a bit of persona modelling that turns focus away from a purely linguistic approach are core to a successful prompting strategy. We also demonstrate that, contrary to our expectations, providing increasing numbers of examples in few-shot approaches does not improve recall or precision below a threshold of 16-shot. In consequence, our approach democratises access to NER for all historians by removing the barrier of scripting languages and computational skills required for established NLP tools and instead leveraging natural language prompts and consumer-grade tools and frontends.
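For readers unfamiliar with the metrics named in the abstract, here is a minimal sketch of how precision, recall and F1 can be computed for NER output. It assumes exact matching on (text, label) pairs, which may differ from the evaluation protocol used in the paper; the function name and the sample entities are made up for illustration.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match scores over sets of (text, label) entity tuples.

    Assumption: an entity counts as correct only if both its text and
    its label match a gold annotation exactly.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations, not taken from the paper.
gold = {("Berlin", "LOC"), ("Baedeker", "PER"), ("Potsdam", "LOC"), ("Spree", "LOC")}
predicted = {("Berlin", "LOC"), ("Baedeker", "ORG"), ("Potsdam", "LOC")}

p, r, f1 = precision_recall_f1(gold, predicted)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.667 0.5 0.571
```

Two of the three predictions are correct (precision 2/3), and two of the four gold entities are found (recall 1/2), giving an F1 of 4/7.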

ReadMe2KG: GitHub ReadMe to Knowledge Graph #Challenge has been published as part of the Natural Scientific Language Processing and Research Knowledge Graphs #NSLP2025 workshop co-located with #eswc2025. This #NER task aims to complement the NFDI4DataScience KG via information extraction from GitHub README files.

task description: nfdi4ds.github.io/nslp2025/doc
website: codabench.org/competitions/539

@eswc_conf @GenAsefa @shufan @NFDI4DS #NFDIrocks #knowledgegraphs #semanticweb #nlp #informationextraction

As part of a project, we digitized the papers of Joseph von #Laßberg, generated full texts with #eScriptorium, and also ran #NER with spaCy (as research data) and Google's NL. An exciting project; we often found that open-source alternatives are not yet as mature and need many translation steps. Still, we completed it successfully. It is now publicly available.

digital.blb-karlsruhe.de/lassb

digital.blb-karlsruhe.de
Joseph von Laßberg / Laßberg, Joseph von [1770-1855] [1-20]
Joseph von Laßberg

Named Entity Recognition is a computer-assisted method for detecting and classifying proper names in texts. Historical texts pose particular challenges for NER, e.g. because of non-standardized spellings.

Selina Galka tried to train her own #NER models for the memoirs of the Countess of Schwerin. The results are mixed:

memoiren.hypotheses.org/609

#NER, but prompto! 🤖

In tomorrow's #DigitalHistoryOFK, Torsten Hiltmann, Martin Dröge & Nicole Dresselhaus (HU Berlin, #4Memory) use the 1921 Baedeker travel guide as an example to demonstrate the potential of #LargeLanguageModels & prompt-based approaches for #NamedEntityRecognition in historical text sources.

Open to all!

🔜 When? Wed., 26 June, 4-6 pm, Zoom
ℹ️ Abstract: dhistory.hypotheses.org/7870
____
#DigitalHistory #promptoNER #LLM #genAI @nfdi4memory @histodons

Next week the #DigitalHistoryOFK starts again 🎉

We are pleased to present a varied programme once again for the summer semester 2024.
It includes talks on #NFDI4Memory, #DataFeminism, #NER with #LLMs, #MedievalHistory, #DataLiteracy, #MediaHistory & much more!

👉 To the programme: dhistory.hypotheses.org/digita

The colloquium takes place via Zoom & is open to everyone interested in #DigitalHistory & #DigitalHumanities.

___
@histodons #digiGW

Next: Harri Kiiskinen, Asko Nivala, Jasmine Westerlund, and Juhana Saarelainen (2023). “Extracting Geographical References from Finnish Literature. Fully Automated Processing of Plain-Text Corpora”. In: Journal of Computational Literary Studies 2 (1). doi: doi.org/10.48694/jcls.3584.

Keywords: named entity recognition, geographic information system, #geoparsing, linked open data, literary geography, #Finland

#JCLS #CLS #LOD

For science nerds, there’s what looks like a really good series of presentations from The World Science Festival in 3 parts coming up today at 4pm on YouTube.

I have really enjoyed previous WSF videos and this one sounds really exciting.

youtu.be/P8pC_yehPH0?si=DsziFv

Beyond Einstein:
Gravitational Rainbows
Gravitational Echoes
Gravitational Geysers

Hello 👋

We are four Belgian Federal Scientific Institutes that want to share FAIR data about entities related to Belgian cultural heritage by the end of 2026.

Here you will find multilingual updates on our project!

More Info
➡️ kbr.be/en/projects/metabelgica
➡️ github.com/metabelgica
➡️ zenodo.org/communities/metabel

📣 Please spread the word and follow us!

KBR
MetaBelgica • KBR
A shared entity management infrastructure between Federal Scientific Institutes in Belgium About the project Federal Scientific […]

How could you parse ingredients in a recipe? We could burn a bunch of energy using an LLM or, as I show in this blog post, we could use #spacy to build a robust and efficient little parser that could be replaced with an #ner model later. More fun with #gastronaut - my WIP, #django based #activitypub recipe app brainsteam.co.uk/2023/11/19/pa

brainsteam.co.uk
Parsing Ingredient Strings with SpaCy PhraseMatcher – Brainsteam
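To give a rough idea of what a small dictionary-based ingredient parser can look like, here is a dependency-free sketch. It is not the code from the blog post and does not use spaCy: it swaps in a plain longest-phrase lookup over token n-grams as a stand-in for what a phrase matcher does. The vocabularies `UNITS` and `INGREDIENTS` and the function `parse_ingredient` are hypothetical names chosen for this example.

```python
import re

# Hypothetical vocabularies; a real app would load much larger lists.
UNITS = {"cup", "cups", "tbsp", "tsp", "g", "kg", "ml", "l"}
INGREDIENTS = ["olive oil", "plain flour", "flour", "sugar", "butter", "oil"]


def parse_ingredient(line):
    """Split an ingredient string into quantity, unit and ingredient name."""
    # Tokenize into numbers (including fractions like 1/2) and lowercase words.
    tokens = re.findall(r"\d+(?:[./]\d+)?|[a-z]+", line.lower())
    result = {"quantity": None, "unit": None, "ingredient": None}

    # Take the first number as quantity and the first known unit as unit.
    for token in tokens:
        if result["quantity"] is None and token[0].isdigit():
            result["quantity"] = token
        elif result["unit"] is None and token in UNITS:
            result["unit"] = token

    # Longest-phrase match: prefer "olive oil" over the shorter "oil".
    phrases = {tuple(phrase.split()) for phrase in INGREDIENTS}
    max_len = max(len(phrase) for phrase in phrases)
    best = None
    for start in range(len(tokens)):
        for length in range(max_len, 0, -1):
            candidate = tuple(tokens[start:start + length])
            if candidate in phrases and (best is None or len(candidate) > len(best)):
                best = candidate
    if best is not None:
        result["ingredient"] = " ".join(best)
    return result


print(parse_ingredient("2 tbsp olive oil, extra virgin"))
# {'quantity': '2', 'unit': 'tbsp', 'ingredient': 'olive oil'}
```

A lookup like this stays cheap and predictable, and the ingredient step could later be replaced by a trained NER model without changing the returned structure.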

On the trail of the #LostAuthors:

In the paper published today for #DigHis23, @monicab demonstrates how digital methods such as #TextReuse and #NER can be used to study how ancient authors mentioned & cited historians. A particular focus was on identifying historians whose works survive only in fragments.

📖 To the paper: doi.org/10.5281/zenodo.8322062

Zenodo
Ancient Greek Historians in the Digital Age
A contribution to Digital History 2023: Digital Methods in Historical Research Practice: Disciplinary Transformations and Their Epistemological Consequences, Berlin, 23-26 May 2023. Abstract: This paper presents results of ongoing digital projects on ancient Greek historians. The research question is the analysis of the language used by ancient sources to refer to historians and cite their works, with particular reference to lost historians (the so-called fragmentary authors). While a lot of scholarship has been devoted to collecting fragments of many different genres and trying to reconstruct the texts from which they were taken, less effort has been spent on collecting data pertaining to the language used by ancient authors to refer to them and their works. The paper discusses the use of Computational Linguistics techniques and Named Entity Recognition to extract and annotate information about ancient Greek historians and their works from the sources where they are preserved. Moreover, the paper describes a new catalog of ancient Greek authors and works based on the extraction and annotation of references to them in ancient sources.

Accompanying my colleagues' presentation "Entity linking historical document OCR by combining Wikidata and Wikipedia" on #SWIB23 today, we published our three Named Entity Disambiguation & Linking models on Hugging Face (de/en/fr) at huggingface.co/SBB as well as three accompanying training databases on Zenodo:

doi.org/10.5281/zenodo.7767403 (de)

doi.org/10.5281/zenodo.7773986 (en)

doi.org/10.5281/zenodo.7773745 (fr)

@cneud @stabi_berlin

huggingface.co
SBB (Staatsbibliothek zu Berlin - Preußischer Kulturbesitz)
Digital Libraries, Digitization, Cultural Heritage