
[LangExtract](developers.googleblog.com/en/i) has me curious, but I don't see what makes it different from a [spacy-llm/prodigy](prodi.gy/docs/large-language-m) setup. Is it just that I'm spared the effort of chunking long input and/or constructing output JSON from entities and offsets by writing the corresponding Python code myself?

Ah, one more difference is that langextract is #OpenSource whereas prodigy is not (?). (On the other hand, prodigy has a better integration with a correction+training workflow.)

developers.googleblog.com: "Introducing LangExtract: A Gemini-powered information extraction library" (Google Developers Blog). Explore LangExtract: a Gemini-powered, open-source Python library for reliable, structured information extraction from unstructured text with precise source grounding.
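For context, roughly the kind of glue code the question refers to: a hedged sketch (all function names are hypothetical, not from either library) of chunking a long input and shifting chunk-local entity offsets back to document coordinates.

```python
def chunk_text(text, max_len=1000):
    """Split text into fixed-size chunks, tracking each chunk's start offset."""
    return [(i, text[i:i + max_len]) for i in range(0, len(text), max_len)]

def to_document_offsets(chunk_start, entities):
    """Shift chunk-local entity offsets back to whole-document coordinates."""
    return [
        {"label": e["label"],
         "start": chunk_start + e["start"],
         "end": chunk_start + e["end"]}
        for e in entities
    ]

text = "A" * 1500 + "Berlin"
chunks = chunk_text(text, max_len=1000)
# Suppose an NER model found "Berlin" at 500..506 inside the second chunk:
local = [{"label": "LOC", "start": 500, "end": 506}]
mapped = to_document_offsets(chunks[1][0], local)
# mapped[0] now points at the span in the full document
```

LangExtract's pitch, as I read it, is that this bookkeeping (plus schema-constrained JSON output) is handled for you.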

We've been working on a little library that might be useful if you work with #TEI and NER or text analysis:

• Extract plaintext from TEI
• Run your NER/NLP tools
• Map results back into the original TEI—without breaking anything!

Perfect for adding automated annotations to existing markup.

👉 github.com/recogito/tei-stando

github.com: recogito/tei-standoffconverter-js. Converts between an XML tree and a flat plaintext plus standoff (position-based table) representation.
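The standoff idea can be illustrated in a few lines. This is a hedged Python sketch of the general technique, not the API of tei-standoffconverter-js (which is a JavaScript library): flatten the XML to plaintext while recording each element's character span, so NER results on the plaintext can later be re-anchored in the tree.

```python
import xml.etree.ElementTree as ET

def tei_to_standoff(xml_string):
    """Flatten XML to plaintext plus a standoff table of (start, end, tag)."""
    root = ET.fromstring(xml_string)
    plaintext = ""
    table = []

    def walk(el):
        nonlocal plaintext
        start = len(plaintext)
        if el.text:
            plaintext += el.text
        for child in el:
            walk(child)
            if child.tail:
                plaintext += child.tail
        table.append((start, len(plaintext), el.tag))

    walk(root)
    return plaintext, table

tei = "<p>Visit to <placeName>Karlsruhe</placeName> in 1843.</p>"
text, standoff = tei_to_standoff(tei)
# text == "Visit to Karlsruhe in 1843."
```

An NER hit on the plaintext, say (9, 18, "LOC"), can then be matched against the standoff table to find the enclosing element, which is how annotations get mapped back without touching the original markup.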

🔠 Panel: More than Chatbots: Multimodal Large Language Models in Humanities Workflows

At #DHd2025, Nina Rastinger explores how well #AI handles abbreviations & NER:

✅ NER works well, even with small, low-cost models
❌ Abbreviations are tricky—costs & resource demands skyrocket
🚀 GPT o1 improves performance, even on abbreviations, but remains resource-intensive
Balancing accuracy & efficiency in text processing remains a challenge! ⚖️

🥁 We are happy to announce that we just published our first preprint on arXiv: "NER4all or Context is All You Need: Using LLMs for low-effort, high-performance NER on historical texts. A humanities informed approach".🎉

👉 arxiv.org/abs/2502.04351 👈

It is also our first endeavour into collaborative work with such a large number of collaborators & contributors from the Chair of Digital History, NFDI4Memory's Methods Innovation Lab, & AI-Skills.

arxiv.org: "NER4all or Context is All You Need: Using LLMs for low-effort, high-performance NER on historical texts. A humanities informed approach". Abstract: Named entity recognition (NER) is a core task for historical research in automatically establishing all references to people, places, events and the like. Yet, due to the high linguistic and genre diversity of sources, only limited canonisation of spellings, the level of required historical domain knowledge, and the scarcity of annotated training data, established approaches to natural language processing (NLP) have been both extremely expensive and have yielded unsatisfactory results in terms of recall and precision. Our paper introduces a new approach. We demonstrate how readily-available, state-of-the-art LLMs significantly outperform two leading NLP frameworks, spaCy and flair, for NER in historical documents, with seven to twenty-two percent higher F1 scores. Our ablation study shows how providing historical context to the task and a bit of persona modelling that turns focus away from a purely linguistic approach are core to a successful prompting strategy. We also demonstrate that, contrary to our expectations, providing increasing numbers of examples in few-shot approaches does not improve recall or precision below a threshold of 16-shot. In consequence, our approach democratises access to NER for all historians by removing the barrier of scripting languages and computational skills required for established NLP tools, instead leveraging natural language prompts and consumer-grade tools and frontends.
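A minimal sketch of the kind of prompting strategy the abstract describes (persona plus historical context plus few-shot examples). The prompt wording, function name, and example data are illustrative assumptions, not the paper's actual prompt.

```python
import json

def build_ner_prompt(text, context, persona, examples):
    """Assemble an NER prompt from a persona, historical context,
    and a handful of few-shot examples. Purely illustrative."""
    shots = "\n".join(
        f"Text: {ex['text']}\nEntities: {json.dumps(ex['entities'])}"
        for ex in examples
    )
    return (
        f"{persona}\n"
        f"{context}\n"
        "List all persons, places, and organisations as JSON "
        '[{"text": ..., "label": ...}].\n'
        f"{shots}\n"
        f"Text: {text}\nEntities:"
    )

prompt = build_ner_prompt(
    text="Der Baedeker empfiehlt ein Hotel am Potsdamer Platz.",
    context="The source is a German travel guide printed in 1921.",
    persona="You are a historian annotating early 20th-century travel literature.",
    examples=[{"text": "Ankunft in Leipzig.",
               "entities": [{"text": "Leipzig", "label": "LOC"}]}],
)
```

The paper's point is that the context and persona lines, not ever-longer example lists, do most of the work.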

ReadMe2KG: GitHub ReadMe to Knowledge Graph #Challenge has been published as part of the Natural Scientific Language Processing and Research Knowledge Graphs #NSLP2025 workshop co-located with #eswc2025. This #NER task aims to complement the NFDI4DataScience KG via information extraction from GitHub README files.

task description: nfdi4ds.github.io/nslp2025/doc
website: codabench.org/competitions/539

@eswc_conf @GenAsefa @shufan @NFDI4DS #NFDIrocks #knowledgegraphs #semanticweb #nlp #informationextraction
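For illustration only, a toy sketch of the task's general shape: pulling mentions out of README text and emitting knowledge-graph triples. The pattern, predicate name, and example repo are invented placeholders, not the challenge's actual schema.

```python
import re

def readme_to_triples(repo, readme_text):
    """Naively extract tool mentions from README prose and emit
    (subject, predicate, object) triples. Placeholder logic only."""
    triples = []
    for match in re.finditer(r"built with (\w+)", readme_text, re.IGNORECASE):
        triples.append((repo, "usesLibrary", match.group(1)))
    return triples

triples = readme_to_triples(
    "example/repo",
    "This tool is built with spaCy and built with Flask.",
)
```

The real task would replace the regex with an NER model and map entities to the KG's actual vocabulary.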

As part of a project, we digitised the papers of Joseph von #Laßberg, produced full texts with #eScriptorium, and also ran #NER with spaCy (as research data) and Google's NL. An exciting project; we often found that open-source alternatives are not there yet and require many conversion steps. Nevertheless, it was completed successfully and is now publicly available.

digital.blb-karlsruhe.de/lassb

digital.blb-karlsruhe.de: Joseph von Laßberg [1770-1855]

#NER, but prompto! 🤖

In tomorrow's #DigitalHistoryOFK, Torsten Hiltmann, Martin Dröge & Nicole Dresselhaus (HU Berlin, #4Memory) demonstrate the potential of #LargeLanguageModels & prompt-based approaches for #NamedEntityRecognition in historical text sources, using the 1921 Baedeker travel guide as an example.

Open to all!

🔜 When? Wed., 26.06., 4-6 pm, Zoom
ℹ️ Abstract: dhistory.hypotheses.org/7870
____
#DigitalHistory #promptoNER #LLM #genAI @nfdi4memory @histodons

Next week, the #DigitalHistoryOFK starts again 🎉

We are pleased to present a varied programme again for the 2024 summer semester.
It includes talks on #NFDI4Memory, #DataFeminism, #NER with #LLMs, #MedievalHistory, #DataLiteracy, #MediaHistory & much more!

👉 Programme: dhistory.hypotheses.org/digita

The colloquium takes place via Zoom & is open to anyone interested in #DigitalHistory & #DigitalHumanities.

___
@histodons #digiGW