**@frueheneuzeit** @stefan_hessbrueggen@fedihum.org · Apr 3

#til the German transformer model for #spacy is not trained for #ner. Room for improvement, I'd say.

**Oliver Ammann** @oa@swiss.social · Mar 19

For its 15th anniversary, #erara has received a few new features: #NamedEntityRecognition, #NamedEntityLinking, and improved full-text recognition.

https://library.ethz.ch/news-und-kurse/news/news-beitraege/2025/03/15-jahre-e-rara-neue-suchmoeglichkeiten-und-erweiterte-volltexterkennung.html

ETH-Bibliothek: 15 years of e-rara: new search options and extended full-text recognition. New entry points for places, persons, and topics, together with extended full-text recognition, make digitised prints easier to access.

#erara15 #ethbibliothek #ethz

**CLSinfra** @CLSinfra@fedihum.org · Mar 19

Three CLS INFRA Deliverables on #NLP are out now!
In this video Tess Dejaghere and Pranaydeep Singh of Ghent University CDH explain and demo work on #NER (#NamedEntityRecognition), #ABSA (Aspect-based #SentimentAnalysis) and #RelationalExtraction.
https://youtu.be/RJE83eb7a6A

YouTube: CLS INFRA Deliverables 8.3 - 8.5: Explainer, by CLS INFRA

**Mareike König** @Mareike2405@fedihum.org · Mar 7 *

Canada is a horse: @bibwiss has the best examples of uncertainty in Named Entity Recognition :) #NER #DHd2025

Sophie Schneider at the front of the lecture hall giving her talk, with a view of the presentation

**Holle Meding** @hmeding@mastodon.social · Mar 6

Panel: More than Chatbots: Multimodal Large Language Models in Humanities Workflows

At #DHd2025, Nina Rastinger explores how well #AI handles abbreviations & NER:

NER works well, even with small, low-cost models
Abbreviations are tricky—costs & resource demands skyrocket
GPT o1 improves performance, even on abbreviations, but remains resource-intensive
Balancing accuracy & efficiency in text processing remains a challenge!

Nina Rastinger at Panel More than Chatbots: Multimodal Large Language Models in Humanities Workflows #dhd2025

#NER #TextProcessing #DigitalHumanities

**Digital History Berlin** @DigitalHistory@fedihum.org · Feb 12

We are happy to announce that we just published our first preprint on arXiv: "NER4all or Context is All You Need: Using LLMs for low-effort, high-performance NER on historical texts. A humanities informed approach".

http://arxiv.org/abs/2502.04351

It is also our first endeavour into collaborative work with such a large number of collaborators & contributors from the Chair of Digital History, NFDI4Memory's Methods Innovation Lab, & AI-Skills.

arXiv.org: NER4all or Context is All You Need: Using LLMs for low-effort, high-performance NER on historical texts. A humanities informed approach. Named entity recognition (NER) is a core task for historical research in automatically establishing all references to people, places, events and the like. Yet, due to the high linguistic and genre diversity of sources, only limited canonisation of spellings, the level of required historical domain knowledge, and the scarcity of annotated training data, established approaches to natural language processing (NLP) have been both extremely expensive and yielded only unsatisfactory results in terms of recall and precision. Our paper introduces a new approach. We demonstrate how readily-available, state-of-the-art LLMs significantly outperform two leading NLP frameworks, spaCy and flair, for NER in historical documents by seven to twenty-two percent higher F1-Scores. Our ablation study shows how providing historical context to the task and a bit of persona modelling that turns focus away from a purely linguistic approach are core to a successful prompting strategy. We also demonstrate that, contrary to our expectations, providing increasing numbers of examples in few-shot approaches does not improve recall or precision below a threshold of 16-shot. In consequence, our approach democratises access to NER for all historians by removing the barrier of scripting languages and computational skills required for established NLP tools and instead leveraging natural language prompts and consumer-grade tools and frontends.

#DigitalHistory #NER #LLM
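
The abstract above reports its results as precision, recall, and F1-scores. As a minimal sketch of how such entity-level scores are commonly computed (this is not the paper's evaluation code; the `prf1` helper, the exact-match criterion, and the example spans are all assumptions made up for illustration):

```python
def prf1(gold, predicted):
    """Entity-level precision, recall and F1.

    Both arguments are collections of (start, end, label) tuples.
    A prediction counts as correct only if span and label match exactly
    (an assumed, strict matching criterion).
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: character offsets plus entity label.
gold = {(0, 6, "PER"), (10, 16, "LOC"), (20, 28, "ORG")}
pred = {(0, 6, "PER"), (10, 16, "ORG")}  # second span has the wrong label

precision, recall, f1 = prf1(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under exact matching, the mislabelled span counts as both a missed gold entity and a spurious prediction; more lenient schemes (partial overlap, label-only) would give different numbers.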

**Harald Sack** @lysander07@sigmoid.social · Jan 29 *

ReadMe2KG: Github ReadMe to Knowledge Graph #Challenge has been published as part of the Natural Scientific Language Processing and Research Knowledge Graphs #NSLP2025 workshop co-located with #eswc2025. This #NER task aims to complement the NFDI4DataScience KG via information extraction from GitHub README files.

task description: https://nfdi4ds.github.io/nslp2025/docs/readme2kg_shared_task.html
website: https://www.codabench.org/competitions/5396/

@eswc_conf @GenAsefa @shufan @NFDI4DS #NFDIrocks #knowledgegraphs #semanticweb #nlp #informationextraction

Readme2KG Challenge website screenshot:
The vision of NFDI4DataScience (NFDI4DS) is to support all steps of the complex and interdisciplinary research data lifecycle, including collecting/creating, processing, analyzing, publishing, archiving, and reusing resources in Data Science and Artificial Intelligence. GitHub is a popular platform for hosting and collaborating on software projects. In the context of research, authors can use GitHub repositories to share the datasets, models, and source code of experiments in the paper. These repositories can provide implementation details and facilitate the exploration and reproduction of research results. Each GitHub repository typically includes a README.md file, which serves as an introductory document for the project. READMEs are usually written in Markdown format and provide key information such as the project’s purpose, setup instructions, usage examples, and often links to the original research paper. Aiming to enhance the NFDI4DS-KG with information from GitHub README files, a fine-grained Named Entity Recognition task is proposed.

**Gerrit Heim** @Gerrit_Heim@openbiblio.social · Oct 18, 2024

As part of a project, we digitised the papers (Nachlass) of Joseph von #Laßberg, produced full texts with #eScriptorium, and also ran #NER with spaCy (as research data) and Google's NL. An exciting project; we often found that open-source alternatives are not yet as mature and need many translation steps. It was completed successfully nonetheless and is now publicly available.

https://digital.blb-karlsruhe.de/lassberg/topic/view/316114

digital.blb-karlsruhe.de: Joseph von Laßberg / Laßberg, Joseph von [1770-1855] [1-20]. Joseph von Laßberg

Continued thread

**e-editiones** @eeditiones@social.e-editiones.org · Sep 3, 2024

Now #NER via #spaCy, and how the Annotations user interface is used to review potential matches and to do bulk updates for entities. She continues with connectors to authority providers like #Airtable, #Wikidata and others.

Picture shows the TEI Publisher Annotations Editor

**de.hypotheses** @dehypotheses@fedihum.org · Jul 1, 2024

Named Entity Recognition is a computer-assisted method for recognising and classifying proper names in texts. Historical texts pose particular challenges for NER, e.g. because of non-standardised spellings.

Selina Galka tried to train her own #NER models for the memoirs of Countess von Schwerin. The results are mixed:

https://memoiren.hypotheses.org/609

#Memoiren #NLP #DigitalHumanities

**Digital History Berlin** @DigitalHistory@fedihum.org · Jun 25, 2024

#NER, but prompto!

In tomorrow's #DigitalHistoryOFK, Torsten Hiltmann, Martin Dröge & Nicole Dresselhaus (HU Berlin, #4Memory) use the 1921 Baedeker travel guide as an example to demonstrate the potential of #LargeLanguageModels & prompt-based approaches for #NamedEntityRecognition in historical text sources.

Open to all!

When? Wed., 26.06., 4-6 pm, Zoom
Abstract: https://dhistory.hypotheses.org/7870
____
#DigitalHistory #promptoNER #LLM #genAI @nfdi4memory @histodons

**Digital History Berlin** @DigitalHistory@fedihum.org · Apr 29, 2024

Next week the #DigitalHistoryOFK starts again

We are pleased to present a varied programme once more for the summer semester 2024.
It includes talks on #NFDI4Memory, #DataFeminism, #NER with #LLMs, #MedievalHistory, #DataLiteracy, #MediaHistory & much more!

Programme: https://dhistory.hypotheses.org/digital-history-forschungskolloquium/programm-sommersemester-2024

The colloquium takes place via Zoom & is open to everyone interested in #DigitalHistory & #DigitalHumanities.

___
@histodons #digiGW

Replied in thread

**JCLS** @jcls@fedihum.org · Apr 4, 2024

Next: Harri Kiiskinen, Asko Nivala, Jasmine Westerlund, and Juhana Saarelainen (2023). “Extracting Geographical References from Finnish Literature. Fully Automated Processing of Plain-Text Corpora”. In: Journal of Computational Literary Studies 2 (1). doi: https://doi.org/10.48694/jcls.3584.

Keywords: named entity recognition, geographic information system, #geoparsing, linked open data, literary geography, #Finland

Figure: Table showing the sum of tokens in the corpus per genre and decade. 1870s to 1940s for the decades. fiction, drama, poetry and misc for the genres. Several million tokens per decade, for a total of over 20 million tokens.

#JCLS #CLS #LOD

**Jo-stands on guard, elbows up.** @JoBlakely@mastodon.social · Mar 8, 2024 *

For science nerds, there’s what looks like a really good series of presentations from The World Science Festival in 3 parts coming up today at 4pm on YouTube.

I have really enjoyed previous WSF videos and this one sounds really exciting.

https://youtu.be/P8pC_yehPH0?si=DsziFvjRlf9Zdmkq

Beyond Einstein:
Gravitational Rainbows
Gravitational Echoes
Gravitational Geysers

YouTube: Beyond Einstein: Part One - Gravitational Rainbows, by World Science Festival

#physics #astrophysics #science

**MetaBelgica** @metabelgica@fedihum.org · Nov 20, 2023

Hello

we are four Belgian Federal Scientific Institutes that want to share FAIR data about Entities related to Belgian Cultural Heritage by the end of 2026.

Here you will find multilingual updates on our project!

More Info
https://www.kbr.be/en/projects/metabelgica/
https://github.com/metabelgica
https://zenodo.org/communities/metabelgica

Please spread the word and follow us!

KBR: MetaBelgica • KBR. A shared entity management infrastructure between Federal Scientific Institutes in Belgium. About the project: Federal Scientific […]

#MetaBelgica #ResearchInfrastructure #Belgium

**Dr James Ravenscroft** @jamesravey@fosstodon.org · Nov 19, 2023 *

How could you parse ingredients in a recipe? We could burn a bunch of energy using an LLM or, as I show in this blog post, we could use #spacy to build a robust and efficient little parser that could be replaced with an #ner model later. More fun with #gastronaut - my WIP, #django based #activitypub recipe app https://brainsteam.co.uk/2023/11/19/parsing-ingredient-strings-with-spacy-phrasematcher/

brainsteam.co.uk: Parsing Ingredient Strings with SpaCy PhraseMatcher – Brainsteam

**de.hypotheses** @dehypotheses@fedihum.org · Oct 23, 2023

A source text has been fully transcribed, and now the editors' laborious work of preparing the text normally begins: can't persons and places simply be captured automatically?

https://grandtourdig.hypotheses.org/949

Maximilian Görmar on the blog Grand Tour digital about experiments with Named Entity Recognition on an apothecary's travel report.

Grand Tour digital: First experiments with Named Entity Recognition on the travel report of the apothecary Wagener. Now that our blog has already offered several insights into the transcription process with Transkribus, the question arises: how do I proceed once I have transcribed my source by hand, with Transkribus, or with another handwriting recognition tool, and now have a full text? One way, and in many contexts probably for a long time to come the...

#DigitalHumanities #NamedEntityRecognition #NER

**Digital History Berlin** @DigitalHistory@fedihum.org · Sep 20, 2023

On the trail of the #LostAuthors:

In the paper for #DigHis23 published today, @monicab demonstrates how digital methods such as #TextReuse and #NER can be used to investigate how ancient authors mentioned & cited historians. A particular focus was on identifying those historians whose works survive only in fragments.

To the paper: https://doi.org/10.5281/zenodo.8322062

Zenodo: Ancient Greek Historians in the Digital Age. A contribution to Digital History 2023: Digitale Methoden in der geschichtswissenschaftlichen Praxis: Fachliche Transformationen und ihre epistemologischen Konsequenzen, Berlin, 23.-26.5.2023. Abstract: This paper presents results of ongoing digital projects on ancient Greek historians. The research question is the analysis of the language used by ancient sources to refer to historians and cite their works with a particular reference to lost historians (the so-called fragmentary authors). If a lot of scholarship has been devoted to collect fragments of many different genres and try to reconstruct the texts from which they were taken, less effort has been spent on collecting data pertaining to the language used by ancient authors to refer to them and their works. The paper discusses the use of Computational Linguistics techniques and Named Entity Recognition to extract and annotate information about ancient Greek historians and their works from the sources where they are preserved. Moreover, the paper describes a new catalog of ancient Greek authors and works based on the extraction and annotation of references to them in ancient sources.

#DigitalHistory #digiGW #DigitalHumanities

**Jörg Lehmann** @jrglmn@mastodon.social · Sep 13, 2023

Accompanying my colleagues' presentation "Entity linking historical document OCR by combining Wikidata and Wikipedia" on #SWIB23 today, we published our three Named Entity Disambiguation & Linking models on Hugging Face (de/en/fr) at https://huggingface.co/SBB as well as three accompanying training databases on Zenodo:

https://doi.org/10.5281/zenodo.7767403 (de)

https://doi.org/10.5281/zenodo.7773986 (en)

https://doi.org/10.5281/zenodo.7773745 (fr)

@cneud @stabi_berlin

huggingface.co: SBB (Staatsbibliothek zu Berlin - Preußischer Kulturbesitz). Digital Libraries, Digitization, Cultural Heritage

#ner #namedentitylinking #namedentitydisambiguation