RSE Day 2026

Name: RSE Day 2026
Start: 2026-06-25T10:00:00+02:00
End: 2026-06-25T17:00:00+02:00
Location: Science City Bahrenfeld

Europe/Berlin timezone

Katrin Schöning-Stierand

Enhancing OCR using Large Language Models

Not scheduled

1h 30m

AER Atrium (Science City Bahrenfeld)

Albert-Einstein-Ring 8-10 22761 Hamburg

Poster and Lightning Talk Posterwalk and Lightning Talks

Thomas Asselborn (Universität zu Lübeck)

Historical documents remain difficult to digitise accurately, as OCR systems struggle with niche fonts, paper degradation, physical damage, and handwritten annotations. Consequently, OCR results often contain errors that impair the usability of archives. We examine two machine learning-based approaches to OCR post-correction. The first uses the LLM Llama 3 to identify, correct, and reconstruct erroneous or missing text. The second treats OCR output as a “language” and frames the post-processing as a machine translation task. Marian, a pre-trained sequence-to-sequence model, translates erroneous OCR text into its corrected form, thereby learning document-specific error patterns. Both approaches are compared in terms of accuracy and text reconstruction: LLMs offer flexibility and strong gap-filling capabilities; fine-tuned translation models provide faster and more hardware-efficient solutions.
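
The abstract compares both approaches "in terms of accuracy" without naming a metric. As a minimal sketch, assuming character error rate (CER) is the measure of interest, the following shows how raw and post-corrected OCR output could be scored against a ground-truth transcription. The function names and sample strings are hypothetical illustrations, not the authors' evaluation code or data.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or match
            ))
        prev = curr
    return prev[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Character error rate: edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(hypothesis, reference) / len(reference)


# Hypothetical sample: ground truth, raw OCR output, and a post-corrected version.
reference = "the quick brown fox"
raw_ocr = "tbe qu1ck brovvn fox"
corrected = "the quick brown fox"

print(f"CER before correction: {cer(raw_ocr, reference):.3f}")
print(f"CER after correction:  {cer(corrected, reference):.3f}")
```

Scoring each approach with the same function on the same held-out pages would make the LLM and the translation model directly comparable; a word-level variant follows by passing token lists instead of strings.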

Thomas Asselborn (Universität zu Lübeck)

Dr Magnus Bender (Aarhus University)

Prof. Ralf Möller (Universität Hamburg)

Dr Sylvia Melzer (Universität zu Lübeck and Universität Hamburg)

Jens Dörpinghaus (University of Koblenz, Federal Institute for Vocational Education and Training (BIBB))
