Recent developments in the open data policies of meteorological agencies have greatly expanded the set of up-to-date weather observation and forecast data that is publicly available for meteorological research and education. To improve the use of this open data, we have developed 3-D visualization products that extract and display meteorological information in novel ways. In this demo, we present...
The poster presents findings from the DFG project InterSaME (2020–2023), which focused on vowel-dots in early Qur’anic manuscripts. It highlights a vector-based transcription system, developed because no existing encoding or transcription tool covers vowel-dots. Using a customised Archetype software instance, the team developed a pointer-based encoding method that describes each dot's position relative to...
The multidisciplinary nature of manuscript study at the CSMC results in an ever-increasing volume of digital data in various modalities, ranging from raw images of artefacts to automatically generated data from advanced acquisition techniques. The manual analysis of this data is typically time-consuming and susceptible to human error and bias. Therefore, a set of Pattern Analysis Software...
bAIome is the center for biomedical AI at the University Medical Center Hamburg-Eppendorf (UKE). The center consists of faculty and staff from various institutes within the UKE who are engaged in research and education across broad areas of biomedical AI. bAIome serves as a competence center, bundling knowledge, expertise, and resources to provide a portfolio of services to help students, researchers and...
Prostate cancer relapse prediction is a challenging task within computational pathology, as tissue preparation and digitization are not standardized. The differing protocols lead to domain shifts, against which a deep learning model must be robust, focusing on biological information rather than variations in appearance. We address this challenge through the use of vision foundation models...
The qualification of new detectors presents a challenging setting that requires stable operation of diverse devices, often employing multiple data acquisition (DAQ) systems running on several machines in a local network. Changes to these setups are frequent, such as using different reference detectors depending on the facility. Managing this complexity necessitates a system capable of...
The Discourse Analysis Tool Suite (DATS) is a collaborative web-based platform enabling researchers to manage and analyze multi-modal data with AI. Its core features aim to support and enhance rigorous research: data management, search and filtering, classification, and quantitative analyses. To further develop DATS, we are keen to learn from your research projects to expand...
Speech emotion conversion is the task of converting the expressed emotion of a spoken utterance to a target emotion while preserving the lexical content and speaker identity. While most existing works in speech emotion conversion rely on acted-out datasets and parallel data samples, in this work we specifically focus on more challenging in-the-wild scenarios and do not rely on parallel data....
High-throughput scientific experiments generate massive data streams requiring near real-time processing for time-critical decision making. However, developing robust streaming workflows presents significant challenges in distributed computing environments.
We present AsapoWorker [1], a Python library that simplifies the development of processing workers on top of the Asapo [2] streaming...
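As a sketch of the worker pattern such a library supports (class and method names below are illustrative, not AsapoWorker's actual API):

```python
# Illustrative worker pattern; AsapoWorker's real API may differ.
from dataclasses import dataclass, field

@dataclass
class Message:
    stream: str               # name of the data stream
    metadata: dict = field(default_factory=dict)
    data: bytes = b""         # raw payload received from the broker

class Worker:
    """A processing worker: consumes messages, emits results."""
    def process(self, msg: Message) -> bytes:
        raise NotImplementedError

class SizeReporter(Worker):
    def process(self, msg: Message) -> bytes:
        # Real workers would run detector-specific analysis here.
        return f"received {len(msg.data)} bytes".encode()

def run(worker: Worker, messages) -> None:
    # A real framework would additionally handle acknowledgement,
    # retries, and output streams; this loop only shows the callback
    # structure that such a library relieves developers of writing.
    for msg in messages:
        print(msg.stream, worker.process(msg))

if __name__ == "__main__":
    run(SizeReporter(), [Message("scan-1", data=b"\x00" * 1024)])
```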
Viruses are infamously efficient at employing homology as a mechanism of immune evasion, imitating host proteins to escape the immune response. This might be exploited to identify new virus-host interactions. We have implemented a computational workflow, tested on the Human Cytomegalovirus (HCMV), which conducts a structural homology search to list viral proteins and their homologs in humans....
In this paper, we explore, using multi-modal agents based on Large Vision-Language Models (LVLMs), what a scholarly collections portal can be beyond a digital showcase of the university’s collections. We focus on the interactive exploration of scientific collections. Collection data is valued differently from different perspectives. For university administrators, it is an item to be...
Processing large amounts of data in near real-time during experiments at synchrotrons is enabling scientists to make the best use of limited beamtime [1]. However, building systems capable of handling data rates of several gigabytes per second over long periods of time requires specialized expertise in distributed computing [2], which limits the broader adoption of such systems at...
The European XFEL has updated its scientific data policy to require detailed data management plans (DMPs) and mandatory data reduction. We explore how DMPs, together with metadata and empirical traces from data management and storage systems, can be integrated into a scientific knowledge graph (SKG). This heterogeneous information network serves as a foundation for applying Heterogeneous Graph...
X-ray near-field holography (NFH) is an advanced imaging technique that reveals the nanoscale internal structure of materials, making it useful for studying a wide range of specimens. Moreover, specimens can be imaged using a single exposure, in a scalable field of view. However, the analysis of NFH data is complex, requiring sophisticated phase retrieval and tomographic...
The poster addresses the problem of incorporating a steadily growing number of research software applications into an existing RDM infrastructure, as well as transferring their diverse outputs to the existing storage systems using interface definitions. We propose a subprocess within the general RDM infrastructure that integrates a new software component, the data transfer facilitator...
Social media increasingly fuel extremism and disinformation, especially on the political right, and enable the rapid spread of antidemocratic narratives. Although the social and political sciences study these phenomena extensively, a considerable gap remains between that research and policy put into practice. Our joint software engineering project, KI4Demo, supports...
Research groups in the humanities generate a substantial number of publications, contributing to an ever-expanding body of scholarly work. When scholars are interested in the topics covered, or have specific questions about (subsets of) publications, they face a prohibitively large number of publications to read. We demonstrate the use of language models in the humanities by showcasing two...
Large language models (LLMs) hold great potential for automating tedious development tasks, like creating and maintaining source code documentation. We assist software developers at European XFEL (EuXFEL) with LLM-powered tools that facilitate knowledge and documentation management. We present findings from two controlled experiments conducted with EuXFEL’s Data department, focusing on...
Scholars in the humanities working with datasets face two challenges: discovering relevant datasets, and publishing their own dataset after their research is completed. We propose a new file type, CSMC (Computer Science Metadata Container), which bundles the raw research data alongside a visualization of the data. Scholars can view the visualization of a dataset before downloading the whole...
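As a purely hypothetical sketch of what such a bundle could look like on disk (the container layout, file names, and helper below are invented for illustration; they are not the actual CSMC specification):

```python
# Hypothetical data-plus-preview container; layout and names invented.
import json
import zipfile

def write_container(path, data_files, preview_png, metadata):
    """Bundle raw data, a preview image, and metadata into one archive."""
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("metadata.json", json.dumps(metadata))
        zf.write(preview_png, "preview.png")  # viewable before full download
        for f in data_files:
            zf.write(f, f"data/{f}")

# write_container("letters.csmc", ["letters.csv"], "plot.png",
#                 {"title": "Letter corpus", "license": "CC-BY-4.0"})
```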
Conseydo, a consent management platform developed in the Flutter framework and funded by the Calls4Transfer funding program, uses a privacy-by-design approach to enable the GDPR-compliant digital creation, documentation, management and tracking of consent for research, for example within the stakeholder triad of teachers, parents and researchers. The platform solves organizational...
Our poster presents Protokolibri, a distributed application for logging the browsing behavior of large groups of students on iPads. The browser plugin we developed records tab events via JavaScript and sends them asynchronously to the Protokolibri Node.js server, which stores the data sorted by device name and timestamp.
The focus of the tool is on simplifying data collection. Previously,...
This paper presents UHH’s approach developed for the AVeriTeC shared task. The goal of the challenge is to verify given real-world claims with evidence from the Web. In this shared task, we investigate a Retrieval-Augmented Generation (RAG) model, which mainly contains retrieval, generation, and augmentation components. We start with the selection of the top 10k evidence candidates via BM25 scores, and...
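The BM25 preselection step can be sketched with the rank_bm25 package (corpus and query below are invented; the task uses far larger evidence pools):

```python
# Sketch of BM25-based evidence preselection using rank_bm25.
from rank_bm25 import BM25Okapi

corpus = [
    "The Eiffel Tower was completed in 1889.",
    "Mount Everest is the highest mountain on Earth.",
    "The Great Wall of China is thousands of kilometers long.",
]
tokenized = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized)

claim = "The Eiffel Tower was finished before 1900."
query = claim.lower().split()

# The shared-task pipeline keeps the top 10k candidates; with this toy
# corpus we keep only the single best match.
print(bm25.get_top_n(query, corpus, n=1))
```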
This project presents a browsable digital exploration environment for a multilingual private guestbook from 20th-century Jerusalem. The goal is to investigate curiosity-driven browsing strategies in archival contexts, going beyond systematic searches. By providing intuitive, user-friendly visualization solutions, the project aims to facilitate an exploratory approach and increase serendipitous...
Contemporary earth system models (ESMs) perform simulations at kilometer-scale resolution at various HPC centers. The data from these simulations aid research and policy making. Hence, the design of a data access system for a federated setup should consider the data, analysis tools and computing resources at each center. For efficient discoverability, the data management at each center...
This poster presents the final iteration of the CaloClouds series. Simulating photon showers at the granularities expected in a future Higgs factory is computationally challenging. A viable simulation must capture the fine details exposed by such a detector, yet be substantially faster than full Monte Carlo methods. The CaloClouds model utilises point cloud diffusion and normalising flows to replicate...
The "Digital Edition Levezow Album" project is an interdisciplinary collaboration between the Hub of Computing and Data Science (HCDS), the Department of Art History at the University of Hamburg, and the State and University Library Hamburg. The project aims to digitally process and interactively visualize a previously unexplored sketchbook from the late 17th century, containing drawings on...
The ELECTRODE package is a module in the official release of the molecular dynamics code LAMMPS and implements the constant potential method and related methods. Utilizing the massively parallel architecture of LAMMPS with neighbor lists and fast Fourier transforms, the package efficiently calculates interactions between atoms and minimizes their energy as a function of atom...
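Assuming the electrode charges are the minimized degrees of freedom, as in the constant potential method, the inner step reduces to a linear solve; a toy NumPy sketch follows (the actual package builds the interaction matrix from Ewald summation and handles constraints such as charge neutrality):

```python
# Toy sketch of the constant potential method's core step:
# charges q minimize E(q) = 1/2 q^T A q + q^T (b - v), which is
# equivalent to solving the linear system A q = v - b.
import numpy as np

A = np.array([[2.0, 0.5],
              [0.5, 2.0]])    # toy electrode-electrode interaction matrix
b = np.array([0.1, -0.2])     # potential at electrode atoms from electrolyte
v = np.array([1.0, -1.0])     # applied electrode potentials

q = np.linalg.solve(A, v - b) # energy-minimizing electrode charges
print("electrode charges:", q)
```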
We introduce EncouRAGe, a comprehensive Python-based framework designed to streamline the development and evaluation of Retrieval-Augmented Generation (RAG) systems using local Large Language Models (LLMs). EncouRAGe integrates leading tools such as vLLM for efficient inference, Jinja2 for dynamic prompt templating, and MLflow for observability and performance tracking. It supports both...
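As an illustration of the dynamic prompt templating that Jinja2 provides (template text and variable names are invented for this sketch):

```python
# Sketch of Jinja2-based prompt templating for a RAG pipeline.
from jinja2 import Template

template = Template(
    "Answer the question using only the context below.\n\n"
    "{% for doc in documents %}[{{ loop.index }}] {{ doc }}\n{% endfor %}\n"
    "Question: {{ question }}\nAnswer:"
)

print(template.render(
    documents=["RAG combines retrieval with generation.",
               "vLLM serves LLMs with high throughput."],
    question="What does RAG combine?",
))
```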
We present a privacy-preserving research environment integrating offline Large Language Models (LLMs), AI agents, and scalable infrastructure. By deploying private LLMs via Ollama and containerized workflows on Kubernetes, researchers can automate tasks like literature review, code generation, and secure data processing without compromising sensitive information. AI agents—coordinated through...
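A minimal sketch of how such a workflow can query a locally served model through Ollama's HTTP API (model name and prompt are placeholders; assumes an Ollama instance on its default port):

```python
# Query a private, locally hosted LLM via Ollama's HTTP API.
import json
import urllib.request

payload = {
    "model": "llama3",                    # placeholder model name
    "prompt": "Summarize: RAG keeps sensitive data on-premises.",
    "stream": False,                      # return one complete response
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```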
The Result Assessment Tool (RAT) is a Python-based software toolkit that addresses the critical research challenge of accessing and analyzing data from various search systems. It uses several computational methods, including Selenium for robust web scraping, Flask for the web interface, PostgreSQL for data management, and automated classifiers for content analysis. With RAT, researchers can...
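The scraping step can be sketched with Selenium as follows (URL and selector are placeholders; RAT's actual scrapers are tailored to each search system):

```python
# Illustrative Selenium scraping sketch; not RAT's actual scraper code.
from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.FirefoxOptions()
options.add_argument("--headless")        # run without a visible browser
driver = webdriver.Firefox(options=options)
try:
    driver.get("https://example.org/search?q=climate")  # placeholder URL
    # The CSS selector is a placeholder; it differs per search system.
    for link in driver.find_elements(By.CSS_SELECTOR, "a"):
        print(link.get_attribute("href"))
finally:
    driver.quit()
```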
Resources for research on sign languages are rare and can often be difficult to locate. Few centralised sources of information exist. The Sign Language Dataset Compendium helps by providing an overview of existing lexical resources and linguistic corpora, as well as a summary of popular data collection tasks shared among corpora. To date it covers resources for 82 different sign languages. The...
Benchmarking applications on high-performance computing (HPC) systems is essential for optimising runtime, reducing energy consumption, and ensuring efficient hardware utilisation. However, accessing and interpreting performance metrics can be challenging and error-prone. To address this, we present xbat (extended benchmarking automation tool), developed by MEGWARE Computer Vertrieb und...
We present MENTO, a data processing toolkit that remotely runs external analysis software on-demand using the DESY high-performance computing (HPC) cluster.
MENTO is set up to require no input from users except to point to the desired analysis software, and the entire processing pipeline is then managed automatically, including data input, access to the HPC cluster, job submissions to a...
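The job-submission step can be sketched as a thin wrapper around a batch scheduler (assuming a SLURM-based cluster; partition, resources, and the analysis command below are placeholders, not MENTO's actual internals):

```python
# Sketch of automated batch-job submission, assuming SLURM.
import subprocess
import tempfile

def submit(analysis_cmd: str, input_path: str) -> str:
    """Write a batch script and submit it; return the job id."""
    script = (
        "#!/bin/bash\n"
        "#SBATCH --partition=allcpu\n"   # placeholder partition name
        "#SBATCH --time=01:00:00\n"
        f"{analysis_cmd} {input_path}\n"
    )
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script)
        path = f.name
    # sbatch prints e.g. "Submitted batch job 12345"; keep the job id.
    out = subprocess.run(["sbatch", path], capture_output=True,
                         text=True, check=True)
    return out.stdout.strip().split()[-1]

# job_id = submit("python analyze.py", "/path/to/run_data.h5")
```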
Computational pathology has made tremendous progress on dedicated datasets in the past years. However, such algorithms are still not used routinely for diagnostics in the clinic. A large gap remains between research and clinical practice; contributing factors include the focus on reproducing subjective scores and the large variance in performance depending on the...
Continuous Integration and Continuous Deployment (CI/CD) are modern software engineering best practices that enable efficient large-scale software development and use. A variety of popular CI/CD tools help in adopting these practices. In this poster we focus on the kinds of software, their runtime environments, packaging, and deployment tools and techniques used at DESY that can easily...
The presentation will introduce a GraphRAG-based approach to research data retrieval from research data catalogues, using the Text+ Registry as an example.
Retrieval-Augmented Generation (RAG) systems have become a cornerstone for LLM-based question-answering tasks involving individual (potentially private or sensitive) unstructured data. However, traditional RAG pipelines often lack an...
Schematron is an ISO-standardized validation language for structured data (ISO/IEC 19757-3). It lets you evaluate assertion tests for selected parts of a document. It was first published as an international standard in 2006 and has been updated continuously since. The standardization process of the 4th edition is in its final stages and is expected to finish in September this year.
Schematron's...
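A small, self-contained Schematron assertion, evaluated here with lxml's ISO Schematron support (the rule and documents are invented for illustration):

```python
# Minimal Schematron rule, validated via lxml's ISO Schematron support.
from lxml import etree
from lxml.isoschematron import Schematron

SCH = b"""<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <rule context="book">
      <assert test="@year >= 2006">A book must have year >= 2006.</assert>
    </rule>
  </pattern>
</schema>"""

schematron = Schematron(etree.XML(SCH))
print(schematron.validate(etree.XML(b'<book year="2010"/>')))  # True
print(schematron.validate(etree.XML(b'<book year="1999"/>')))  # False
```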
The optical flow method is one of the emerging approaches in Digital Volume Correlation (DVC) for analyzing volumetric deformation during in situ experiments in materials science research. However, deep optical flow neural networks for DVC are limited by their memory requirements, especially for high volumetric resolution data from Synchrotron Radiation Computed Tomography (SRCT) at the scale of...
X-ray near-field holography is a full-field, phase-sensitive microscopy method. It allows imaging specimens with a single exposure in a scalable field of view. The measurements, so-called holograms, require reconstruction to obtain the actual image of the specimen. This reconstruction is the bottleneck of the method: it can be time-consuming, and algorithm parameters need to be tuned...
For many people, the media are the main source of information about climate change. An increasing number of people have turned to online services from both traditional and new media providers to stay informed. As a result, studying online reporting is essential to understand how public debates about climate change are shaped. To support this, the University of Hamburg developed the Online...
The Data Hub is an open-source software framework created to address the needs of collaborative research using diverse data across disciplines. It is developed in Python, on top of the Django web framework and a PostGIS/PostgreSQL database, following computer science best practices as well as the FAIR4RS principles.
The framework’s core function allows reproducible...
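As a sketch of what a model in this stack can look like (class and field names are invented; the geospatial field comes from GeoDjango, which backs onto PostGIS):

```python
# Illustrative Django model for a PostGIS-backed research-data record.
# Class and field names are invented for this sketch.
from django.contrib.gis.db import models

class Sample(models.Model):
    name = models.CharField(max_length=200)
    collected_on = models.DateField()
    location = models.PointField()             # stored as PostGIS geometry
    metadata = models.JSONField(default=dict)  # free-form descriptive metadata

    def __str__(self):
        return self.name
```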