RSE Day 2025
AER Atrium
Science City Bahrenfeld
The Hub of Computing and Data Science will hold an RSE Day in the Science City Bahrenfeld. The event is tailored to scientists and software developers working in Research Software Engineering and to people interested in the field. It aims to strengthen our community in Hamburg, share experiences, and showcase our work. As the requirements for sustainable research software increase, we see the potential to build a community of practice that meets regularly and leverages synergies in the metropolitan area.
Research Software Engineers (RSEs) specialize in developing and maintaining software that supports scientific research. They combine software engineering expertise with a deep understanding of research methods and domain knowledge in specific scientific fields. They create tools, applications, and computational models that facilitate data analysis, simulations, and visualizations for their own research or that of collaborating researchers.
That means RSEs are crucial in bridging the gap between advanced computing and scientific inquiry, ensuring that software solutions are robust, efficient, and tailored to researchers' needs.
If you feel you belong to this community, you are invited to submit an abstract for a poster presenting your work, complemented by a lightning talk.
This event will provide a platform for lively discussions through lightning talks and community engagement during the poster sessions. Two keynote presentations will round out the program with insights into successfully implemented RSE. We hope the event will generate new ideas and open up opportunities for collaboration.
09:05 → 09:20  Welcome
Room 0005/0010 (AER), Albert-Einstein-Ring 8-10, 22761 Hamburg
Conveners: Chris Biemann (House of Computing & Data Science), Dr Katrin Schöning-Stierand (Hub of Computing and Data Science (HCDS)), Dr Martin Semmann (House of Computing and Data Science, Universität Hamburg)
09:20 → 10:05  Keynote 1: Research Software and its Engineers
Room 0005/0010 (AER), Albert-Einstein-Ring 8-10, 22761 Hamburg
Convener: Dr Alexander Struck (Cluster of Excellence »Matters of Activity«, Humboldt-Universität zu Berlin)
10:05 → 10:35  Lightning Talks: Block 1
Room 0005/0010 (AER), Albert-Einstein-Ring 8-10
Convener: Anna Reinicke-Vogt (Universitätsklinikum Hamburg-Eppendorf)
10:05  CaloClouds 3: Diffusion and normalising flows (3m)
This poster presents the final iteration of the CaloClouds series. Simulating photon showers at the granularities expected at a future Higgs factory is computationally challenging: a viable simulation must capture the fine details resolved by such a detector, yet be substantially faster than Monte Carlo methods. The CaloClouds model uses point-cloud diffusion and normalising flows to replicate Monte Carlo simulation with exceptional accuracy. Our latest iteration takes advantage of domain knowledge to reduce model complexity, yielding a speed-up of up to two orders of magnitude. Finally, we present the results of reconstructions performed on CaloClouds 3 output against results from the leading Monte Carlo simulation, Geant4, demonstrating that the model provides reliable physics reproductions.
Speaker: Henry Day-Hall (DESY)
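As a rough illustration of the point-cloud diffusion idea (a generic sketch, not the CaloClouds code), the snippet below applies the standard forward noising step of a diffusion model to a toy point cloud of calorimeter hits; the shapes, schedule, and names are illustrative assumptions.

```python
import numpy as np

# Standard DDPM forward process: x_t = sqrt(abar_t)*x_0 + sqrt(1 - abar_t)*eps.
# Toy stand-in for a photon shower: an (n_points, 4) cloud of x, y, z, energy.
rng = np.random.default_rng(0)

T = 1000                               # number of diffusion steps (illustrative)
betas = np.linspace(1e-4, 0.02, T)     # linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)    # cumulative signal retention

def noise_point_cloud(x0: np.ndarray, t: int) -> np.ndarray:
    """Diffuse the point cloud x0 to timestep t; a model learns to invert this."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

shower = rng.standard_normal((500, 4))      # placeholder shower
x_mid = noise_point_cloud(shower, t=500)    # half-noised cloud
```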
10:08  Private AI for Research: Secure, Scalable and Automated (3m)
We present a privacy-preserving research environment integrating offline Large Language Models (LLMs), AI agents, and scalable infrastructure. By deploying private LLMs via Ollama and containerized workflows on Kubernetes, researchers can automate tasks like literature review, code generation, and secure data processing without compromising sensitive information. AI agents—coordinated through n8n—enhance productivity by orchestrating multi-step research workflows, such as relevance scoring of abstracts and deep content summarization. Designed with biomedical applications in mind, the environment enables responsible use of clinical and omics data in line with the stringent data governance requirements at the University Medical Center Hamburg-Eppendorf (UKE).
Speaker: Sven Heins (Universität Hamburg)
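As a minimal sketch of the private-LLM building block described above, the snippet queries a locally running Ollama server over its HTTP generate endpoint, so no data leaves the machine; the model name and prompt are placeholder assumptions, and the n8n agent orchestration is out of scope here.

```python
import requests

# Query a local Ollama server (default port 11434); nothing leaves the machine.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",        # any model previously pulled with `ollama pull`
        "prompt": "Summarise this abstract in one sentence: ...",
        "stream": False,          # return one JSON object instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```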
10:11  xbat – An Easy-to-Use and Universally Applicable Benchmarking Automation Tool for HPC Software Within the Project hpc.bw (dtec.bw) (3m)
Benchmarking applications on high-performance computing (HPC) systems is essential for optimising runtime, reducing energy consumption, and ensuring efficient hardware utilisation. However, accessing and interpreting performance metrics can be challenging and error-prone. To address this, we present xbat (extended benchmarking automation tool), developed by MEGWARE Computer Vertrieb und Service GmbH, an easy-to-use and universally applicable tool that automates benchmarking and simplifies performance analysis for HPC users of all skill levels.
This poster provides an overview of xbat's architecture, features, and case studies within the project hpc.bw (dtec.bw). We focus on the open-source molecular dynamics research software ls1 mardyn, which comes with the auto-tuning library AutoPas, and on the closed-source mathematical optimisation package Gurobi.
Speaker: Willi Leinen (Helmut Schmidt University)
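Since xbat is MEGWARE's product, the sketch below is explicitly not its interface; it only illustrates the kind of repeat-and-aggregate timing loop that benchmarking automation replaces. The `benchmark` helper and the workload are hypothetical.

```python
import statistics
import subprocess
import time

# Hypothetical stand-in for one task a benchmarking tool automates:
# run a workload repeatedly and aggregate runtimes. NOT xbat's interface.
def benchmark(cmd: list[str], repeats: int = 5) -> dict:
    runtimes = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        runtimes.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(runtimes),
        "stdev_s": statistics.stdev(runtimes),
        "best_s": min(runtimes),
    }

print(benchmark(["sleep", "0.1"]))  # placeholder workload
```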
10:14  Digital Edition of the Levezow Album: Interactive Visualization of 17th-Century Drawings (3m)
The "Digital Edition Levezow Album" project is an interdisciplinary collaboration between the Hub of Computing and Data Science (HCDS), the Department of Art History at the University of Hamburg, and the State and University Library Hamburg. The project aims to digitally process and interactively visualize a previously unexplored sketchbook from the late 17th century, containing drawings on anatomy, antiquity, proportion studies, and natural history.
By leveraging modern technologies such as digital editing techniques and advanced image processing, the Levezow Album is made accessible to a broad audience. Each page of the album is accompanied by detailed explanations authored by students of the Department of Art History. These texts provide context regarding the significance, origins, and intricacies of the drawings. Additionally, an interactive commenting feature allows users to suggest alternative sources and engage in a dialogue about the artworks.
This project demonstrates how digital methods can be used in the humanities to reinterpret historical artifacts and make them accessible. It serves as an example of the successful integration of research, education, and digital technology to promote cultural heritage.
Speaker: Amy Isard (HCDS/UWA)
10:17  RAT: A Computational Toolkit for Scalable Search System Analysis (3m)
The Result Assessment Tool (RAT) is a Python-based software toolkit that addresses the critical research challenge of accessing and analyzing data from various search systems. It combines several computational methods, including Selenium for robust web scraping, Flask for the web interface, PostgreSQL for data management, and automated classifiers for content analysis. With RAT, researchers can design studies that systematically collect extensive search results and perform manual or automated evaluations. Key application areas include information science, health information, media and communication studies, and the social sciences. The tool's significance lies in its methodological consistency and the substantial improvement it offers in conducting comprehensive, scalable, data-driven investigations of results from search systems.
Speaker: Mr Sebastian Sünkler (Hamburg University of Applied Sciences)
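As a generic illustration of the collection step RAT automates (not RAT's internal code), the snippet drives a headless Firefox via Selenium and extracts result links; the URL and CSS selector are placeholder assumptions.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Generic Selenium collection step; URL and selector are placeholders.
options = webdriver.FirefoxOptions()
options.add_argument("-headless")          # run without a browser window
driver = webdriver.Firefox(options=options)
try:
    driver.get("https://example.org/search?q=health+information")
    results = driver.find_elements(By.CSS_SELECTOR, "a.result-title")
    for rank, link in enumerate(results, start=1):
        print(rank, link.text, link.get_attribute("href"))
finally:
    driver.quit()
```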
10:20  The Sign Language Dataset Compendium (3m)
Resources for research on sign languages are rare and can be difficult to locate, as few centralised sources of information exist. The Sign Language Dataset Compendium helps by providing an overview of existing lexical resources and linguistic corpora, as well as a summary of popular data collection tasks shared among corpora. To date, it covers resources for 82 different sign languages. The Compendium is published as a website, as a PDF document, and in metadata formats suitable for integration with dataset aggregator platforms. Its production pipeline includes a web editor for editorial staff, XML-based semantic markup, and automatic integration of archival copies for external links to counter link rot and content drift.
Speaker: Marc Schulder (IDGS, Universität Hamburg)
10:23  EncouRAGe: Evaluating RAG locally, fast and reliably (3m)
We introduce EncouRAGe, a comprehensive Python-based framework designed to streamline the development and evaluation of Retrieval-Augmented Generation (RAG) systems using local Large Language Models (LLMs). EncouRAGe integrates leading tools such as vLLM for efficient inference, Jinja2 for dynamic prompt templating, and MLflow for observability and performance tracking. It supports both in-memory (Chroma) and scalable (Qdrant) vector databases for optimized context retrieval. The framework offers modular RAG methods, customizable inference templates, and detailed evaluation metrics, enabling rapid prototyping and benchmarking of context-aware LLM applications. EncouRAGe aims to democratize LLM-based development with a focus on flexibility, speed, and reproducibility.
Speaker: Jan Strich (Universität Hamburg)
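A minimal sketch of the in-memory retrieval path mentioned above, using Chroma with its default embedding function; the collection name, documents, and query are placeholders, and this is not EncouRAGe's own API.

```python
import chromadb

# Minimal in-memory retrieval step (Chroma, default embedding function).
client = chromadb.Client()
docs = client.create_collection(name="docs")
docs.add(
    ids=["d1", "d2"],
    documents=[
        "vLLM serves local LLMs with efficient batched inference.",
        "MLflow records metrics and artifacts for experiment tracking.",
    ],
)
hits = docs.query(query_texts=["How do I track experiments?"], n_results=1)
print(hits["documents"][0])  # retrieved context to splice into the LLM prompt
```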
10:26  ELECTRODE: An electrochemistry package for atomistic simulations (3m)
The ELECTRODE package is a module in the official release of the molecular dynamics code LAMMPS and implements the constant potential method and related methods. Utilizing the massively parallel architecture of LAMMPS with neighbor lists and fast Fourier transforms, the package efficiently calculates interactions between atoms and minimizes their energy as a function of atom charges.
Standard Ewald summation and the particle-particle particle-mesh algorithm have been implemented for interaction calculations. For the energy minimization, a matrix inversion and the conjugate gradient method can be used.
Numerous research groups have used the ELECTRODE package for atomistic models of supercapacitors, batteries, the electrolyte Seebeck effect and electron transfers at functionalized interfaces. Further, the recently added charge equilibration enables modeling of non-metallic materials.
Speaker: Ludwig Ahrens-Iwers (TU Hamburg)
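A minimal sketch of driving LAMMPS from its official Python module, assuming a prepared input deck named `in.electrode` (a placeholder) that defines the electrode groups and applies the package's constant-potential fix; see the ELECTRODE documentation for the exact `fix electrode/conp` arguments.

```python
from lammps import lammps

# Run a prepared LAMMPS input script from Python. The deck itself would set up
# the electrolyte, electrode groups, and the ELECTRODE package's fix.
lmp = lammps()
lmp.file("in.electrode")     # read and execute the input deck (placeholder name)
print(lmp.get_natoms())      # quick sanity check on the loaded system
lmp.close()
```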
10:29  Computational pathology - the gap between clinics and research (3m)
Computational pathology has made tremendous progress on dedicated datasets in the past years. However, such algorithms are still not used routinely for diagnostics in the clinic. A large gap remains between research and clinical practice, driven by factors such as the focus on reproducing subjective scores and the large variance in performance across data sources. One important goal is therefore to move beyond subjective scores by introducing objective endpoints and by developing quantifiable, objective metrics based on specialised microscopy types. To close the gap further, robustness to domain shifts between datasets, generalizability, and measures of uncertainty that allow uncertain decisions to be deferred are important topics.
Speaker: Marina Zimmermann (University Medical Center Hamburg-Eppendorf)
10:35 → 12:35  Poster Session
AER Atrium, Science City Bahrenfeld, Albert-Einstein-Ring 8-10, 22761 Hamburg
Convener: Dr Katrin Schöning-Stierand (Hub of Computing and Data Science (HCDS))
12:35 → 13:35  Lunch break (1h)
AER Atrium, Science City Bahrenfeld, Albert-Einstein-Ring 8-10, 22761 Hamburg
13:35 → 14:20  Keynote 2: New tricks for old codes: RSE at the interface
Room 0005/0100 (AER), Albert-Einstein-Ring 8-10, 22761 Hamburg
Building modern interfaces to legacy simulations, enabling streamlined workflows and cross-code comparison.
Convener: Prof. Hans Fangohr (Max Planck Institute for the Structure and Dynamics of Matter)
14:20 → 14:50  Lightning Talks: Block 2
Room 0005/0010 (AER), Albert-Einstein-Ring 8-10
Convener: Seid Muhie Yimam (House of Computing and Data Science)
14:20  Analyze my data, I don't care how (3m)
We present MENTO, a data processing toolkit that remotely runs external analysis software on-demand using the DESY high-performance computing (HPC) cluster.
MENTO is set up to require no input from users except to point to the desired analysis software, and the entire processing pipeline is then managed automatically, including data input, access to the HPC cluster, job submissions to a batch processing scheduler, and result writing.
Analysis is triggered automatically during an experiment, and the processed results are transparently made available to users so they can immediately evaluate the experiment without having to handle any raw data manually.
Speaker: Vijay Kartik (DESY)
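As a generic illustration of one step MENTO hides from users (not MENTO's own API), the snippet submits an analysis job to a Slurm batch scheduler; the partition, paths, and script contents are placeholder assumptions.

```python
import subprocess

# Write a minimal Slurm job script and submit it with sbatch.
# Partition, paths, and the analysis command are placeholders.
job_script = """#!/bin/bash
#SBATCH --partition=allcpu
#SBATCH --time=00:30:00
#SBATCH --output=analysis_%j.log
./run_analysis --input /path/to/raw_data
"""
with open("analysis.sh", "w") as f:
    f.write(job_script)

result = subprocess.run(
    ["sbatch", "analysis.sh"], capture_output=True, text=True, check=True
)
print(result.stdout)  # e.g. "Submitted batch job 123456"
```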
14:23  Continuous Integration and Continuous Deployment at DESY (3m)
Continuous Integration and Continuous Deployment (CI/CD) is a modern software engineering best practice that enables efficient large-scale software development and use. A variety of popular CI/CD tools help in adopting these practices. In this poster, we focus on the kinds of software, their runtime environments, and the packaging and deployment tools and techniques used at DESY that can easily be leveraged by participating institutions under DAPHNE4NFDI.
Speaker: Parthasarathy Tirumalai Nallam Chakravarty (DESY)
14:26  Ten Years of Experience with the Online Media Monitor for Climate Change (3m)
For many people, the media are the main source of information about climate change. An increasing number of people have turned to online services from both traditional and new media providers to stay informed. As a result, studying online reporting is essential to understand how public debates about climate change are shaped. To support this, the University of Hamburg developed the Online Media Monitor (OMM) for climate change in 2015. Here, we want to share our experiences with developing and, especially, maintaining the OMM over the past ten years as the online world has continued to change.
Speaker: Mr Remon Sadikni
14:29  GraphRAG-based research data retrieval (3m)
The presentation will introduce a GraphRAG-based approach to research data retrieval from research data catalogues, using the Text+ Registry as an example.
Retrieval-Augmented Generation (RAG) systems have become a cornerstone for LLM-based question-answering tasks involving individual (potentially private or sensitive) unstructured data. However, traditional RAG pipelines often lack an in-depth understanding of the underlying data and the ability to retrieve contextual information from it.
GraphRAG-based approaches can address this by utilizing structured data in a knowledge graph to capture deeper relational context, enabling more precise retrieval and a more nuanced understanding.
The first implementation has already shown that GraphRAG outperforms standard RAG in terms of both retrieval precision and response quality.
The presentation will also include a system demonstration.
Speaker: Timm Lehmberg (Akademie der Wissenschaften in Hamburg)
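A toy illustration of the GraphRAG idea (invented example data, not the Text+ Registry schema): instead of retrieving isolated text chunks, the relational neighbourhood of an entity is collected as context for the LLM.

```python
import networkx as nx

# Tiny invented knowledge graph of research-data entities and relations.
kg = nx.DiGraph()
kg.add_edge("Corpus A", "German Sign Language", relation="covers_language")
kg.add_edge("Corpus A", "University X", relation="hosted_by")
kg.add_edge("Lexicon B", "German Sign Language", relation="covers_language")

def graph_context(entity: str) -> list[str]:
    """Collect facts about an entity from its incoming and outgoing edges."""
    facts = []
    for src, dst, data in kg.in_edges(entity, data=True):
        facts.append(f"{src} {data['relation']} {dst}")
    for src, dst, data in kg.out_edges(entity, data=True):
        facts.append(f"{src} {data['relation']} {dst}")
    return facts

# Relational context to prepend to the user's question before calling the LLM:
print(graph_context("German Sign Language"))
```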
14:32  ISO Schematron: A feather duster to reach the parts other schema languages cannot reach (3m)
Schematron is an ISO-standardized validation language for structured data (ISO/IEC 19757-3). It lets you evaluate assertion tests for selected parts of a document. It was first published as an international standard in 2006 and has been updated continuously. The standardization process for the 4th edition is in its final stages and is expected to finish in September this year.
Schematron's use of XPath both as the language to select the portion of a document and as the language of the assertion tests gives Schematron the flexibility to validate arbitrary relationships and dependencies of information items in a document. What also sets Schematron apart from other languages is that it encourages the use of natural language descriptions targeted to human readers. This way, validation can be more than just a binary distinction (document valid/invalid) but also support authors of in-progress documents with quick feedback on erroneous or unwanted document structure and content. The flexibility and (relative) simplicity of Schematron make it an invaluable tool for XML-based text-encoding projects.
SchXslt is one of the leading implementations of ISO Schematron, powered by the mature XSL Transformations language. It goes beyond the features of the ISO standard and supports, among other things, streaming validation. It is MIT-licensed and used across a wide range of industries, such as publishing and the digital encoding of humanities artifacts (TEI/MEI).
Speaker: David Maus (Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky)
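A minimal, self-contained example of the assertion-based validation described above, using lxml's ISO Schematron support (an XSLT-based implementation; SchXslt itself is a separate XSLT package). Because the assert carries a natural-language message, a failed validation can give document authors human-readable feedback rather than a bare valid/invalid verdict.

```python
from lxml import etree, isoschematron

# A one-rule Schematron schema: every chapter must have a title.
schema = etree.XML(b"""
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <rule context="chapter">
      <assert test="title">A chapter must have a title.</assert>
    </rule>
  </pattern>
</schema>""")

doc = etree.XML(b"<book><chapter><p>No title here.</p></chapter></book>")

schematron = isoschematron.Schematron(schema)
print(schematron.validate(doc))  # False: the assert fails for this document
```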
14:35  SmartPhase: Start to End Holotomography (3m)
X-ray near-field holography is a full-field, phase-sensitive microscopy method. It allows imaging specimens with a single exposure in a scalable field of view. The measurements are so-called holograms and require reconstruction to obtain the actual image of the specimen. This reconstruction is the bottleneck of the method: it can be time-consuming, and algorithm parameters need to be tuned precisely.
The goal of SmartPhase is to reduce these pain points and offer a solution that allows non-expert users to carry out reconstructions online during their experiment.
Speaker: Johannes Hagemann
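To make the reconstruction problem concrete, here is a toy forward model of near-field holography (not SmartPhase code): an exit wave is Fresnel-propagated to the detector and its intensity recorded as the hologram; reconstruction amounts to inverting this step. All parameters are illustrative assumptions.

```python
import numpy as np

# Toy hologram formation: Fresnel-propagate an exit wave, record intensity.
n, px = 512, 50e-9            # grid size, pixel size [m]
lam, z = 1e-10, 1e-2          # wavelength [m], propagation distance [m]

# Simple phase object: a disc that shifts the phase of the wavefield.
y, x = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
phase = -0.5 * (x**2 + y**2 < 50**2)
psi = np.exp(1j * phase)

# Paraxial Fresnel transfer function, applied in Fourier space.
fx = np.fft.fftfreq(n, d=px)
H = np.exp(-1j * np.pi * lam * z * (fx[None, :]**2 + fx[:, None]**2))
hologram = np.abs(np.fft.ifft2(np.fft.fft2(psi) * H)) ** 2
```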
14:38  The Data Hub: Enhancing Collaborative Research and Intelligence through Reproducible Data Harmonization (3m)
The Data Hub is an open-source software framework created to address the needs of collaborative research using diverse data across disciplines. It is developed in Python on top of the Django web framework and a PostGIS/PostgreSQL database, following computer science best practices as well as the FAIR4RS principles.
The framework's core function allows reproducible data harmonization for analysis along temporal and spatial dimensions, while managing data governance through FAIR metadata and documentation. This way, it aims to be one piece in the diverse puzzle of open science standards and tools.
It is currently tailored to the global health and public health intelligence communities, while also focusing on transfer and reusability in other disciplines.
Speaker: Jonathan Ströbele (BNITM)
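A hypothetical sketch of how a record with the temporal and spatial dimensions described above might look as a Django model using GeoDjango's PostGIS fields. This belongs in a Django app's models.py; the model and all field names are invented for illustration, not the Data Hub's schema.

```python
from django.contrib.gis.db import models


class Observation(models.Model):
    """Invented example of a harmonizable record (not the Data Hub schema)."""
    source = models.CharField(max_length=200)        # provenance, for FAIR metadata
    variable = models.CharField(max_length=100)      # e.g. "case_count"
    value = models.FloatField()
    observed_at = models.DateTimeField()             # temporal dimension
    location = models.PointField(srid=4326)          # spatial dimension (WGS 84)

    class Meta:
        indexes = [models.Index(fields=["variable", "observed_at"])]
```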
14:41  Memory Efficient Volumetric Deep Neural Network for Digital Volume Correlation (3m)
The optical flow method is one of the emerging approaches for Digital Volume Correlation (DVC) to analyze volumetric deformation during in situ experiments in materials science research. However, deep optical flow neural networks for DVC are limited by memory requirements, especially for high-resolution volumetric data from Synchrotron Radiation Computed Tomography (SRCT) at the micrometer or nanometer scale.
In this work, we extend our study of the optical flow network VolRAFT, focusing on memory efficiency during the supervised training of volumetric neural networks using high-resolution micro-CT and nano-CT data. We present approaches to reducing the maximum memory requirement based on architectural and non-architectural changes, utilizing cutting-edge Graphics Processing Units (GPUs). We develop an "on-the-fly" synthetic dataset generator to reduce the storage space needed during training. We compare these approaches in terms of memory requirements and the accuracy of deformation fields at various volumetric resolutions, based on experimental data of bone-implant materials, lignocellulosic tissues, and shape memory alloy wires.
Speaker: Tak Ming Wong (Helmholtz-Zentrum Hereon)
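A minimal sketch of the "on-the-fly" generator idea using a PyTorch IterableDataset: training volumes are synthesized per iteration instead of being stored on disk. The shapes and the deformation stand-in are illustrative assumptions, not the VolRAFT generator.

```python
import torch
from torch.utils.data import DataLoader, IterableDataset

class SyntheticVolumes(IterableDataset):
    """Generate (reference, moving, flow) triples on the fly; nothing stored."""

    def __init__(self, shape=(64, 64, 64)):
        self.shape = shape

    def __iter__(self):
        while True:
            ref = torch.rand(1, *self.shape)            # reference volume
            flow = 0.01 * torch.randn(3, *self.shape)   # random deformation field
            # A real generator would warp `ref` by `flow`; noise is a stand-in
            # for the deformed ("moving") volume here.
            mov = ref + 0.05 * torch.randn_like(ref)
            yield ref, mov, flow

loader = DataLoader(SyntheticVolumes(), batch_size=2)
ref, mov, flow = next(iter(loader))   # fresh samples each step, no disk needed
```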
14:50 → 16:50  Poster Session
AER Atrium, Science City Bahrenfeld, Albert-Einstein-Ring 8-10, 22761 Hamburg
Convener: Dr Katrin Schöning-Stierand (Hub of Computing and Data Science (HCDS))
16:50 → 17:00  Closing
AER Atrium, Science City Bahrenfeld, Albert-Einstein-Ring 8-10, 22761 Hamburg
Conveners: Chris Biemann (House of Computing & Data Science), Dr Katrin Schöning-Stierand (Hub of Computing and Data Science (HCDS)), Dr Martin Semmann (House of Computing and Data Science, Universität Hamburg)