Recent developments in the open data policies of meteorological agencies have greatly expanded the set of up-to-date weather observation and forecast data that is publicly available for meteorological research and education. To improve the use of this open data, we have developed 3-D visualization products that extract and display meteorological information in novel ways. In this demo, we present...
The poster presents findings from the DFG project InterSaME (2020–2023), which focused on vowel-dots in early Qur’anic manuscripts. It highlights a vector-based transcription system, developed because no existing encoding or transcription tool covers vowel-dots. Using a customised Archetype software instance, the team developed a pointer-based encoding method that describes each dot's position relative to...
The multidisciplinary nature of manuscript study at the CSMC results in an ever-increasing volume of digital data in various modalities, ranging from raw images of artefacts to automatically generated data from advanced acquisition techniques. The manual analysis of this data is typically time-consuming and susceptible to human error and bias. Therefore, a set of Pattern Analysis Software...
bAIome is the center for biomedical AI at the University Medical Center Hamburg-Eppendorf (UKE). The center consists of faculty and staff from various institutes within the UKE who are engaged in research and education across broad areas of biomedical AI. bAIome serves as a competence center, bundling knowledge, expertise, and resources to provide a portfolio of services to help students, researchers and...
Prostate cancer relapse prediction is a challenging task within computational pathology, as tissue preparation and digitization are not standardized. The differing protocols lead to domain shifts, against which a deep learning model must be robust, focusing on biological information rather than variations in appearance. We address this challenge through the use of vision foundation models...
The qualification of new detectors presents a challenging setting that requires stable operation of diverse devices, often employing multiple data acquisition (DAQ) systems running on several machines in a local network. Changes to these setups are frequent, such as using different reference detectors depending on the facility. Managing this complexity necessitates a system capable of...
The Discourse Analysis Tool Suite (DATS) is a collaborative web-based platform enabling researchers to manage and analyze multi-modal data with AI. Its core features aim to support and enhance rigorous research: data management, search and filtering, classification, and quantitative analyses. To further develop DATS, we are keen to learn from your research projects to expand...
Speech emotion conversion is the task of converting the expressed emotion of a spoken utterance to a target emotion while preserving the lexical content and speaker identity. While most existing works in speech emotion conversion rely on acted-out datasets and parallel data samples, in this work we specifically focus on more challenging in-the-wild scenarios and do not rely on parallel data....
High-throughput scientific experiments generate massive data streams requiring near real-time processing for time-critical decision making. However, developing robust streaming workflows presents significant challenges in distributed computing environments.
We present AsapoWorker [1], a Python library that simplifies the development of processing workers on top of the Asapo [2] streaming...
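As a sketch of the worker pattern such a library supports (class and method names below are illustrative, not AsapoWorker's actual API):

```python
# Illustrative worker pattern; AsapoWorker's real API may differ.
from dataclasses import dataclass, field

@dataclass
class Message:
    stream: str               # name of the data stream
    metadata: dict = field(default_factory=dict)
    data: bytes = b""         # raw payload received from the broker

class Worker:
    """A processing worker: consumes messages, emits results."""
    def process(self, msg: Message) -> bytes:
        raise NotImplementedError

class SizeReporter(Worker):
    def process(self, msg: Message) -> bytes:
        # Real workers would run detector-specific analysis here.
        return f"received {len(msg.data)} bytes".encode()

def run(worker: Worker, messages) -> None:
    # A real framework would additionally handle acknowledgement,
    # retries, and output streams; this loop only shows the callback
    # structure that such a library relieves developers of writing.
    for msg in messages:
        print(msg.stream, worker.process(msg))

if __name__ == "__main__":
    run(SizeReporter(), [Message("scan-1", data=b"\x00" * 1024)])
```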
Viruses are infamously efficient at employing homology as a mechanism of immune evasion, imitating host proteins to escape the immune response. This might be exploited to identify new virus-host interactions. We have implemented a computational workflow, tested on the Human Cytomegalovirus (HCMV), which conducts a structural homology search to list viral proteins and their homologs in humans....
In this paper, we explore, using multi-modal agents based on Large Vision-Language Models (LVLMs), what a scholarly collections portal can be beyond a digital showcase of the university’s collections. We focus on the interactive exploration of scientific collections. Collection data is valued differently from different perspectives. For university administrators, it is an item to be...
Processing large amounts of data in near real-time during experiments at synchrotrons is enabling scientists to make the best use of limited beamtime [1]. However, building systems capable of handling data rates of several gigabytes per second over long periods of time requires specialized expertise in distributed computing [2], which limits the broader adoption of such systems at...
The European XFEL has updated its scientific data policy to require detailed data management plans (DMPs) and mandatory data reduction. We explore how DMPs, together with metadata and empirical traces from data management and storage systems, can be integrated into a scientific knowledge graph (SKG). This heterogeneous information network serves as a foundation for applying Heterogeneous Graph...
X-ray near-field holography (NFH) is an advanced imaging technique that reveals the nanoscale internal structure of materials, making it useful for studying a wide range of specimens. Moreover, specimens can be imaged using a single exposure, in a scalable field of view. However, the analysis of NFH data is complex, requiring sophisticated phase retrieval and tomographic...
The poster addresses the problem of incorporating a steadily growing number of research software applications into an existing RDM infrastructure, as well as transferring their diverse outputs to the existing storage systems using interface definitions. We propose a subprocess within the general RDM infrastructure that integrates a new software component, the data transfer facilitator...
Social media increasingly fuel extremism and disinformation, especially on the political right, and enable the rapid spread of antidemocratic narratives. Although the social and political sciences study these phenomena extensively, a considerable gap remains between that research and policy put into practice. Our joint software engineering project, KI4Demo, supports...
Research groups in the humanities generate a substantial number of publications, contributing to an ever-expanding body of scholarly work. When scholars are interested in the topics covered, or have specific questions about (subsets of) publications, they face a prohibitively large number of publications to read. We demonstrate the use of language models in the humanities by showcasing two...
Large language models (LLMs) hold great potential for automating tedious development tasks, like creating and maintaining source code documentation. We assist software developers at European XFEL (EuXFEL) with LLM-powered tools that facilitate knowledge and documentation management. We present findings from two controlled experiments conducted with EuXFEL’s Data department, focusing on...
Scholars in the humanities working with datasets face two challenges: discovering relevant datasets, and publishing their own dataset after their research is completed. We propose a new file type, CSMC (Computer Science Metadata Container), which bundles the raw research data alongside a visualization of the data. Scholars can view the visualization of a dataset before downloading the whole...
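As a purely hypothetical sketch of what such a bundle could look like on disk (the container layout, file names, and helper below are invented for illustration; they are not the actual CSMC specification):

```python
# Hypothetical data-plus-preview container; layout and names invented.
import json
import zipfile

def write_container(path, data_files, preview_png, metadata):
    """Bundle raw data, a preview image, and metadata into one archive."""
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("metadata.json", json.dumps(metadata))
        zf.write(preview_png, "preview.png")  # viewable before full download
        for f in data_files:
            zf.write(f, f"data/{f}")

# write_container("letters.csmc", ["letters.csv"], "plot.png",
#                 {"title": "Letter corpus", "license": "CC-BY-4.0"})
```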
Conseydo, a consent management platform developed in the Flutter framework and funded by the Calls4Transfer funding program, uses a privacy-by-design approach to enable the GDPR-compliant digital creation, documentation, management and tracking of consent for research, for example within the stakeholder triad of teachers, parents and researchers. The platform solves organizational...
Our poster presents Protokolibri, a distributed application for logging the browsing behavior of large groups of students on iPads. The browser plugin we developed records tab events via JavaScript and sends them asynchronously to the Protokolibri Node.js server, which stores the data sorted by device name and timestamp.
The focus of the tool is on simplifying data collection. Previously,...
This paper presents UHH’s approach developed for the AVeriTeC shared task. The goal of the challenge is to verify given real-world claims with evidence from the Web. In this shared task, we investigate a Retrieval-Augmented Generation (RAG) model, which mainly contains retrieval, generation, and augmentation components. We start with the selection of the top 10k evidence candidates via BM25 scores, and...
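The BM25 preselection step can be sketched with the rank_bm25 package (corpus and query below are invented; the task uses far larger evidence pools):

```python
# Sketch of BM25-based evidence preselection using rank_bm25.
from rank_bm25 import BM25Okapi

corpus = [
    "The Eiffel Tower was completed in 1889.",
    "Mount Everest is the highest mountain on Earth.",
    "The Great Wall of China is thousands of kilometers long.",
]
tokenized = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized)

claim = "The Eiffel Tower was finished before 1900."
query = claim.lower().split()

# The shared-task pipeline keeps the top 10k candidates; with this toy
# corpus we keep only the single best match.
print(bm25.get_top_n(query, corpus, n=1))
```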
This project presents a browsable digital exploration environment for a multilingual private guestbook from 20th-century Jerusalem. The goal is to investigate curiosity-driven browsing strategies in archival contexts, going beyond systematic searches. By providing intuitive, user-friendly visualization solutions, the project aims to facilitate an exploratory approach and increase serendipitous...
Contemporary earth system models (ESMs) perform simulations at kilometer-scale resolution at various HPC centers. The data from these simulations aid research and policy making. Hence, the design of a data access system for a federated setup should consider the data, analysis tools and computing resources at each center. For efficient discoverability, the data management at each center...
This poster presents the final iteration of the CaloClouds series. Simulating photon showers at the granularities expected in a future Higgs factory is computationally challenging. A viable simulation must capture the fine details exposed by such a detector, yet be substantially faster than full Monte Carlo methods. The CaloClouds model utilises point cloud diffusion and normalising flows to replicate...
The "Digital Edition Levezow Album" project is an interdisciplinary collaboration between the Hub of Computing and Data Science (HCDS), the Department of Art History at the University of Hamburg, and the State and University Library Hamburg. The project aims to digitally process and interactively visualize a previously unexplored sketchbook from the late 17th century, containing drawings on...
The ELECTRODE package is a module in the official release of the molecular dynamics code LAMMPS and implements the constant potential method and related methods. Utilizing the massively parallel architecture of LAMMPS with neighbor lists and fast Fourier transforms, the package efficiently calculates interactions between atoms and minimizes their energy as a function of atom...
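Assuming the electrode charges are the minimized degrees of freedom, as in the constant potential method, the inner step reduces to a linear solve; a toy NumPy sketch follows (the actual package builds the interaction matrix from Ewald summation and handles constraints such as charge neutrality):

```python
# Toy sketch of the constant potential method's core step:
# charges q minimize E(q) = 1/2 q^T A q + q^T (b - v), which is
# equivalent to solving the linear system A q = v - b.
import numpy as np

A = np.array([[2.0, 0.5],
              [0.5, 2.0]])    # toy electrode-electrode interaction matrix
b = np.array([0.1, -0.2])     # potential at electrode atoms from electrolyte
v = np.array([1.0, -1.0])     # applied electrode potentials

q = np.linalg.solve(A, v - b) # energy-minimizing electrode charges
print("electrode charges:", q)
```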
We introduce EncouRAGe, a comprehensive Python-based framework designed to streamline the development and evaluation of Retrieval-Augmented Generation (RAG) systems using local Large Language Models (LLMs). EncouRAGe integrates leading tools such as vLLM for efficient inference, Jinja2 for dynamic prompt templating, and MLflow for observability and performance tracking. It supports both...
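As an illustration of the dynamic prompt templating that Jinja2 provides (template text and variable names are invented for this sketch):

```python
# Sketch of Jinja2-based prompt templating for a RAG pipeline.
from jinja2 import Template

template = Template(
    "Answer the question using only the context below.\n\n"
    "{% for doc in documents %}[{{ loop.index }}] {{ doc }}\n{% endfor %}\n"
    "Question: {{ question }}\nAnswer:"
)

print(template.render(
    documents=["RAG combines retrieval with generation.",
               "vLLM serves LLMs with high throughput."],
    question="What does RAG combine?",
))
```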
We present a privacy-preserving research environment integrating offline Large Language Models (LLMs), AI agents, and scalable infrastructure. By deploying private LLMs via Ollama and containerized workflows on Kubernetes, researchers can automate tasks like literature review, code generation, and secure data processing without compromising sensitive information. AI agents—coordinated through...
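A minimal sketch of how such a workflow can query a locally served model through Ollama's HTTP API (model name and prompt are placeholders; assumes an Ollama instance on its default port):

```python
# Query a private, locally hosted LLM via Ollama's HTTP API.
import json
import urllib.request

payload = {
    "model": "llama3",                    # placeholder model name
    "prompt": "Summarize: RAG keeps sensitive data on-premises.",
    "stream": False,                      # return one complete response
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```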
The Result Assessment Tool (RAT) is a Python-based software toolkit that addresses the critical research challenge of accessing and analyzing data from various search systems. It uses several computational methods, including Selenium for robust web scraping, Flask for the web interface, PostgreSQL for data management, and automated classifiers for content analysis. With RAT, researchers can...
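The scraping step can be sketched with Selenium as follows (URL and selector are placeholders; RAT's actual scrapers are tailored to each search system):

```python
# Illustrative Selenium scraping sketch; not RAT's actual scraper code.
from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.FirefoxOptions()
options.add_argument("--headless")        # run without a visible browser
driver = webdriver.Firefox(options=options)
try:
    driver.get("https://example.org/search?q=climate")  # placeholder URL
    # The CSS selector is a placeholder; it differs per search system.
    for link in driver.find_elements(By.CSS_SELECTOR, "a"):
        print(link.get_attribute("href"))
finally:
    driver.quit()
```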
Resources for research on sign languages are rare and can often be difficult to locate. Few centralised sources of information exist. The Sign Language Dataset Compendium helps by providing an overview of existing lexical resources and linguistic corpora, as well as a summary of popular data collection tasks shared among corpora. To date it covers resources for 82 different sign languages. The...
Benchmarking applications on high-performance computing (HPC) systems is essential for optimising runtime, reducing energy consumption, and ensuring efficient hardware utilisation. However, accessing and interpreting performance metrics can be challenging and error-prone. To address this, we present xbat (extended benchmarking automation tool), developed by MEGWARE Computer Vertrieb und...
We present MENTO, a data processing toolkit that remotely runs external analysis software on-demand using the DESY high-performance computing (HPC) cluster.
MENTO is set up to require no input from users except to point to the desired analysis software, and the entire processing pipeline is then managed automatically, including data input, access to the HPC cluster, job submissions to a...
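The job-submission step can be sketched as a thin wrapper around a batch scheduler (assuming a SLURM-based cluster; partition, resources, and the analysis command below are placeholders, not MENTO's actual internals):

```python
# Sketch of automated batch-job submission, assuming SLURM.
import subprocess
import tempfile

def submit(analysis_cmd: str, input_path: str) -> str:
    """Write a batch script and submit it; return the job id."""
    script = (
        "#!/bin/bash\n"
        "#SBATCH --partition=allcpu\n"   # placeholder partition name
        "#SBATCH --time=01:00:00\n"
        f"{analysis_cmd} {input_path}\n"
    )
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script)
        path = f.name
    # sbatch prints e.g. "Submitted batch job 12345"; keep the job id.
    out = subprocess.run(["sbatch", path], capture_output=True,
                         text=True, check=True)
    return out.stdout.strip().split()[-1]

# job_id = submit("python analyze.py", "/path/to/run_data.h5")
```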
Computational pathology has made tremendous progress on dedicated datasets in the past years. However, such algorithms are still not used routinely for diagnostics in the clinic. A large gap remains between research and clinical practice; contributing factors include the focus on reproducing subjective scores and the large variance in performance depending on the...
Continuous Integration and Continuous Deployment (CI/CD) are modern software engineering best practices that enable efficient large-scale software development and use. A variety of popular CI/CD tools help in adopting these practices. In this poster we focus on the kinds of software, their runtime environments, packaging, and deployment tools and techniques used at DESY that can easily...
The presentation will introduce a GraphRAG-based approach to research data retrieval from research data catalogues, using the Text+ Registry as an example.
Retrieval-Augmented Generation (RAG) systems have become a cornerstone for LLM-based question-answering tasks involving individual (potentially private or sensitive) unstructured data. However, traditional RAG pipelines often lack an...
Schematron is an ISO-standardized validation language for structured data (ISO/IEC 19757-3). It lets you evaluate assertion tests for selected parts of a document. It was first published as an international standard in 2006 and has been updated continuously since. The standardization process of the 4th edition is in its final stages and is expected to finish in September this year.
Schematron's...
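A small, self-contained Schematron assertion, evaluated here with lxml's ISO Schematron support (the rule and documents are invented for illustration):

```python
# Minimal Schematron rule, validated via lxml's ISO Schematron support.
from lxml import etree
from lxml.isoschematron import Schematron

SCH = b"""<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <rule context="book">
      <assert test="@year >= 2006">A book must have year >= 2006.</assert>
    </rule>
  </pattern>
</schema>"""

schematron = Schematron(etree.XML(SCH))
print(schematron.validate(etree.XML(b'<book year="2010"/>')))  # True
print(schematron.validate(etree.XML(b'<book year="1999"/>')))  # False
```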
The optical flow method is one of the emerging approaches in Digital Volume Correlation (DVC) for analyzing volumetric deformation during in situ experiments in materials science research. However, deep optical flow neural networks for DVC are limited by their memory requirements, especially for high volumetric resolution data from Synchrotron Radiation Computed Tomography (SRCT) at the scale of...
X-ray near-field holography is a full-field, phase-sensitive microscopy method. It allows imaging specimens with a single exposure in a scalable field of view. The measurements, so-called holograms, require reconstruction to obtain the actual image of the specimen. This reconstruction is the bottleneck of the method: it can be time-consuming, and algorithm parameters need to be tuned...
For many people, the media are the main source of information about climate change. An increasing number of people have turned to online services from both traditional and new media providers to stay informed. As a result, studying online reporting is essential to understand how public debates about climate change are shaped. To support this, the University of Hamburg developed the Online...
The Data Hub is an open-source software framework created to address the needs of collaborative research using diverse data across disciplines. It is developed in Python, on top of the Django web framework and a PostGIS/PostgreSQL database, following computer science best practices as well as the FAIR4RS principles.
The framework’s core function allows reproducible...
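As a sketch of what a model in this stack can look like (class and field names are invented; the geospatial field comes from GeoDjango, which backs onto PostGIS):

```python
# Illustrative Django model for a PostGIS-backed research-data record.
# Class and field names are invented for this sketch.
from django.contrib.gis.db import models

class Sample(models.Model):
    name = models.CharField(max_length=200)
    collected_on = models.DateField()
    location = models.PointField()             # stored as PostGIS geometry
    metadata = models.JSONField(default=dict)  # free-form descriptive metadata

    def __str__(self):
        return self.name
```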