Introduction to the first workshop on Large Language Models for Research Data Management?!
Research data management (RDM) has become an important discipline that enables researchers to effectively organise, preserve and share their research results.
RDM is a relatively young development that builds on the principles of open science and aims to prepare researchers for future requirements in data sharing and reuse. It utilises innovative...
This contribution explores the application of large language models (LLMs) in research data management for labour market research, particularly in occupational data analysis. Based on our empirical studies of the automated classification of job titles and our critical evaluations of AI-assisted text interpretation, we contend that, although LLMs present promising opportunities to improve research processes,...
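The contribution itself does not include code, but the classification step it describes can be illustrated with a minimal sketch: prompting an LLM to map a free-text job title onto a fixed set of occupational categories. The OpenAI client, the model name and the category list below are illustrative assumptions, not the authors' setup.

```python
# Minimal sketch: prompting an LLM to map free-text job titles to occupational
# categories. The OpenAI client, model name and category list are stand-ins;
# the underlying study does not prescribe a specific model or taxonomy.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

CATEGORIES = ["Software developer", "Nurse", "Teacher", "Electrician", "Other"]

def classify_job_title(title: str) -> str:
    """Ask the model to pick exactly one category for a raw job title."""
    prompt = (
        "Classify the following job title into exactly one of these categories:\n"
        + "\n".join(f"- {c}" for c in CATEGORIES)
        + f"\n\nJob title: {title}\nAnswer with the category name only."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

print(classify_job_title("Frontend-Entwickler (React)"))  # e.g. "Software developer"
```

Constraining the answer to a fixed label list and setting the temperature to zero keeps the output machine-readable, which matters when the classification feeds into downstream occupational coding.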
This paper explores the potential of using large language models in multilingualism research to accelerate data processing (speech-to-text). The main issues relating to the language use of bilingual individuals are discussed qualitatively, using a Polish-German recording as an example.
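As a rough illustration of the speech-to-text step, the sketch below transcribes a recording with the open-source openai-whisper package. The file name is a placeholder and the paper does not name its tooling, so this is only one plausible pipeline; code-switched Polish-German audio may additionally require per-segment language handling that this sketch omits.

```python
# Minimal transcription sketch using the open-source openai-whisper package.
# The file name is a placeholder; the study does not specify its tooling.
import whisper

model = whisper.load_model("small")          # small multilingual model
result = model.transcribe("interview.wav")   # language is auto-detected

print("Detected language:", result["language"])
for segment in result["segments"]:
    print(f'[{segment["start"]:7.2f} - {segment["end"]:7.2f}] {segment["text"]}')
```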
Research data repositories store numerous entries of research data; among other advantages, one goal is to allow us to store all the data needed to reproduce experiments.
Working with large corpora of texts is made significantly easier with Large Language Models. However, Large Language Models are trained for general purposes and are not fine-tuned for the data originating from different kinds of...
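One common remedy for this mismatch is parameter-efficient fine-tuning on domain texts. The sketch below uses LoRA adapters via peft and transformers, with gpt2 as a small stand-in model and a toy two-sentence dataset; none of these choices are prescribed by the contribution itself.

```python
# Minimal LoRA fine-tuning sketch (peft + transformers) on a few domain sentences.
# "gpt2" and the toy dataset are stand-ins, not the contribution's actual setup.
from datasets import Dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Wrap the base model with low-rank adapters so only a small number of
# parameters is trained on the domain data.
model = get_peft_model(model, LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16))

domain_texts = [
    "The dataset is archived in the institutional research data repository.",
    "Each deposit receives a DOI and a machine-readable metadata record.",
]
dataset = Dataset.from_dict({"text": domain_texts}).map(
    lambda row: tokenizer(row["text"], truncation=True, max_length=128),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="rdm-lora", num_train_epochs=1,
                           per_device_train_batch_size=1, learning_rate=2e-4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```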
With the emergence of large language models, the long-studied Text-to-SQL problem has been elevated to a new level. In this paper, we test how our LLM fine-tuning approach performs on two relational databases (small vs. large) and compare it to a default setting. The results are convincing: using in-context learning boosts performance from a mere 35% (default) to...
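As a minimal illustration of the in-context learning setup, the sketch below packs a database schema and a few question/SQL pairs into the prompt. The schema, the examples and the model name are invented for illustration; they are not the databases, prompts or models evaluated in the paper.

```python
# Minimal in-context learning sketch for Text-to-SQL: the prompt carries the
# schema plus a few question/SQL examples. Schema, examples and model name are
# illustrative only.
from openai import OpenAI

client = OpenAI()

SCHEMA = """CREATE TABLE datasets (id INTEGER, title TEXT, year INTEGER, license TEXT);
CREATE TABLE downloads (dataset_id INTEGER, month TEXT, count INTEGER);"""

FEW_SHOT = [
    ("How many datasets were published in 2022?",
     "SELECT COUNT(*) FROM datasets WHERE year = 2022;"),
    ("Which dataset titles use a CC-BY license?",
     "SELECT title FROM datasets WHERE license = 'CC-BY';"),
]

def text_to_sql(question: str) -> str:
    """Build a few-shot prompt and ask the model for a single SQL statement."""
    examples = "\n\n".join(f"Question: {q}\nSQL: {s}" for q, s in FEW_SHOT)
    prompt = (f"Database schema:\n{SCHEMA}\n\n{examples}\n\n"
              f"Question: {question}\nSQL:")
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

print(text_to_sql("Which dataset title had the highest total download count?"))
```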
Scholars have access to large amounts of data and publications stored in Research Data Repositories (RDRs). Large Language Models (LLMs) can work with textual data efficiently. However, since LLMs are pretrained and have a limited context window, they cannot process large amounts of text at once. The standard approach to this problem is Retrieval-Augmented Generation (RAG), in which an embedding space is...
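To make the retrieval step concrete, the sketch below embeds a handful of toy repository entries, retrieves the entries closest to a query by cosine similarity, and assembles an augmented prompt. The embedding model and the documents are assumptions; the abstract specifies neither an embedding model nor a vector store.

```python
# Minimal RAG retrieval sketch: embed repository entries, retrieve the most
# similar ones for a query, and build an augmented prompt. The embedding model
# and the toy documents are assumptions, not the paper's actual setup.
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "Dataset A contains survey responses on remote work collected in 2021.",
    "Dataset B provides annotated job advertisements for occupation coding.",
    "Dataset C holds bilingual Polish-German interview transcripts.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = encoder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents whose embeddings are closest to the query."""
    query_vector = encoder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ query_vector            # cosine similarity (vectors are normalized)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

query = "Which dataset could I use to study bilingual speech?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # this augmented prompt would then be passed to the LLM
```

In a production RDR setting the in-memory similarity computation would typically be replaced by a vector database, but the retrieval logic stays the same.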