PhD candidate AI-driven FAIR data extraction and harmonization

You cannot apply for this job anymore (deadline was 23 Mar)

Vacancy ID

250184

Academic fields

Health

Job types

PhD; Research, development, innovation

Education level

University graduate

Weekly hours

36 hours per week

Salary indication

max. €3017 per month

Location

Hanzeplein 1, 9713 GZ, Groningen

Job description

- Data harmonization: develop methods to map free-text clinical data to standardized coding systems and ontologies, ensuring compliance with FAIR principles.
- AI model innovation: select, adapt, and refine large language models (local, cluster, or cloud-based) and frameworks (Ollama, OntoGPT, LangChain, etc.) for automated data recoding.
- Prompt and agentic workflow engineering: devise and implement best practices for improving language model performance in data extraction and ontology mapping.
- Use case development: collaborate with researchers and clinicians to apply your solutions in real-world scenarios, such as integrating rare disease alerts into EHRs or re-analyzing existing cohorts.
- Interdisciplinary collaboration: work across data science, software engineering, genomics, and clinical teams to create scalable solutions that enhance patient care and research outcomes.

Project: AI-driven FAIR data extraction and harmonization
By converting clinical notes and cohort variables into standard coding systems, you will help create datasets large enough for automated analysis and advanced diagnostics. Imagine helping rare disease patients by mapping textual symptom descriptions to precise phenotypic codes, which are then combined with genomic data to identify potential causative variants. Or envision scaling your methods to unify data from multiple large cohort studies for research into healthy child development: integrating local data models with emerging APIs such as DataSHIELD, Beacon, or FAIR Data Point to enable data discovery and analysis and to build new global collaborations.

Your research will focus on using state-of-the-art large language models to carry out this conversion, guided by many open questions. Which model types and sizes are most effective? How should they be prompted, orchestrated, and validated for optimal accuracy? Could we deploy them locally on our own cluster, or should we tap into cloud resources? Can we enable our partner universities and hospitals to run them locally in a federation? You will experiment with existing tools and agentic frameworks such as Ollama, LangChain, and OntoGPT to discover and refine best practices.

You will develop novel methods with direct real-world impact: from improving patient diagnoses and enabling large-scale reuse of anonymized data for research, to laying the groundwork for deeper integration with electronic health records so that these methods can become part of mainstream healthcare. The UMCG is a world leader in integrating AI into healthcare processes, and this project will build on that position to achieve global impact. Join our team of forward-thinking researchers and clinicians to shape the future of AI-driven data extraction and harmonization for healthcare.


Requirements

- Master’s degree in AI, Computer Science, Bioinformatics, or a related field.
- Passion for machine learning, natural language processing, and biomedical data.
- Strong analytical skills and a willingness to learn new techniques.
- Excellent communication skills and a collaborative mindset.
- Familiarity with relevant technologies is a big plus (e.g. ontologies, coding systems, FAIR data principles, agentic AI frameworks, programming in R, Java, or Python).

Conditions of employment

- A dynamic research environment at the forefront of AI-driven healthcare innovation.
- Access to diverse, real-world medical data sets and cutting-edge computational resources.
- Support from and collaboration with the large MOLGENIS open-source scientific software team, to help you deploy and test your methods in working solutions.
- Mentorship by leading experts in AI, genomics, and clinical informatics.
- Opportunities to publish in high-impact journals and present at international conferences.

This is a full-time PhD contract for 4 years in an excellent environment for further development. You will first be offered a temporary one-year position, with the option of renewal for another 3 years. Your salary will be a minimum of €2,901 gross per month in the first year and a maximum of €3,677 (scale PhD) in the final (4th) year, based on a full-time appointment. In addition, the UMCG will offer you 8% holiday pay and an 8.3% end-of-year bonus. The conditions of employment comply with the Collective Labour Agreement for Medical Centres (CAO-UMC).

Apply now and join us in revolutionizing how medical data is utilized in the future of healthcare. We look forward to hearing from you!

Department

Genetics

The position is part of the Genomics Coordination Centre (GCC), the ‘big data science’ research and service hub of the University Medical Centre Groningen (UMCG) and the University of Groningen (rank 66 worldwide, 3rd best place to work in the EU), hosted by the Department of Genetics. Our mission is to accelerate scientific discovery from health data with innovative methods and tools that expedite medical research and improve people's lives. We do this using open source software and large computer ‘clouds’, in particular the MOLGENIS software that we lead, but also DataSHIELD, Singularity, REDCap, XNAT, OpenStack, etc.

Application procedure

Joeri van der Velde, k.j.van.der.velde@umcg.nl, phone number: 06 1981 4646

Working at UMCG

Learn more about research at UMCG. Discover our key areas of research, our facilities, networks and partners.
