Job description
Interested in making representation and generative learning work with structured data (e.g. tables
in spreadsheets and databases) to accurately, securely, and efficiently democratize insights from
data? This 4-year PhD starting September 2024 is for you!
Goal of the DataLibra project
Approximately 120 zettabytes of data has been collected worldwide but less than 1% is actually
used. Structured data, e.g. tables, spreadsheets, and relational databases, is prevalent in
organizations and typically informs important decisions in healthcare, government, and finance.
Yet, while AI has demonstrated a high impact on applications involving text and images, comparable
progress on structured data is lacking. With the DataLibra project, we aim to close this gap by
developing AI models and tools for structured data (Table Representation Learning), to help
organizations of any size, domain, and level of data literacy get insights from structured data
efficiently, accurately, and securely.
Goal of this PhD project
Following recent developments in AI, large language models (LLMs) have been explored for data
analytics tasks (e.g. text-to-SQL), but show limited accuracy in domain-specific contexts with
structured data. In this project, we will investigate and design interactive intelligent systems for
data analytics tasks, while accounting for two key challenges: trustworthiness of the outputs
(factuality) and the security constraints of proprietary data contexts, as in healthcare, enterprises, and
governments. Potential directions to explore are agentic systems, retrieval-augmented generation,
(instruction) fine-tuning, and others.
What you will be doing
Inform a research agenda on the PhD topic for a timespan of four years.
Develop methods and systems for contextualizing generative AI for analytics over structured data.
Actively collaborate with other researchers in the DataLibra project (students, 4-5 PhDs, postdocs, PI) and external collaborators (e.g. Amsterdam UMC, University of Amsterdam).
Communicate research outcomes through papers and presentations at conferences, workshops and other (scientific) gatherings.
Assist in relevant teaching activities at universities, such as thesis supervision and assisting in courses.
Employer
Centrum Wiskunde & Informatica
Centrum Wiskunde & Informatica (CWI) is the Dutch national research institute for mathematics and computer science and is part of the Institutes Organisation of the Dutch Research Council (NWO). The mission of CWI is to conduct pioneering research in mathematics and computer science, generating new knowledge in these fields and conveying it to trade, industry, and society at large.
CWI is an internationally oriented institute, with 160 scientists from approximately 27 countries, an informal atmosphere and short lines of communication. We have an activity committee that organizes after-work activities and an informal women’s network.
CWI is located at Science Park Amsterdam, the home of AMS-IX, which is presently developing into a major location for research in the physical sciences in the Netherlands, housing the sciences of the University of Amsterdam as well as several other national research institutes alongside CWI.