This is a 4-year paid PhD position, shared between the Data and AI (DAI) cluster at the Eindhoven University of Technology (TU/e) and ASML:
- In the Data and AI cluster, we study the foundations of data and AI for the present and the future. We design new methods and develop algorithms and tools with a view to expanding the reach of databases and AI and their generalization abilities. In particular, we study foundational issues of robustness, safety, fairness, trust, reliability, tractability, scalability, interpretability, and explainability of data and AI. Currently, the cluster includes five research groups: Uncertainty in AI, Generative AI, Automated ML, Data Mining, and Databases.
- ASML, a leader in semiconductor manufacturing, faces challenges with limited and unbalanced data in metrology and diagnostics for its photolithography machines. Traditional approaches struggle under such data constraints. To address this, ASML is exploring foundation models: robust and adaptable models trained on extensive datasets. These models can make effective use of small amounts of proprietary data, improving the accuracy of metrology and diagnostics. By leveraging advanced machine learning techniques, ASML aims to optimize chip production, leading to higher yields and superior quality.
You will be supervised by Dr. J.M. Tomczak (TU/e), Prof. M. Pechenizkiy (TU/e), Prof. G. Fletcher (TU/e), and Dr. J. Kustra (ASML). You will work in close collaboration with the Diagnostics & Data Science Group in ASML Research. This multidisciplinary team explores and prototypes the next generation of knowledge-informed solutions for ASML's metrology and lithography challenges. Given the complexity of the system, a core challenge lies in diagnosing rarely occurring failures, where existing knowledge of the system design is combined with physical understanding and system data to reason about the potential root causes of a problem. You will participate in cutting-edge research, publish your work in leading conferences (NeurIPS, ICML, ICLR, AISTATS, UAI) and journals (TMLR, IEEE TPAMI, JMLR), and contribute to open-source tools.
You will work on developing a framework that assists engineers in their diagnostics work and, consequently, shortens system downtime. The framework must satisfy two requirements: (i) it must be conversational, i.e., an engineer must be able to check facts and procedures quickly; (ii) it must be trustworthy, i.e., it must not 'hallucinate'.
We propose to formulate large language models (LLMs) enhanced with knowledge graphs (KGs) that could serve for training, inference, and interpretability. LLMs are well known for knowledge acquisition from large-scale systems and for achieving state-of-the-art performance on many natural language processing tasks. However, they can suffer from various issues, such as hallucinations, false references, and made-up facts. KGs, on the other hand, can store enormous amounts of facts in a structured and explicit manner. Unlike LLMs, however, KGs are laborious to construct, and querying them can be computationally demanding. One interesting research question is therefore the following: how can KGs and LLMs be combined so that LLMs provide answers grounded in facts and do not hallucinate? This could serve as a starting point for this PhD project.
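To make the research question concrete, here is a minimal Python sketch of one possible way to ground a model's answers in a KG: retrieve the triples that mention an entity, restrict the prompt to those facts, and abstain when no facts are found. Everything in it is an illustrative assumption and not the project's method: the `Triple` type, the toy graph `TOY_KG` and its invented entities, the helper functions, and the `llm` argument, which stands for any callable that maps a prompt string to an answer string. Note that restricting the prompt only encourages factual answers; it does not by itself guarantee the absence of hallucinations, which is part of what makes the question open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One KG fact: (subject, relation, object). Hypothetical type for illustration."""
    subject: str
    relation: str
    obj: str


# Hypothetical toy knowledge graph; all entities and relations are invented.
TOY_KG = [
    Triple("pump_A", "part_of", "cooling_unit"),
    Triple("pump_A", "has_failure_mode", "pressure_drop"),
    Triple("pressure_drop", "checked_by", "procedure_12"),
]


def retrieve_facts(kg, entity):
    """Return all triples that mention the entity as subject or object."""
    return [t for t in kg if entity in (t.subject, t.obj)]


def build_grounded_prompt(question, facts):
    """Build a prompt limited to the retrieved facts, or None if there are none."""
    if not facts:
        return None
    fact_lines = "\n".join(f"- {t.subject} {t.relation} {t.obj}" for t in facts)
    return (
        "Answer using only the facts below; reply 'unknown' otherwise.\n"
        f"Facts:\n{fact_lines}\n"
        f"Question: {question}"
    )


def answer(question, entity, kg, llm):
    """Return (answer, supporting facts); abstain when the KG has no support.

    `llm` is an assumed placeholder: any callable from prompt string to answer string.
    """
    facts = retrieve_facts(kg, entity)
    prompt = build_grounded_prompt(question, facts)
    if prompt is None:
        return "unknown", []
    return llm(prompt), facts
```

Returning the supporting triples alongside the answer lets an engineer check which facts a response was based on, which relates to the conversational and trustworthiness requirements stated above.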