Project description: Evolutionary changes in Distributed Analysis
Distributed server clusters are often used to perform data analysis on voluminous collections of data. These clusters substantially speed up large-scale data analysis by dividing data collections among the available machines, where they can be processed in parallel. For instance, the distributed data processing platform Spark has become a de facto standard in the world of large-scale data processing. The data processing pipelines for such platforms are composed at design time and then submitted to the central “master” component, which then distributes the code among several worker nodes.
In many practical situations, the analysis application is not static and evolves over time: the developers add new processing steps, data scientists adjust the parameters of their algorithms, and quality assurance discovers new bugs. Currently, an update of a pipeline looks as follows: the developers patch their code, re-submit the updated version, and finally restart the entire pipeline. However, restarting a processing pipeline safely is difficult: the intermediate state is lost and needs to be re-computed, some data need to be reprocessed, and, finally, the cost of restarting may not be trivial, especially for real-time streaming components that require 24/7 availability.
In this project, we develop a platform to support evolving data-intensive applications without the need for restarting them when the requirements change (e.g. new data sources or algorithms become available). We apply our developed tools and techniques and evaluate their effectiveness in the context of three different industrial use cases from three top sectors: water treatment, life sciences, and HTSM/Smart Industry.
University of Groningen
Candidates for the PhD position should have a master's degree in computer science or a related field, with a strong background in formal methods, service-oriented computing, software engineering, concurrency and distributed systems, and especially practical software tool development. Furthermore, the candidate should have at least some experience in the field of machine learning and/or data analysis/statistics.
Conditions of employment
The University of Groningen offers, in accordance with the Collective Labour Agreement for Dutch Universities, a salary of € 2,266 gross per month in the first year, up to a maximum of € 2,897 gross per month in the fourth and final year, based on a full-time position (1.0 FTE), excluding a holiday allowance of 8% of gross annual income and an 8.3% end-of-the-year allowance. The position is limited to a period of 4 years. A PhD training programme is part of the agreement and the successful candidate will be enrolled in the Graduate School of Science and Engineering.
The successful candidate will first be offered a temporary position of one year with the option of renewal for another three years. Prolongation of the contract is contingent on sufficient progress in the first year to indicate that a successful completion of the PhD thesis within the next three years is to be expected.
Applications can be submitted until 6 January 2019, 23:59 h Dutch local time (that is, before 7 January 2019), by means of the application form (click on "Apply" below on the advertisement on the university website).
Applications should include:
• letter of motivation
• CV (including contact information for at least two academic references)
• transcripts from your bachelor’s and master’s degrees.
Unsolicited marketing is not appreciated.
Faculty of Science and Engineering
Founded in 1614, the University of Groningen enjoys an international reputation as a dynamic and innovative institution of higher education offering high-quality teaching and research. Flexible study programmes and academic career opportunities in a wide variety of disciplines encourage the 30,000 students and researchers alike to develop their own individual talents. As one of the best research universities in Europe, the University of Groningen has joined forces with other top universities and networks worldwide to become a truly global centre of knowledge.
Within the Distributed Systems group of the Faculty of Science and Engineering of the University of Groningen in the Netherlands, a 4-year PhD position is available. The PhD candidate will work in the project Evolutionary changes in Distributed Analysis (ECiDA). This project involves the development of dynamic data analysis pipelines on distributed data clusters.
The Distributed Systems group performs fundamental research and delivers education at the frontiers of dynamic complex distributed systems, using formal engineering tools and seeking applications with societal impact. Over the last decade, its main research interests have covered AI planning and discrete optimization in highly distributed environments, with Internet-of-Things, building automation, large-scale data analytics, business process management, and distributed energy infrastructures as the main application domains. The research results have been field-tested in collaboration with industry; one such application eventually led to the founding of the Sustainable Buildings company, which applies the optimization algorithms in practice.
Prof. A. Lazovik