Subject to the approval of funding; part-time employment may be possible.
The Database Systems and Information Management (DIMA) Group is seeking to hire a Research Associate (PhD student) to conduct research and development on the optimization of data science processes. The position is part of a joint initiative between DIMA and the Max Delbrück Center for Molecular Medicine (MDC) under the auspices of the HEIBRIDS (Helmholtz Einstein International Berlin Research School in Data Science) PhD program.
Applying data science methods typically involves a tedious, iterative process of specifying and executing complex data analysis pipelines. These pipelines consist of preprocessing steps, model building, and performance evaluation. Heterogeneous data sources and pipeline execution systems often introduce complex dependencies on the input data and the processing infrastructure. To simplify and accelerate this process, it would be highly beneficial to specify such pipelines declaratively. To address these challenges, we propose the following research directions:
(a) the design of a holistic declarative specification for data science pipelines that supports different execution environments, automatic data validation, and the recording of metadata;
(b) the implementation of a system for the optimized execution of pipelines expressed in this declarative specification, with support for different runtimes (e.g., translation to a mixed Spark/TensorFlow workload with experiment tracking enabled, or translation to transactions inside a database with a machine learning extension); and
(c) the use of an experiment database to automatically recommend tests for potential data errors in the pipeline (e.g., wrong data types or missing normalization of the data).
A successfully completed university degree (Master's, Diplom, or equivalent) in Data Management, Distributed Systems, or Scalable Data Analysis. Applicants should be strongly motivated to work in a leading research area, develop systems, and conduct research in a real-world application setting. Ideally, applicants should have knowledge of programming languages and compilers (including programming language design), data science and machine learning, and distributed programming. Applicants with experience in big data analytics systems (e.g., Apache Flink, Hadoop, or Spark), open source development, and the implementation of parallel database solutions will be looked upon favorably. Good knowledge of both German and English is strongly desired; at a minimum, candidates must be able to communicate in English.
How to apply:
Please send your written application, quoting the reference number and including the usual documents (in particular a cover letter, CV, academic transcripts, and reference letters), to Technische Universität Berlin - Der Präsident - Fakultät IV, Institut für Softwaretechnik und Theoretische Informatik, FG Datenbanksysteme und Informationsmanagement (DIMA), Prof. Dr. Markl, Sekr. E-N 7, Einsteinufer 17, 10587 Berlin, or by e-mail to email@example.com.
To ensure equal opportunities for women and men, applications from women with the required qualifications are explicitly encouraged. Qualified individuals with disabilities will be given preference. TU Berlin values the diversity of its members and is committed to the goal of equal opportunity.
Please send copies only. Original documents will not be returned.