Offer 86 out of 344 from 12/04/19, 10:45


Technische Universität Berlin - Faculty IV - Institute of Software Engineering and Theoretical Computer Science / Database Systems and Information Management (DIMA)

Research Assistant - salary grade E13 TV-L Berliner Hochschulen

subject to the approval of funding - part-time employment may be possible

Working field:

The Database Systems and Information Management (DIMA) Group is seeking to hire a Research Associate (PhD Student) to conduct R&D in the Optimization of Data Science Processes, under a joint initiative between DIMA and the Max Delbrück Center for Molecular Medicine (MDC) and under the auspices of the HEIBRIDS (Helmholtz Einstein International Berlin Research School in Data Science) PhD program.

Applying data science methods typically involves a tedious, iterative process of specifying and executing complex data analysis pipelines. These pipelines comprise preprocessing steps, model building, and performance evaluation. Heterogeneous data sources and systems for pipeline execution often introduce complex dependencies on input data and processing infrastructure. To simplify and accelerate this data analysis process, it would be highly beneficial to enable the declarative specification of such pipelines.

To address these challenges, we propose the following research directions: (a) the design of a holistic declarative specification for data science pipelines that addresses the requirements of declarativity, support for different execution environments, automatic data validation, and recording of metadata; (b) the implementation of a system for the optimized execution of pipelines expressed in a declarative specification, with support for different runtimes (e.g., translation to a mixed Spark/TensorFlow workload with experiment tracking enabled, or translation to transactions inside a database with a machine learning extension); and (c) the use of an experiment database to automatically recommend tests for potential data errors in the pipeline (e.g., wrong data types, missing normalization of the data).


Requirements:

Successfully completed university degree (Master, Diplom or equivalent) in Data Management, Distributed Systems, or Scalable Data Analysis. Applicants should be strongly motivated to work in a leading research area, develop systems, and conduct research in a real-world application setting. Ideally, applicants should possess knowledge of programming languages and compilers, data science and machine learning, the design of programming languages, and distributed programming. Experience with big data analytics systems (e.g., Apache Flink, Hadoop or Spark), open source development, and implementing parallel database solutions will be viewed favorably. Furthermore, good knowledge of both German and English is strongly desired. However, at a minimum, candidates must be able to communicate in English.

How to apply:

Please send your written application with the reference number and the usual documents (in particular cover letter, CV, academic transcripts, and reference letters) to Technische Universität Berlin - Der Präsident - Fakultät IV, Institut für Softwaretechnik und Theoretische Informatik, FG Datenbanksysteme und Informationsmanagement (DIMA), Prof. Dr. Markl, Sekr. E-N 7, Einsteinufer 17, 10587 Berlin or by e-mail to

To ensure equal opportunities for women and men, applications from women with the required qualifications are explicitly encouraged. Qualified applicants with disabilities will be given preference. The TU Berlin values the diversity of its members and is committed to the goals of equal opportunities.

Please send copies only. Original documents will not be returned.