2018-00562 – VAlorisation des DOnnées pour la Recherche d’Emploi

Closing date for submitting applications: 15.09.18

Qualification required: Master's degree or equivalent
Position: PhD student

About the research centers

The thesis will be carried out at two centers: CREST and LRI. It will be co-supervised by a research director from each center.

CREST (Research Center in Economics and Statistics) is a joint research center that brings together professors from ENSAE, ENSAI and the Economics Department of Ecole Polytechnique. It is a UMR (joint research unit) of CNRS. CREST has an interdisciplinary perspective, which is reflected in its four disciplines: economics, statistics, finance-insurance and sociology.

CREST's common culture is characterized by a strong attachment to quantitative methods, data and mathematical modeling, and by a continuous back-and-forth between theoretical models and empirical evidence in order to analyze practical social and economic problems.

In addition to the scientific work of its members, an important challenge for the center is to promote its scientific culture and expertise through doctoral training. The center therefore actively participates in several master's degree programs, in particular the Paris-Saclay Master's Degree in Economics and its PhD program, as well as the Master's degree program in Data Sciences.

  • Institution : ENSAE, Bruno Crépon

  • Laboratory: CREST

  • Team: Economics, Econometrics, Statistical Learning

  • Researchers involved: B. Crépon, C. Gaillac, M. Cuturi

LRI (Research Laboratory in Computer Science) is a joint research unit (UMR8623) of Paris-Sud University and CNRS. Along with the unit LIMSI-CNRS, LRI is part of the Computer Science Department of the University Paris-Sud and part of the INS2I (sections 06 main and 07 secondary) of CNRS. The unit reports to the Delegation Île-de-France Sud of Gif-sur-Yvette.

Created more than 35 years ago, it has more than 250 people including about 133 permanent employees and 90 doctoral students. LRI is organized into nine research teams, one administrative team and one technical team. Four of the research teams (including the AO team that will welcome the doctoral student) are wholly or partly in common with Inria Saclay - Île-de-France, which is the main partner of the laboratory. The laboratory is located at the Plateau du Moulon, in the new premises Ada Lovelace of PCRI (buildings shared with Inria) since June 2011 and in Claude Shannon at Digitéo (offices shared with teams of Inria, IEF and CEA) since early 2013.

The research themes of the laboratory cover many areas of computer science, with a focus on software. The themes include both fundamental and applied aspects: algorithms, combinatorics, graphs, discrete and continuous optimization, programming, software engineering, verification and proof, parallelism, high-performance computing, grids, architecture and compilation, networks, databases, knowledge representation and processing, learning, data mining, bioinformatics, human-machine interaction, etc. This diversity is one of the strengths of the laboratory, which supports research at the boundaries between fields, where the greatest potential for innovation can be found.

  • Organization: Paris-Sud University, Michèle Sebag
  • Laboratory: LRI, CNRS UMR 8623, INRIA

  • Team: TAO, Learning and Optimization

  • Researchers involved: M. Sebag, P. Caillou, P. Tubaro

Background

The thesis subject is part of a project selected by the DATAIA Institute through its 2018 call for projects.
The objective of the project is to develop and test tools that will improve the matching between job seekers and companies in the labor market. The project is carried out in close collaboration with Pôle Emploi (the French national employment agency) and will rely on the very rich data on both job seekers and companies used by Pôle Emploi. A PhD student hired under a CIFRE contract at Pôle Emploi is also involved in the project. In addition to the methodological aspect that is at the heart of the proposed subject, the overall project has an important operational dimension. It plans to use the latest tools available in the machine learning literature (word embeddings, optimal transport, deep learning) and to rigorously test their effectiveness and their impact on labor market equilibrium.

Mission

Thesis subject: As part of the overall project selected by the DATAIA Institute, the thesis concerns the development of recommendation tools. An important aspect of the project is to recommend offers or companies to job seekers and, conversely, to suggest job seekers' CVs to companies.

State of the art: Two major data science approaches are likely to help in the search for a job or for an employee.

The first approach is information retrieval: given a request (a job offer), which document (a CV) best meets the request? The possible solutions (Faliagka et al., 2012, Singh et al., 2010) are based on natural language processing and have to face, among many other things, the fact that the documents (written by job seekers) and the requests (job offers) use different languages. One solution is to define standard skills or ontologies, ideally forming a common language between offers and CVs. However, such a list of skills only partly solves the problem, for three reasons: i) job positions change quickly; ii) the descriptions written by applicants and recruiters are noisy; iii) other semi-structured information (particularly geographical information) must be taken into account and can be more or less significant, depending on jobs and people.
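
As an illustration of the information retrieval view, the following minimal sketch ranks CVs against a job offer with a bag-of-words cosine similarity. It is not the method of the cited papers; the function names and the toy texts are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_cvs(offer, cvs):
    """Return CV identifiers sorted from most to least similar to the offer."""
    query = bag_of_words(offer)
    scores = {cv_id: cosine(query, bag_of_words(text)) for cv_id, text in cvs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data, not taken from Pôle Emploi.
cvs = {
    "cv_baker": "baker pastry bread night shifts",
    "cv_dev": "python developer data pipelines",
    "cv_coder": "software engineer who writes code in python",
}
offer = "python developer for data team"
ranking = rank_cvs(offer, cvs)
```

Here "cv_coder" shares a single token with the offer although the profile is plausibly relevant, which is a small instance of the vocabulary mismatch between offers and CVs mentioned above.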

The second approach is collaborative filtering, the best-known example being the Netflix platform. Classic collaborative filtering relies only on users' usage traces, which indicate "who likes what". These traces are used to infer a latent description of users and items. From the collaborative filtering point of view, the main difficulty of the employment problem is that it is a "cold start" problem: the users and items under study are mostly new (in particular job ads, which are not permanent items). In 2017, the challenge of the international recommender systems community, RecSys 2017, was dedicated to recommending job offers to job seekers based on data collected on the Xing.com platform (1.5 million users, 1.3 million offers). Among the collaborative filtering approaches developed for this challenge, we can mention the work of Volkovs et al., 2017 and the work done at LRI (Schmitt et al., 2016, Schmitt et al., 2017).
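
To make the idea of a latent description concrete, here is a minimal sketch of matrix factorization trained by stochastic gradient descent on usage traces. It is a generic textbook formulation, not the approach of the cited challenge entries; the data, dimensions and hyperparameters are assumptions made for illustration.

```python
import random


def dot(a, b):
    """Dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def factorize(traces, n_users, n_items, dim=2, steps=5000, lr=0.05, reg=0.01, seed=0):
    """Fit latent vectors so that dot(user, item) approximates each usage trace."""
    rng = random.Random(seed)
    users = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_users)]
    items = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_items)]
    for _ in range(steps):
        u, i, liked = rng.choice(traces)
        err = liked - dot(users[u], items[i])
        for k in range(dim):
            pu, qi = users[u][k], items[i][k]
            # Gradient step on the squared error with L2 regularization.
            users[u][k] += lr * (err * qi - reg * pu)
            items[i][k] += lr * (err * pu - reg * qi)
    return users, items


# Hypothetical traces (user, item, liked); item 3 is a new offer with no trace.
traces = [
    (0, 0, 1.0), (0, 1, 1.0), (0, 2, 0.0),
    (1, 0, 1.0), (1, 1, 1.0), (1, 2, 0.0),
    (2, 0, 0.0), (2, 1, 0.0), (2, 2, 1.0),
]
users, items = factorize(traces, n_users=3, n_items=4)
cold_item = items[3]
```

The vector of item 3 is never updated, so it stays at its small random initialization and carries no information. This is the cold-start problem in miniature: without traces, usage data alone cannot describe a new offer.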

Aim of the thesis: Using data that will have been consolidated and formatted, we will build a recommendation system capable of identifying and ranking the job offers that could lead to a job seeker being hired and, conversely, of suggesting relevant CVs for a given job offer or company.

The thesis will first focus on applying simple recommendation models, such as co-embedding filtering based on the history of matches and proximity models between job seekers, offers and companies, and on using intuitive techniques: if a job seeker shares many features with another one who has been hired by a company for a particular offer, which other companies or offers similar to that one could be suggested?
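
The intuitive technique described above can be sketched as a nearest-neighbor recommender: offers that led to the hiring of similar job seekers are suggested first. This is only one possible reading of the paragraph; the Jaccard similarity, the feature sets and the names below are assumptions for illustration.

```python
def jaccard(a, b):
    """Share of features that two profiles have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(seeker, profiles, hires, top_k=2):
    """Score each offer by the similarity between the seeker and the people hired for it."""
    scores = {}
    for other, offer in hires:
        if other == seeker:
            continue
        weight = jaccard(profiles[seeker], profiles[other])
        scores[offer] = scores.get(offer, 0.0) + weight
    # Sort by decreasing score, breaking ties by offer name for determinism.
    ranked = sorted(scores, key=lambda o: (-scores[o], o))
    return [o for o in ranked if scores[o] > 0][:top_k]


# Hypothetical profiles and hiring history.
profiles = {
    "alice": {"python", "sql", "paris"},
    "bob": {"python", "sql", "lyon"},
    "carol": {"baking", "paris"},
}
hires = [("bob", "offer_data_analyst"), ("carol", "offer_bakery")]
suggestions = recommend("alice", profiles, hires)
```

Because the score depends on profile features rather than on the seeker's own history, a job seeker with no past hire (here "alice") still receives suggestions.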

In this context, it will be essential to learn embeddings that can effectively encode these skill proximities. Embeddings will first be learned from the very rich Pôle Emploi data on job seekers and companies, especially the textual data contained in CVs and job ads, which is represented as vectors via word embeddings.

This will make it possible to learn, for job seekers as well as for offers and companies, a linear transformation (or even a multi-layer neural one) that correlates with co-occurrence statistics (the matches observed between companies and job seekers).
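
One simple way to picture such a linear transformation is a bilinear score x^T W y between a CV vector x and an offer vector y, with W fitted on observed matches. The sketch below uses a logistic loss and tiny hand-made vectors standing in for text embeddings; the loss, the vectors and all names are assumptions, not the project's actual model.

```python
import math


def score(x, W, y):
    """Bilinear compatibility x^T W y between a CV vector and an offer vector."""
    return sum(x[i] * W[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def sigmoid(z):
    """Map a real-valued score to a match probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit(pairs, dim_x, dim_y, epochs=200, lr=0.5):
    """Logistic-loss gradient descent on W, starting from the zero matrix."""
    W = [[0.0] * dim_y for _ in range(dim_x)]
    for _ in range(epochs):
        for x, y, matched in pairs:
            err = matched - sigmoid(score(x, W, y))
            for i in range(dim_x):
                for j in range(dim_y):
                    W[i][j] += lr * err * x[i] * y[j]
    return W


# Hypothetical vectors: CVs and offers live in differently oriented spaces,
# so the identity map would pair them wrongly and W has to learn the link.
cv_a, cv_b = [1.0, 0.0], [0.0, 1.0]
offer_1, offer_2 = [0.0, 1.0], [1.0, 0.0]
pairs = [
    (cv_a, offer_1, 1.0), (cv_a, offer_2, 0.0),
    (cv_b, offer_1, 0.0), (cv_b, offer_2, 1.0),
]
W = fit(pairs, dim_x=2, dim_y=2)
p_match = sigmoid(score(cv_a, W, offer_1))
p_mismatch = sigmoid(score(cv_a, W, offer_2))
```

After training, observed matches receive a probability above 0.5 and observed non-matches a probability below 0.5. Replacing the matrix W with a small neural network applied to each side would give the multi-layer variant mentioned above.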

Keywords: word embedding, collaborative filtering, deep learning, job market

Main activities

The main activity within the overall project is the development of prediction algorithms.

Requirements

We are looking for candidates motivated by stimulating research and its application to real-world data. Candidates must have strong knowledge of mathematics and probability/statistics. Desired programming skills: preferably Python or R (or another scripting language).

Net Salary

1700 euros per month

General information
  • Topic / Domain: Learning and statistical methods, Statistics (Big data) (BAP E)

  • Location: Plateau de Saclay (91)

  • Starting date : 2018-10-01

  • Duration of contract: 3 years

  • Deadline to apply: 2018-09-15. 

We encourage candidates who are interested in the offer to contact us as soon as possible.

To apply, fill out the form below or send your application to crepon@ensae.fr and michele.sebag@lri.fr

