Internship offers

These internships are offered preferentially to students of the UFR de linguistique, but students enrolled at other institutions may also apply.

Internships 2025-2026

NR = Non Rémunéré (unpaid); GR = Gratification Réglementaire (statutory internship stipend)

LLF/INCC, UPC (G. Turco), Master 2: infant perception

Internship at LLF and INCC (BabyLab) proposed by Giuseppina Turco

The LLF is looking for a Master 2 intern for a paid six-month internship at the BabyLab (INCC), from January to June 2026.

Internship topic: The project focuses on perception in infants. The intern will take part in several stages of the research process:

Recruiting participants (infants and their parents)
Contributing to the conception and adaptation of the experimental design
Collecting experimental data (recording, contributing to coding)

To apply: please send a CV (1 page) to giuseppina.turco@cnrs.fr before 15 October 2025.

Laboratoire de linguistique formelle, UPC (A. Abeillé), Master 1-2: locality constraints

Internship at LLF proposed by Anne Abeillé

The Laboratoire de linguistique formelle offers an internship for a first- or second-year master’s student.

Topic: Locality constraints

Description: Long-distance dependencies, exhibited in topicalization, relative clauses or wh-questions, are a hallmark of natural language syntax. However, the nature and scope of their constraints are not fully understood, and cross-linguistic quantitative studies (corpus-based and/or experimental) are needed (Sprouse et al. 2016, Abeillé et al. 2020, Winckel et al. 2024). The internship will focus on object/adjunct, or NP/PP, asymmetries. The language can be French, English or another one, depending on the student’s expertise. This project is part of a larger collaboration between LLF and MIT (Ted Gibson).

Duration: at least 2 months.

Compensation (“gratification de stage”): yes.

Prerequisites: some knowledge of syntax is required.

To apply: contact Anne Abeillé at anne.abeille@u-paris.fr

Laboratoire de linguistique formelle, UPC (A. Abeillé), Master 1-2: agreement

Internship at LLF proposed by Anne Abeillé

The Laboratoire de linguistique formelle offers an internship for a first- or second-year master’s student.

Topic: Closest conjunct agreement

Description: The full range of (syntactic, semantic and processing) factors that influence (gender) agreement, and how different agreement strategies compete within a single language, are still open questions. In nominal coordination, agreement with the closest noun (closest conjunct agreement, CCA) is attested in various languages (Romance, South Slavic, Bantu, Arabic), although it is rarely mentioned (or is stigmatized) in grammar books. Quantitative data (corpora and experiments) are needed to advance our knowledge. The choice of language will depend on the student’s expertise.

Duration: at least 2 months.

Compensation (“gratification de stage”): yes.

Prerequisites: Proficiency in a language with grammatical gender is required.

To apply: contact Anne Abeillé at anne.abeille@u-paris.fr

Laboratoire de linguistique formelle, UPC (A. Abeillé), Licence/Master: Spanish nouns

Internship at LLF proposed by Anne Abeillé

The Laboratoire de linguistique formelle offers an internship for a bachelor’s student or a first- or second-year master’s student.

Topic: Spanish human nouns

Description: Human nouns usually come in gendered pairs (tío, tía ‘uncle, aunt’): while the feminine form refers to women, it is not clear whether the masculine refers to men or to both. The purpose of the internship is to search for predicative human nouns in Spanish, using (annotated) corpora (CREA, COREC), and to run acceptability judgement experiments parallel to those that have been run on French (Paul est un étudiant, Marie aussi. Marie est une étudiante, Paul aussi.) (Kious & Abeillé 2025). The aim is to explore whether Spanish shows gender asymmetries, as in English (John is an actor, Mary too. # Mary is an actress, John too.), or not, as in French.

Duration: at least 2 months.

Compensation (“gratification de stage”): yes.

Prerequisites: Proficiency in Spanish is required.

To apply: contact Anne Abeillé at anne.abeille@u-paris.fr

Laboratoire de linguistique formelle, UPC (C. Donati), Master: sign language acquisition

Internship at LLF proposed by Caterina Donati

The Laboratoire de linguistique formelle (in collaboration with the University of Milan-Bicocca) offers an internship for a master’s student.

Topic: The acquisition of sign languages: a scoping review

Aim: Selecting the literature describing the development of morphosyntactic comprehension and production abilities in L1 and L2 signing children aged 0-12 (both typical and atypical development).

Task: Help the team select the papers relevant to the scoping review by reading the abstract and methodology of allotted articles and deciding, on the basis of given criteria, whether to exclude them.

Required competences: some knowledge of language acquisition and/or sign language is needed, together with good (passive) knowledge of English.

Skills developed: you will learn how to conduct a scoping literature review, how to use the relevant platform (Rayyan) and tools, and how to extract crucial methodological features from research articles.

Duration: 1 month.

Compensation (“gratification de stage”): no.

To apply: contact Caterina Donati

Laboratoire de linguistique formelle, UPC (C. Donati), Master: complexity in adolescents’ production

Internship at LLF proposed by Caterina Donati

The Laboratoire de linguistique formelle offers an internship for an M1 or M2 master’s student.

Topic: Measuring language complexity in adolescents’ production in formal and informal registers (American English and Greek)

Aim: Measure how far complexity differs in adolescents’ production between informal and formal registers.

Task: analyse corpora of adolescents’ production in formal and informal registers using semi-automated indexes of complexity.

Required competences: basic linguistic knowledge, plus native competence in either American English or Greek.

Skills developed: you will learn how to use classical indexes of complexity in various domains (MLU, type/token ratio, sentence/clause ratio, etc.) for analysing (oral) texts in a semi-automated fashion.
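
As a rough illustration of the kind of index mentioned above, the following Python sketch computes a type/token ratio and a mean length of utterance for a list of transcribed utterances. The whitespace tokenization, the function names and the sample utterances are assumptions made for this example; MLU is classically counted in morphemes rather than words, and the tools actually used in the internship are not specified in this offer.

```python
from typing import List


def tokenize(utterance: str) -> List[str]:
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    tokens = [tok.strip(".,;:!?\"'()") for tok in utterance.lower().split()]
    return [tok for tok in tokens if tok]


def type_token_ratio(utterances: List[str]) -> float:
    """Number of distinct word forms divided by the total number of tokens."""
    tokens = [tok for utt in utterances for tok in tokenize(utt)]
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def mean_length_of_utterance(utterances: List[str]) -> float:
    """Average number of word tokens per non-empty utterance (word-based MLU)."""
    lengths = [len(tokenize(utt)) for utt in utterances]
    lengths = [n for n in lengths if n > 0]
    if not lengths:
        return 0.0
    return sum(lengths) / len(lengths)


# Invented sample utterances, for illustration only.
sample = ["I saw the dog.", "The dog saw me!", "Okay."]
print("TTR:", type_token_ratio(sample))
print("MLU (words):", mean_length_of_utterance(sample))
```

Because the type/token ratio depends on text length, comparisons between registers would normally use samples of similar size.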

Duration: one month

Start: January or later

Remuneration: yes

Level: M1 or M2

To apply: contact Caterina Donati

Laboratoire de linguistique formelle, UPC (C. Donati), Licence 3/Master: resumption in French relative clauses

Internship at LLF proposed by Caterina Donati

The Laboratoire de linguistique formelle offers an internship for an L3, M1 or M2 student.

Topic: Resumptive relative clauses in informal French: an experiment

Aim: Study the processing of relative clauses involving a resumptive pronoun in informal French.

Task: prepare the material for a self-paced listening experiment by constructing ecologically valid, natural-sounding substandard relative clauses.

Duration: one month

Start: January or later

Remuneration: yes

Level: L3, M1 or M2

To apply: contact Caterina Donati

LISN (Paris-Saclay), Philippe Boula de Mareüil, Master: Classifying Romance dialects and creoles using different types of distances

Internship at the Laboratoire Interdisciplinaire des Sciences du Numérique (LISN, Université Paris-Saclay) proposed by Philippe Boula de Mareüil

Topic: Classifying Romance dialects and creoles using different types of distances
Level: M1 and M2
Required skills: the intern will work with various classification tools (agglomerative hierarchical clustering, multidimensional scaling, etc.) as well as tools for the automatic processing of written language and speech. Knowledge of Romance dialects and creoles is of course not mandatory, but an interest in low-resource languages and in what new technologies can contribute will be a plus.
Compensation: 700 euros/month
Contact: Philippe Boula de Mareüil <mareuil@lisn.fr>, LISN, Université Paris-Saclay
Duration, dates: 4 to 6 months starting in February 2026
Detailed description: The aim is to apply different types of distances to classify Romance dialects (of France or Western Europe) and creoles (French- or English-lexified):
- “typological” distances, following the principles of historical glottometry, which call for linguistic knowledge of phonetics/phonology, morphology and syntax;
- edit distances, of the Levenshtein type, computed from translations of parallel texts (on the basis of words, graphemes or phonemes, possibly using grapheme-to-phoneme converters);
- acoustic distances, measured between vectors of cepstral coefficients or embeddings from pretrained self-supervised models (Wav2vec).
Depending on the candidates’ profiles, the work may focus on one or another of the distance types, and on particular dialects or creoles.
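
To make the second type of distance concrete, here is a minimal Python sketch of a Levenshtein edit distance, normalized by length and averaged over aligned word pairs to give one distance per pair of varieties. The variety labels, the word strings and the choice of normalization are invented for illustration and are not taken from the project; the resulting pairwise distances are the kind of input that a hierarchical clustering tool would then take.

```python
from typing import Dict, List, Sequence, Tuple


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Edit distance divided by the length of the longer sequence (0.0 to 1.0)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def distance_matrix(word_lists: Dict[str, List[str]]) -> Dict[Tuple[str, str], float]:
    """Mean normalized distance over aligned word pairs, for each pair of varieties."""
    names = sorted(word_lists)
    result = {}
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            pairs = list(zip(word_lists[x], word_lists[y]))
            result[(x, y)] = sum(normalized_distance(p, q) for p, q in pairs) / len(pairs)
    return result


# Invented toy strings standing in for parallel word lists (graphemes as units).
toy = {
    "A": ["kanta", "aigo"],
    "B": ["canta", "aiga"],
    "C": ["chante", "eau"],
}
for pair, dist in distance_matrix(toy).items():
    print(pair, round(dist, 3))
```

The same functions work on lists of phonemes or words instead of strings of graphemes, since they only compare sequence items for equality.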

Inria (C. Clavel), Licence/Master: Multimodal annotation

Internship at Inria proposed by Chloé Clavel

Inria offers an internship for a bachelor’s or first-year master’s student.

Topic: Multimodal corpus annotation

Tasks:

Annotating a corpus for multimodal analysis of collaboration quality through audio and video recordings (automatic transcription provided) of human-human interaction. The corpus contains around 100 hours of video, but only a portion of it will be annotated.
Two dimensions of quality need to be annotated using multimodal conversation data: the individual level (identifying how individuals benefit from the collaboration process) and the group level (describing the group’s general capacity across different tasks).
Ensuring annotation consistency by following predefined guidelines and collaborating with the team for clarification and updates.
Annotations will be based on multimodal cues, including speech, facial expressions, body language, and dialogue context.
Participating in the design of annotation guidelines by providing feedback.

Required Competences:

Good attention to detail and ability to follow annotation protocols.
The corpus is in English: a good level of English is needed.
Basic knowledge of linguistics and conversational analysis will be helpful.
Experience with data annotation tools (e.g., ELAN) is a plus.
Preferably a background in linguistics, cognitive science, or computer science, with an interest in human-human and human-agent interaction.

Duration: 4 months, starting in February/March 2026.

Compensation (“gratification de stage”): yes.

To apply: contact Chloé Clavel at chloe.clavel@inria.fr

Laboratoire de linguistique formelle, UPC (A. Abeillé), Licence/Master: Spoken French

Internship at LLF proposed by Anne Abeillé and Heather Burnett

The Laboratoire de linguistique formelle offers an internship for an L3, M1 or M2 student.

Topic: Syntactic variation in 21st-century spoken French

Description: The task is to extract from large spoken corpora (CFPP, MPF) utterances that are condemned by normative discourse but frequent in speech (impersonal verbs without “il”, negation without “ne”, subordinate clauses without “que”, etc.). Annotating and studying the factors that favour the standard or non-standard variants will make it possible to formulate hypotheses about their evolution.
This internship is part of a scientific cooperation with the Université de Montréal aimed at comparing the French spoken in Paris with that spoken in Montreal.
A one-day workshop is planned in Paris in May.

Duration: 3 to 5 months, part-time possible

Compensation (“gratification de stage”): yes.

Required skills: grammar of written and spoken French, corpus search tools

To apply: contact Anne Abeillé at anne.abeille@u-paris.fr

Past internship offers

Laboratoire de Linguistique Formelle, UPC (Lisa Brunetti), Licence and Master: production experiments

Running a production experiment.

The intern will help recruit participants for a production experiment and run the experiment in a soundproof booth (participants will read sentences presented on a computer or tablet). The experiment will be on French. No prior knowledge is required other than the ability to communicate in French. The supervisor will explain the aim of the experiment to the intern, who will become familiar with the various practical and conceptual aspects of this type of data collection.

Level: L3, M1, M2
Start: end of 2025 or early January 2026 (before classes resume)
Prerequisite: ability to communicate in French
Duration: 70 hours (two weeks full-time, with the possibility of a longer, part-time internship)
Unpaid
Contact: Lisa Brunetti, lisa.brunetti@u-paris.fr

Laboratoire de Linguistique Formelle, UPC (Anne Abeillé), Licence and Master

Internships at LLF (UPC), Anne Abeillé

Internship 1: Past participle agreement in French: a quantitative approach

Preferred start: November 2024

M1, M2, GR

Past participle agreement with avoir still gives schoolchildren cold sweats. Recent studies at LLF have shown that BERT-type language models are sensitive to the long-distance dependencies governing this agreement, whereas human acceptability judgements do not show the same sensitivity.

Quantitative studies of social media show a very low agreement rate (Benzitoun & Flesch 2024). We would like to test, on the same corpus, which factors (pronominal object, 3rd person, human, etc.) favour or disfavour participle agreement with avoir. Controlled experiments (reading and acceptability judgement tasks) can then test the same factors.

Supervision and lab: Anne Abeillé, Marie Flesch, Barbara Hemforth (LLF)

Benzitoun & Flesch 2024: https://www.shs-conferences.org/articles/shsconf/abs/2024/11/shsconf_cmlf2024_14004/shsconf_cmlf2024_14004.html

Internship 2: Grammatical purism in the 21st century / Verbal hygiene in 21st-century French

M1, M2, GR

Preferred start: January 2025

French has a long tradition of prescriptive grammars, but no one agrees on what constitutes a grammatical ‘error’.

The internet has seen an unprecedented boom in grammatical purism, from specialists and non-specialists alike. The task is to build a corpus of messages and recommendations dating from 2000 onwards, balanced between experts and non-experts and between purists and anti-purists, drawn from sites such as those of the Académie française (Dire, ne pas dire) and Le Figaro, as well as from Twitter and other social networks.

Supervision and lab: Anne Abeillé, Heather Burnett (LLF)

Internship 3: Closest conjunct agreement since the 17th century

In French, closest conjunct agreement (CCA) is still quasi-compulsory before plural nouns (certaines villes et villages) (Abeillé et al. 2018), but feminine agreement after the same nouns (des chants et des danses bretonnes) has dropped sharply, especially for predicative adjectives (Abeillé et al. 2022). We want to test agreement with two coordinated singular nouns of different genders, on which 17th-century grammarians disagreed, Vaugelas recommending CCA for attributive adjectives. The intern will search and annotate the available annotated corpora, in particular Frantext. The language studied may be chosen according to the intern’s expertise and the available corpora.

Type, duration, compensation: L3, M1, M2, GR

Supervision and lab: Anne Abeillé (LLF)

Abeillé et al 2018 https://journals.openedition.org/discours/9542

Abeillé et al 2022 https://journals.openedition.org/discours/12363

Internship 4: Ellipsis and gender: an experimental approach

Verbless sentences tend to have verbal counterparts, as in Paul aime les pommes et ses enfants (aiment) les bananes (‘Paul likes apples and his children bananas’). ‘Mismatch’ cases, where the missing form is not the same as that of the antecedent (here the verb aime/aiment), abound in many languages. The intern will design and run acceptability judgement experiments on gender mismatch (Paul est plus grand que Marie. ‘Paul is taller.msg than Marie.’) and preposition omission (Paul rêve d’habiter à Paris et Marie Venise. ‘Paul dreams of living in Paris and Marie Venice’). Working on another language with grammatical gender is also possible.

Type, duration, compensation: L3, M1, M2, GR

Supervision and lab: Anne Abeillé (LLF) & Emma Kious (LLF)

UFR d'Etudes Anglophones, UPC (Emmanuel Ferragne) - Master

As part of a teaching project at the UFR d’Etudes Anglophones, we are offering an internship to a master’s student in computer science or computational linguistics at Université Paris Cité, from October to December 2024.
The work involves training accent recognition models (e.g. https://github.com/JuanPZuluaga/accent-recog-slt2022), adapting them to our needs, and documenting the steps involved in these tasks so that they can be reproduced.
This internship is paid.
Location: Olympe de Gouges building, 8 place Paul Ricoeur, Paris 13e.
Please circulate this announcement.
If you are interested, please contact me: emmanuel.ferragne@u-paris.fr

Laboratoire de Linguistique Formelle, UPC (Anne Abeillé) - Master

Internship 5: Quantification at a distance: an empirical study

The purpose is to find out which factors favor quantification at a distance, with large annotated corpora and controlled experiments.

The language may be French or any other language displaying similar phenomena.

In French, quantification at a distance has mainly been studied from a theoretical point of view, and speakers’ actual preferences are unknown.

Combien tu as d’enfants? (how many do you have children)

Combien d’enfants tu as (how many children do you have)

Combien de fois tu l’as vu (how many times you saw her?)

Combien l’as-tu vu de fois ? (how many you saw her times)

We will explore processing, semantic and context factors. Syntactic locality constraints may be at stake too.

M1, GR

Anne Abeillé (LLF)

Laboratoire de Linguistique Formelle, UPC (Jana Rameh), Licence and Master

Title: Inclusive writing in a corpus of corporate publications

Level: L1, L2, L3, M1
Required skills: knowledge of corpus analysis (methods and tools for textual analysis). An interest in inclusive writing is a plus.
Compensation: statutory stipend (GR).
Contact and laboratory: Jana Rameh, LLF (Laboratoire de Linguistique Formelle).
Duration, dates: start as soon as possible; duration to be determined.
Description: we are looking for an autonomous and rigorous person to take part in collecting and analysing a corpus of corporate publications (scraping + analysis). The intern will help organize, collect and analyse the data, with a focus on the use of inclusive writing in professional communications.
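
For the analysis step, one simple starting point could be a regular expression that finds French inclusive forms written with a middle dot (étudiant·e·s). The Python sketch below is only an assumption about how such a search might look: the pattern, the function name and the sample sentence are invented for illustration, and other inclusive strategies (doublets, epicene words, forms written with a period or hyphen) are not covered.

```python
import re
from collections import Counter

# Hypothetical pattern: a word followed by one or more "·suffix" groups,
# e.g. "étudiant·e·s" or "candidat·es". Other notations are not matched.
MIDPOINT_FORM = re.compile(r"\b\w+(?:·\w+)+\b")


def find_midpoint_forms(text: str) -> Counter:
    """Count each middle-dot inclusive form found in the text (case-insensitive)."""
    return Counter(match.group(0).lower() for match in MIDPOINT_FORM.finditer(text))


# Invented sample sentence, for illustration only.
sample = (
    "Les étudiant·e·s et les candidat·es sont invité·e·s. "
    "Les étudiant·e·s répondent."
)
counts = find_midpoint_forms(sample)
for form, n in counts.most_common():
    print(form, n)
```

Counts like these can then be related to corpus size or to metadata such as company or publication date.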

Laboratoire de Linguistique Formelle, UPC (Anne Abeillé), Master

M1 or M2, paid

French required

Title: Annotation of Human Repair Initiation in Task-oriented Dialogue

Project Description:

Human language complexities often expose flaws such as misunderstandings, misinterpretations, speech impediments, or social norm violations.

Strategies people use in conversations to identify and address these problems, fostering mutual understanding, are called repair (Schegloff, 2007). Schegloff (2007) distinguishes repair types based on who initiates and who provides the solution between the speaker and the addressee.

The overall aim of this research is to create a conversational agent able to handle social repairs from its human interlocutors. To that end, it needs to detect when a repair arises and to recognize the type of repair. To model such a capability, we will rely on annotated data.

This internship aims to annotate a corpus of dyads in terms of social repairs.

Tasks:

Annotating a corpus for multimodal analysis of conversational repair (Other-initiated Self-repair) through audio and video recordings (transcription provided) of human-human interaction.
Identifying and labelling each part of conversational repair sequences and classifying different types of repair initiation.
Ensuring annotation consistency by following predefined guidelines and collaborating with the team for clarification and updates.
Annotations will be based on multimodal cues, including speech, facial expressions, body language, and dialogue context.

Requirements:

Good attention to detail and ability to follow annotation protocols.
Basic knowledge of linguistics and conversational analysis will be helpful.
Experience with data annotation tools (e.g., ELAN, Praat, or similar) is a plus.
Preferably a background in linguistics, cognitive science, or computer science, with an interest in human-human and human-agent interaction.

Applications: send a CV and the names of referees to anh.ngo-ha@inria.fr and chloe.clavel@inria.fr

Laboratoire de Linguistique Formelle, UPC (Patrick Caudal), L3, M1, M2

Three types of internships are offered:

- unpaid L3-level internships, requiring few prior skills

- paid M1- and M2-level internships (via the Labex EFL)

Required skills:

L3: basic theoretical knowledge of morphology, syntax and, if possible, semantics (a two-hour crash course will be given for the semantics part)

M1 and M2: as far as possible, having taken P. Caudal’s “spoken corpora” course

Compensation: NR (unpaid) or GR (statutory stipend)

L3: NR

M1 and M2: GR

Contact and laboratory: LLF, Patrick Caudal, pcaudal@linguist.univ-paris-diderot.fr

Duration, dates: between April and June 2025 as far as possible

Description: The internship contributes to a quantitative typology project on the form and meaning of inflection in non-Pama-Nyungan languages. The goal is to test the hypothesis that these languages have a morphological cycle leading from a periphrastic form based on a verb series (of the associated motion or associated posture type) to a polysynthetic inflection.

Each student will be responsible for extracting inflectional data (examples with their gloss and translation) from grammatical sources on a specific language, then annotating them according to a precise grid, for both form and meaning; the annotated data will be added to a dedicated database. M2 internships will include a component evaluating other annotations in order to validate the database.

At least ten languages are to be covered; the results will be used to develop the pilot of a tool for typological comparison of inflection. Further languages will be documented and added to the database in the coming years, and the result will provide the empirical basis for (i) an extended historical reconstruction of inflection in non-Pama-Nyungan languages and (ii) the application of computational phylogenetic models to refine our overall understanding of the evolution of non-Pama-Nyungan languages.

Laboratoire de Linguistique Formelle, UPC (Ira Noveck), L3, M

M1 or M2
Compensation: internship stipend
Supervisor: Ira Noveck, DR CNRS (ira-andrew.noveck@cnrs.fr), Laboratoire de Linguistique Formelle, Université Paris Cité
Duration: 4 months, 30 hours/month, starting in February 2025
Description: As part of a developmental study of discourse connectives (terms such as mais and alors), we are collecting data from children and adults using tablets (starting with adults). The work would consist of finding schools and going into classrooms (or seeking out adults individually) to collect data. Access to schools outside Paris would be an advantage. Level of French: a working command is needed (being a native French speaker is an advantage).
Contact: ira.noveck@u-paris.fr

LLF and HTL laboratories (UPC) and LACITO (Guillaume Wisniewski, Aimée Lahaussois and Séverine Guillaume), L3, M1

Internship offer: Development of a digital verb dictionary for Thulung

Paid internship.

Internship description: We are looking for an intern to contribute to the development of a verb dictionary for Thulung, a Tibeto-Burman language of eastern Nepal. The project consists of two main phases:
1. Data extraction and structuring: extract relevant information from ELAN files collected in the field, incorporating links to audio data. These data will be stored in an appropriate format (XML or another suitable format).
2. Automatic generation of an HTML page: this page, similar to the one that can be seen here, will allow users to browse and search the dictionary.
Required skills:
● Python programming for linguistic data processing.
● Knowledge of XML and ELAN formats (or willingness to learn).
● Web development skills (HTML, CSS, XSL) for dictionary visualization.
Duration and conditions:
● Internship duration: 1 month; start date and organization to be agreed with the candidate.
● HTL
● Remote work possible.
Application: Send your CV and a cover letter to aimee.lahaussois@cnrs.fr and severine.guillaume@cnrs.fr.
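
The first phase described above could look roughly like the following Python sketch, which reads time-aligned annotations from one tier of an ELAN (.eaf) file and writes them to a simple XML dictionary with pointers into the audio. The embedded sample file, the tier name "verb", the placeholder annotation values and the output element names are all assumptions made for illustration; the project’s real tier structure and target format are not described in this offer.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Minimal hand-written EAF fragment with placeholder values (not real field data).
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <HEADER TIME_UNITS="milliseconds">
    <MEDIA_DESCRIPTOR MEDIA_URL="file:///rec01.wav" MIME_TYPE="audio/x-wav"/>
  </HEADER>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1200"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="2500"/>
  </TIME_ORDER>
  <TIER TIER_ID="verb" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>verb-form-1</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>verb-form-2</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def extract_entries(eaf_xml: str, tier_id: str) -> List[Dict[str, object]]:
    """Return one record per time-aligned annotation on the given tier."""
    root = ET.fromstring(eaf_xml)
    # Map time-slot identifiers to their offsets in milliseconds.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    media = root.find("HEADER/MEDIA_DESCRIPTOR")
    media_url = media.get("MEDIA_URL") if media is not None else None
    entries = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            entries.append({
                "form": (ann.findtext("ANNOTATION_VALUE") or "").strip(),
                "start_ms": slots.get(ann.get("TIME_SLOT_REF1")),
                "end_ms": slots.get(ann.get("TIME_SLOT_REF2")),
                "audio": media_url,
            })
    return entries


def to_dictionary_xml(entries: List[Dict[str, object]]) -> str:
    """Serialize the records to a small, hypothetical dictionary XML format."""
    root = ET.Element("dictionary")
    for item in entries:
        entry = ET.SubElement(root, "entry")
        ET.SubElement(entry, "form").text = str(item["form"])
        ET.SubElement(entry, "audio", href=str(item["audio"]),
                      start=str(item["start_ms"]), end=str(item["end_ms"]))
    return ET.tostring(root, encoding="unicode")


entries = extract_entries(SAMPLE_EAF, "verb")
print(to_dictionary_xml(entries))
```

An XML file of this kind could then be turned into the HTML page of the second phase, for instance with an XSL stylesheet, which is one of the skills listed above.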

Internships at the Laboratoire de Linguistique Formelle (UPC)

Internships in semantics and pragmatics on topics such as quantifiers, distributive and proportional structures, or the argument structure of verbs. The work will involve theoretical, bibliographic and corpus-based studies.

Type, duration, compensation: L3 and M1, NR
Supervision and lab: Lucia Tovena (LLF)

Internship at LLF (UPC)

Title: Transition parsing and Q-learning
Duration: at least 3 months
Supervisor: T. Bernard
Expected profile: Very good programming skills, interest in neural-based machine learning, Master 1 or Master 2 in computational linguistics or computer science

Syntactic transition parsers such as shift-reduce parsers and arc-eager parsers are straightforward to implement and train in the standard (teacher forcing) supervised fashion (see, e.g., Chen and Manning 2014 and Dyer et al. 2015). Standard supervised training, however, aims at maximising the log-likelihood of the annotated (gold) structures of the training data while the quantity that matters most is the expected F1/attachment score (the actual performance of the parser). In addition, while beam-search decoding is the most standard improvement on greedy decoding, there are reasons to believe that beams based on the probability of the hypotheses are not particularly good at dealing with ambiguity. Indeed, if a parsing hypothesis has two plausible continuations, the probabilities of each of these continuations will suffer from the existence of the other, which means that both might be ejected from the beam (while less plausible parsing hypotheses might stay in the beam).
Thanks to the advances in reinforcement learning in general and Q-learning in particular (see, e.g., Mnih et al. 2013), it has become easier to train a parser so as to optimise a metric such as its expected F1/attachment score. One particularity of reinforcement learning schemes is that the system is not trained on gold trajectories (the trajectories from an initial state to a complete gold parse), but on its own predicted trajectories. A parser trained in such a fashion is thus expected to be more reliable at prediction time. A particularity of Q-learning more specifically is that it is not based on estimating probabilities for actions but values for states (in this case, parsing states) in such a way that maximising the parser’s objective is compatible with two continuations of the same parsing hypothesis both having high value. This is somewhat reminiscent of the structured perceptron used by Weiss et al. (2015). It thus seems that beams based on state value rather than probability might be better at dealing with ambiguity.
The goal of this research internship is to adapt a traditional transition parser in order to train it with a mix of Q-learning and standard log-likelihood maximisation. The impact of Q-learning training on greedy and beam-search decoding will be studied.
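
The point about values versus probabilities can be illustrated with a small tabular Q-learning sketch in Python. Everything in it is a toy assumption: the states, actions and rewards are invented, the rewards merely stand in for a final parsing score, exploration is uniformly random rather than following the parser’s own predictions, and a real parser would use a neural value function rather than a table. After training, both continuations of the "ambiguous" state end up with a value close to 1, whereas probabilities over the two actions would have to share a total of 1.

```python
import random
from collections import defaultdict

# Hypothetical toy problem: (state, action) -> (next_state, reward).
# A next_state of None ends the episode; rewards stand in for a final score.
TRANSITIONS = {
    ("start", "left"): ("ambiguous", 0.0),
    ("start", "right"): ("weak", 0.0),
    ("ambiguous", "x"): (None, 1.0),
    ("ambiguous", "y"): (None, 1.0),
    ("weak", "x"): (None, 0.3),
    ("weak", "y"): (None, 0.0),
}


def actions(state):
    """Actions available in a given state."""
    return [a for (s, a) in TRANSITIONS if s == state]


def q_learning(episodes=500, alpha=0.5, gamma=1.0, seed=0):
    """Tabular Q-learning with a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state = "start"
        while state is not None:
            action = rng.choice(actions(state))
            next_state, reward = TRANSITIONS[(state, action)]
            if next_state is None:
                best_next = 0.0
            else:
                best_next = max(q[(next_state, a)] for a in actions(next_state))
            # Standard Q-learning update towards reward + discounted best next value.
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


q = q_learning()
for key in sorted(TRANSITIONS):
    print(key, round(q[key], 3))
```

A beam ranked by these values would keep both continuations of the ambiguous state ahead of the weaker branch, which is the behaviour the internship proposes to investigate at scale.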

Depending on the profile of the student, other questions might be addressed instead, such as how model calibration impacts the performance of beam-search decoding, and whether A* decoding can be implemented as a viable alternative to beam-search decoding.

Relevant references:
— Chen, Danqi, and Christopher Manning. ‘A Fast and Accurate Dependency Parser Using Neural Networks’. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 740–50. Doha, Qatar: Association for Computational Linguistics, 2014. http://www.aclweb.org/anthology/D14-1082.
— Dyer, Chris, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. ‘Transition-Based Dependency Parsing with Stack Long Short-Term Memory’. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 334–43, 2015. https://doi.org/10.3115/v1/P15-1033.
— Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. ‘Playing Atari with Deep Reinforcement Learning’, 2013. http://arxiv.org/abs/1312.5602.
— Weiss, David, Chris Alberti, Michael Collins, and Slav Petrov. ‘Structured Training for Neural Network Transition-Based Parsing’. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 323–33. Beijing, China: Association for Computational Linguistics, 2015. https://doi.org/10.3115/v1/P15-1032.

The Espace Carrière site supports students’ entry into working life. It helps Université Paris Cité students in their search for internships, jobs or work-study contracts.

Find all the information on how it works.
