
Pronouns in French #

Universal #

French pronouns encode grammatical features such as person, number, case, and sometimes gender. However, several forms are ambiguous or inconsistently annotated in current treebanks. This guide clarifies how to annotate French pronouns more consistently.

Lemma normalization #

Pronoun lemmas should be unified across paradigms to reflect core referential identity rather than surface distinctions.

Subject pronouns

In many current annotations, the singular pronouns “il” and “elle” are assigned the lemma lui, while the plural pronouns “ils” and “elles” receive a different lemma, eux. This split is unnecessary and inconsistent, because the singular and plural forms share the same core referential properties and syntactic distribution. We therefore propose unifying all third-person subject pronouns, singular and plural, under the single lemma lui.
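As an illustration, the proposed unification could be applied as a small normalization step. This is a minimal sketch, not part of any existing tool: the function name `normalize_subject_lemma` and the constant `THIRD_PERSON_SUBJECT_FORMS` are hypothetical, and the sketch assumes the caller only passes tokens already identified as subject pronouns.

```python
# Hypothetical helper: the names below are illustrative, not an existing API.
# Forms taken from the guideline text: third-person subject pronouns.
THIRD_PERSON_SUBJECT_FORMS = {"il", "elle", "ils", "elles"}


def normalize_subject_lemma(form: str, lemma: str) -> str:
    """Return the unified lemma "lui" for third-person subject pronouns.

    Any other form keeps the lemma it already has.
    """
    if form.lower() in THIRD_PERSON_SUBJECT_FORMS:
        return "lui"
    return lemma
```

For example, `normalize_subject_lemma("ils", "eux")` yields the lemma lui, while a form outside the set is returned with its lemma unchanged.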

Reflexive pronouns

The pronoun “me” is currently annotated with two different lemmas depending on reflexivity:

  • moi when interchangeable with “lui”
  • soi when inherently reflexive (e.g., “je me souviens”)

Here we use a single third-person reflexive pronoun lemma (e.g., soi) in all inherently reflexive constructions, and disambiguate others using contextual syntax.

The commutation test is not always reliable: many cases are semantically ambiguous, and some commutations change the meaning of the verb. Treating French as having only a third-person reflexive pronoun aligns with typological precedent.

Gender suppression #

The pronoun “lui” is often annotated with Gender=Masc, but this is misleading in indirect object contexts, where lui can refer to either a masculine or feminine antecedent. In these cases, gender is not recoverable from the form and should therefore not be encoded.

Suppressing the feature also makes “lui” consistent with “leur”, which is functionally parallel but is never annotated with a gender feature.

In contrast, “lui” retains Gender=Masc when used emphatically and contrasted with “elle”. In such cases, gender is semantically recoverable and should be retained.
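The rule can be sketched as a small feature filter. The sketch below rests on assumptions that are not stated in the guideline: features are held in a plain dictionary, the indirect object use of “lui” is identified by the relation comp:obl, and the emphatic use is identified by Emph=Yes. The function name `suppress_gender` is hypothetical.

```python
# Hypothetical helper, assuming features are stored as a dict such as
# {"Gender": "Masc", "Number": "Sing"} and that indirect-object "lui"
# carries the relation comp:obl (an assumption made for this sketch).
def suppress_gender(form: str, deprel: str, feats: dict) -> dict:
    """Drop Gender on indirect-object "lui"; keep it on emphatic "lui"."""
    out = dict(feats)  # copy, so the caller's dict is not modified
    is_indirect_object_lui = (
        form.lower() == "lui"
        and deprel == "comp:obl"
        and feats.get("Emph") != "Yes"
    )
    if is_indirect_object_lui:
        out.pop("Gender", None)
    return out
```

With these assumptions, an indirect-object “lui” loses its Gender feature, while an emphatic “lui” marked Emph=Yes keeps Gender=Masc.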

Feature Case #

In practice, the Case feature for weak pronouns can be reliably recovered from the syntactic relation (subj, comp:obj, comp:obl). Forms such as “me”, “te”, “nous”, and “vous”, which are ambiguous in isolation, can therefore be disambiguated in most cases.

Nominative

Subject pronouns (e.g., “je”, “tu”, “il”, “nous”, “vous”, “on”) should receive Case=Nom when they are in subject position (subj).

Accusative

Direct object pronouns (e.g., “me”, “te”, “le”, “la”, “les”) should receive Case=Acc when used as comp:obj. This excludes emphatic forms (e.g., “moi”) and already annotated cases.

Dative

Indirect object pronouns (e.g., “me”, “te”, “lui”, “leur”) should receive Case=Dat when used as oblique objects (comp:obl).
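The three rules above amount to a lookup from syntactic relation to Case value. The following is a minimal sketch under stated assumptions: the form lists contain only the examples given in this guide (with “nous” and “vous” added to the object sets, since the guide says they are disambiguated by relation), “already annotated cases” is read as “tokens that already carry a Case feature”, and the names `CASE_RULES` and `assign_case` are hypothetical.

```python
# Hypothetical rule table: relation -> (Case value, illustrative form set).
# The form sets list only the examples named in the guideline, not a full
# inventory of French weak pronouns.
CASE_RULES = {
    "subj": ("Nom", {"je", "tu", "il", "nous", "vous", "on"}),
    "comp:obj": ("Acc", {"me", "te", "le", "la", "les", "nous", "vous"}),
    "comp:obl": ("Dat", {"me", "te", "lui", "leur", "nous", "vous"}),
}


def assign_case(form: str, deprel: str, feats: dict) -> dict:
    """Add a Case feature to a weak pronoun based on its relation."""
    out = dict(feats)  # copy, so the caller's dict is not modified
    # Skip emphatic forms and tokens that already have a Case value.
    if out.get("Emph") == "Yes" or "Case" in out:
        return out
    rule = CASE_RULES.get(deprel)
    if rule is not None:
        case, forms = rule
        if form.lower() in forms:
            out["Case"] = case
    return out
```

Under these assumptions, “me” receives Case=Acc as comp:obj and Case=Dat as comp:obl, while an emphatic “moi” marked Emph=Yes is left without a Case feature.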

Feature Emph #

So-called formes fortes such as “moi”, “toi”, “lui-même”, “elles-mêmes” are marked with Emph=Yes and typically do not receive a Case feature. Conversely, pronouns that carry a Case feature are not emphatic forms and receive Emph=No.
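The interaction between Case and Emph can be sketched as follows. This is an illustrative helper only: the name `apply_emph` is hypothetical, and the sketch assumes the caller decides whether a token is a forme forte, because forms such as “lui” or “nous” cannot be classified from the surface form alone.

```python
# Hypothetical helper: the caller supplies is_strong_form, since some
# forms are ambiguous between weak and strong uses.
def apply_emph(feats: dict, is_strong_form: bool) -> dict:
    """Set Emph so that it stays consistent with the Case feature."""
    out = dict(feats)  # copy, so the caller's dict is not modified
    if "Case" in out:
        # Pronouns with a Case feature are not emphatic forms.
        out["Emph"] = "No"
    elif is_strong_form:
        out["Emph"] = "Yes"
    return out
```

In this sketch a Case feature takes priority: a token that already has Case is marked Emph=No even if the caller flags it as a strong form, which is one possible way to surface a conflicting annotation for review.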