PUNCT

Punctuation #

Universal #

Definition from the UD website

Punctuation marks are non-alphabetical characters and character groups used in many languages to delimit linguistic units in printed text.

Punctuation is not taken to include logograms such as $, %, and §, which are instead tagged as SYM. (Hint: if it corresponds to a word that you pronounce, such as dollar or percent, it is SYM and not PUNCT.)

Spoken corpora contain symbols representing pauses, laughter and other sounds; we treat them as punctuation, too. In these cases it is not even required that all characters of the token be non-alphabetical. A pause can be represented with a special character such as #, or with a more descriptive coding such as [:pause].

Examples

  • Period: .
  • Comma: ,
  • Parentheses: ()
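
The distinction between PUNCT and SYM described above can be sketched as a small heuristic. This is only an illustration, not an official UD tool: the function name `guess_upos` and the two lookup sets are hypothetical, and the sets contain only the items mentioned in this section. A real corpus would need its own lists of logograms and spoken-corpus markers.

```python
import unicodedata

# Hypothetical, non-exhaustive lists taken from the examples in this section.
LOGOGRAMS = {"$", "%", "§"}         # pronounced as words, so tagged SYM
SPOKEN_MARKERS = {"#", "[:pause]"}  # pause codings in spoken corpora, tagged PUNCT


def guess_upos(token):
    """Return "SYM", "PUNCT", or None when this heuristic cannot decide."""
    if not token:
        return None
    # Logograms are checked first: some of them (e.g. "%") fall into a
    # Unicode punctuation category but are still SYM under the definition.
    if token in LOGOGRAMS:
        return "SYM"
    # Spoken-corpus markers may contain alphabetical characters.
    if token in SPOKEN_MARKERS:
        return "PUNCT"
    # Otherwise, accept tokens made up only of Unicode punctuation characters.
    if all(unicodedata.category(ch).startswith("P") for ch in token):
        return "PUNCT"
    return None
```

For instance, `guess_upos(",")` yields `"PUNCT"` and `guess_upos("$")` yields `"SYM"`, while an ordinary word such as `"dollar"` is left undecided (`None`) for the rest of the tagging process.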

French #

TODO

Overview #

Specific Pattern #

Haitian Creole #

TODO

Overview #

Specific Pattern #