Lexicon 語 · 13

Dictionaries and corpora

From Bluteau and Morais Silva to the Common Orthographic Vocabulary and the great electronic corpora: how the Portuguese lexicon is recorded, standardised and observed.


Every written language is eventually inventoried. The dictionary describes what words mean; the orthographic vocabulary fixes how they are spelled; the corpus shows how they are actually used. These three instruments — the work of reference, the norm, and the empirical record — form the infrastructure on which knowledge of the Portuguese lexicon rests.

From bilingual lists to the first monolingual dictionary

Portuguese lexicography was born, as almost everywhere in Europe, in the service of Latin. The earliest works were bilingual, meant for those learning the language of high culture: the Dictionarium ex Lusitanico in Latinum sermonem of Jerónimo Cardoso (1562–1569) is usually cited as the first to record Portuguese vocabulary systematically, even if only as a starting point for Latin.

The decisive leap came in the eighteenth century with Rafael Bluteau’s Vocabulario Portuguez, e Latino (eight volumes plus a two-volume supplement, 1712–1728), a monumental lexical encyclopaedia. António de Morais Silva built on it: his Diccionario da Lingua Portugueza (1789), which began as a reworking and expansion of Bluteau, is regarded as the first modern monolingual dictionary of Portuguese. It defines words in Portuguese, is arranged alphabetically, and documents senses with citations from authors. Its successive editions ran through the nineteenth century and made it the reference par excellence.

The contemporary monolingual dictionary

Several major works of reference coexist today. In Portugal, the Lisbon Academy of Sciences published its Dicionário da Língua Portuguesa Contemporânea in 2001, while commercial publishers maintain widely circulated titles such as Porto Editora’s Grande Dicionário da Língua Portuguesa. Digital media have become central: online dictionaries such as Priberam and Infopédia are, for most speakers, the ordinary way to look a word up.

A dictionary is not a neutral list. Each one makes choices about which registers to include (slang, regionalisms, loanwords), how to order senses, and how much etymological and grammatical information to supply.

facto, n. m. ‘event, fact’; from Lat. FACTUM ‘that which has been done’.

A typical entry compresses part of speech, definition and etymon — three layers of information in a single line.
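As an illustration only, the three layers of the sample entry can be separated mechanically. The Python sketch below is hypothetical: the Entry class, the parse_entry function and the regular expression are invented here and fitted to the layout of that one sample line, not to the format of any real dictionary.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    headword: str
    part_of_speech: str
    definition: str
    etymon: str


# Assumption: headword, part of speech, a definition in curly quotes,
# then "; from <etymon>." This mirrors the sample entry only.
ENTRY_RE = re.compile(
    r"^(?P<headword>[^,]+),\s*"
    r"(?P<pos>[^‘]+?)\s*"
    r"‘(?P<definition>[^’]*)’;\s*"
    r"from\s+(?P<etymon>.+?)\.?$"
)


def parse_entry(line: str) -> Entry:
    """Split one entry line into headword, part of speech, definition, etymon."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised entry layout: {line!r}")
    return Entry(
        headword=match["headword"].strip(),
        part_of_speech=match["pos"].strip(),
        definition=match["definition"],
        etymon=match["etymon"].strip(),
    )


sample = "facto, n. m. ‘event, fact’; from Lat. FACTUM ‘that which has been done’."
entry = parse_entry(sample)
print(entry)
```

Real entries carry many more fields (pronunciation, register labels, numbered senses), so a production parser would need a richer schema than this four-field record.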

The VOC and the orthographic vocabularies

Distinct from the dictionary is the orthographic vocabulary: it does not define words but lists them in their correct spelling, with gender, plural and syllable division. It is the instrument that gives the norm material form. The 1990 Orthographic Agreement, in its Article 2, expressly called for a common orthographic vocabulary of the language, “as complete as desirable and as standardising as possible”.

That mandate produced the Common Orthographic Vocabulary of the Portuguese Language (VOC), developed by the International Institute of the Portuguese Language (IILP), an organ of the CPLP. The VOC does not replace the national vocabularies: it integrates them. Each member state draws up its own national orthographic vocabulary, and the VOC gathers them on a single platform, making visible both the shared words and those proper to each country.
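The idea of integrating national lists rather than replacing them can be sketched as a simple index from word form to the countries that record it. The Python below is a minimal illustration, not the VOC's actual data model: the function names are hypothetical, and the toy word lists are invented for the example and do not reproduce the content of any national vocabulary.

```python
from collections import defaultdict


def integrate(national: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each word form to the set of country codes whose list contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for country, words in national.items():
        for word in words:
            index[word].add(country)
    return dict(index)


def shared_and_proper(index: dict[str, set[str]], countries: list[str]):
    """Return the words found in every list and the words found in only one."""
    everyone = set(countries)
    shared = sorted(w for w, where in index.items() if where == everyone)
    proper = {
        country: sorted(w for w, where in index.items() if where == {country})
        for country in countries
    }
    return shared, proper


# Toy data, invented for illustration only.
toy_lists = {
    "PT": {"casa", "livro", "autocarro"},
    "BR": {"casa", "livro", "ônibus"},
    "MZ": {"casa", "livro", "machimbombo"},
}

index = integrate(toy_lists)
shared, proper = shared_and_proper(index, list(toy_lists))
print(shared)
print(proper)
```

Keeping the country set attached to each form is what lets a single platform show both the common core and the country-specific items without discarding either.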

Corpora: the language observed

A corpus is a large collection of real texts — newspapers, literature, speech transcriptions — in electronic, annotated and searchable form. It lets the researcher step beyond intuition and put questions to the language: which words co-occur with saudade? Which preposition does a given verb govern? How often is mesoclisis actually used? Modern lexicography is increasingly corpus-based.

| Corpus | Held by | Nature |
|---|---|---|
| CRPC (Reference Corpus of Contemporary Portuguese) | CLUL (Univ. of Lisbon) | General reference for European Portuguese |
| CETEMPúblico | Linguateca | Texts from the Público newspaper |
| Corpus do Português | M. Davies & M. Ferreira | Historical and multi-variety |
| Tycho Brahe Corpus | Univ. of Campinas | Annotated historical Portuguese |

With these resources, frequencies, concordances and changes over time cease to be impressions and become data. A dictionary deciding whether a word is “established enough” to be admitted now consults the evidence of a corpus.
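A rough sketch of how such evidence might be computed: the Python below counts a word's frequency per million tokens and the number of texts it appears in. Everything here is an assumption made for illustration: the function names, the naive tokeniser, and the two thresholds are hypothetical, and no real dictionary's admission criteria are implied. The three tiny texts stand in for a corpus of millions of words.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Lower-cased word tokens; no lemmatisation, so plurals count separately."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def per_million(word: str, texts: list[str]) -> float:
    """Relative frequency of a word form, scaled to one million tokens."""
    counts: Counter[str] = Counter()
    for text in texts:
        counts.update(tokenize(text))
    total = sum(counts.values())
    return 1_000_000 * counts[word.lower()] / total if total else 0.0


def is_attested(word: str, texts: list[str],
                min_per_million: float, min_texts: int) -> bool:
    """Hypothetical test: frequent enough AND spread over enough texts."""
    spread = sum(word.lower() in set(tokenize(text)) for text in texts)
    return per_million(word, texts) >= min_per_million and spread >= min_texts


texts = ["a saudade de casa", "a casa nova", "matar saudades"]
print(per_million("casa", texts))
print(is_attested("casa", texts, min_per_million=1000, min_texts=2))
print(is_attested("saudade", texts, min_per_million=1000, min_texts=2))
```

Checking spread as well as raw frequency guards against a word that is common in one source only. Because the sketch does not lemmatise, saudade and saudades are counted as different forms; annotated corpora normally handle this.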

…uma profunda saudade de casa… · …a saudade dos que partiram… · …matar saudades…

Concordance lines like these reveal a word’s typical contexts: here, the patterns in which saudade recurs.
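Lines of this kind are usually produced as a keyword-in-context (KWIC) listing. The following Python is a minimal sketch under stated assumptions: the kwic function is hypothetical, it matches by prefix as a crude stand-in for lemmatisation, and it is run on the three fragments quoted above rather than on a real corpus.

```python
import re


def kwic(texts: list[str], prefix: str, width: int = 3) -> list[tuple[str, str, str]]:
    """Return (left context, matched word, right context) for each hit.

    A hit is any token starting with `prefix`; `width` is the number of
    tokens kept on each side.
    """
    rows = []
    for text in texts:
        tokens = re.findall(r"\w+", text)
        for i, token in enumerate(tokens):
            if token.lower().startswith(prefix.lower()):
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                rows.append((left, token, right))
    return rows


fragments = [
    "uma profunda saudade de casa",
    "a saudade dos que partiram",
    "matar saudades",
]

for left, node, right in kwic(fragments, "saudade"):
    # Right-align the left context so the keyword lines up in one column.
    print(f"{left:>20}  {node}  {right}")
```

Aligning the keyword in a single column is what makes recurring neighbours, such as the prepositions that follow saudade, easy to spot by eye.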

Why they matter

Dictionaries, vocabularies and corpora answer different but complementary questions: what a word means, how it is spelled, and how it is actually used. Together they turn the lexicon from speakers’ diffuse knowledge into an object that is described, standardised and verifiable, and they give Portuguese, spread across several continents, shared instruments with which to know itself.

Sources

  1. Rafael Bluteau. Vocabulario Portuguez, e Latino. Colégio das Artes, Coimbra (1712–1728)
  2. António de Morais Silva. Diccionario da Lingua Portugueza (1789)
  3. Academia das Ciências de Lisboa. Dicionário da Língua Portuguesa Contemporânea. Verbo (2001)
  4. Telmo Verdelho. As origens da gramaticografia e da lexicografia latino-portuguesas. Instituto Nacional de Investigação Científica (1995)