Portuguese in the Digital World
One of the internet's major languages, Portuguese faces its own challenges in technology and science — accents and encoding, new vocabulary, language tools, and the dominance of English.
Spoken by more than two hundred and fifty million people across four continents, Portuguese is today one of the most present languages on the internet and one of the most published, translated and machine-processed. That global scale, however, does not spare it from challenges of its own: representing its accents in computer systems, the rapid coining of technical vocabulary, the building of language tools, and the overwhelming weight of English in scientific communication.
One of the languages of the web
By number of users and by volume of content, Portuguese consistently ranks among the leading languages of the internet, driven above all by Brazil’s demographic weight and by the vitality of Lusophone communities in Portugal, in Africa and in the diaspora. Social media, e-commerce and audiovisual production have made the digital space one of the language’s chief living territories — and, with it, an engine of diffusion as powerful as the school or the press were in other eras.
The management of addresses reflects this plurality: the top-level domain .pt is run by the DNS.PT Association and .br by Registro.br, each with its own rules. Since 2005 it has been possible to register domains with accented characters (.pt accepts ç, á, ã and the like), a technical acknowledgement that the language does not fit inside the English alphabet.
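How an accented name fits into a domain system built for ASCII can be sketched in a few lines. The example below assumes Python's built-in `idna` codec (which follows the older IDNA 2003 rules) and uses a made-up domain, `coração.pt`, purely for illustration; registries may apply their own, stricter rules on which characters are allowed.

```python
# Hypothetical domain, used only for illustration; not a real registration.
name = "coração.pt"

# The "idna" codec rewrites each non-ASCII label as an ASCII "xn--" form
# that the DNS can carry.
ascii_form = name.encode("idna").decode("ascii")

# The conversion is reversible, so software can display the accented form again.
round_trip = ascii_form.encode("ascii").decode("idna")

print(ascii_form)
print(round_trip)
```

The accented spelling is what the user sees and types; the `xn--` form is what actually travels through the DNS.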
Accents, keys and encoding
The first obstacle for digital Portuguese was physical and technical. Keyboards, encoding standards and early software were designed for English, and for decades the accented characters — á, é, ã, õ, ç — were a constant source of errors: the notorious “strange characters” that replaced letters whenever two systems failed to understand each other.
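The "strange characters" are easy to reproduce. The sketch below assumes one common mismatch, text written as UTF-8 and read back as Latin-1 (ISO-8859-1); other pairs of encodings produce different debris.

```python
word = "ação"

# One system writes the word as UTF-8 bytes...
raw = word.encode("utf-8")

# ...and another reads those same bytes as Latin-1.
garbled = raw.decode("latin-1")
print(garbled)  # aÃ§Ã£o

# When the mismatch is known, reversing the two steps recovers the text.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # ação
```

Each accented letter occupies two bytes in UTF-8, and the second system shows each byte as a separate character, which is why one letter turns into two.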
The spread of the Unicode standard and of UTF-8 encoding from the 2000s onward largely solved the problem, allowing the entire Portuguese alphabet, along with nearly every writing system in the world, to be represented uniformly. Both the Portuguese keyboard and the Brazilian ABNT2 keyboard follow the QWERTY layout, with a dedicated ç key and dead keys for the accents (you press the accent first, then the vowel).
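What "represented uniformly" means in practice can be inspected with Python's standard library. This is a minimal sketch: it prints the code point and UTF-8 length of a few letters, and shows one remaining subtlety, namely that Unicode allows an accented letter to be stored either as a single character or as a base letter plus a combining accent.

```python
import unicodedata

# In UTF-8, unaccented Latin letters take one byte and the accented
# Portuguese letters take two.
for ch in "aáãç":
    print(ch, f"U+{ord(ch):04X}", len(ch.encode("utf-8")))

# "ã" can also be stored as a plain "a" followed by a combining tilde (U+0303).
decomposed = "a\u0303"
precomposed = "\u00e3"

# The two look identical but compare unequal until normalised to one form.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
print(decomposed == precomposed, same_after_nfc)
```

Software that searches or compares Portuguese text usually normalises it first (NFC is a common choice) so that both spellings of the same letter match.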
A vocabulary under construction
Each technological wave brings with it a surge of anglicisms that the language absorbs, adapts or replaces. Some enter raw (software, email, site); others are reshaped to fit Portuguese morphology (blogar, tuitar, formatar: to blog, to tweet, to format); others still acquire native equivalents that take hold (descarregar for download, navegador for browser).
Carreguei as fotografias para a nuvem e enviei-te a hiperligação por correio eletrónico.
I uploaded the photos to the cloud and sent you the link by email.
It is a field in which the European and Brazilian varieties diverge sharply, often because of software-localisation choices made independently on each side of the Atlantic:
| English | European Portuguese | Brazilian Portuguese |
|---|---|---|
| screen | ecrã | tela |
| mouse | rato | mouse |
| file | ficheiro | arquivo |
| user | utilizador | usuário |
| download | descarregar | baixar |
| password | palavra-passe | senha |
| mobile phone | telemóvel | celular |
The rato [ˈʁatu] that glides across a Portuguese desk is, on the other side of the ocean, simply a mouse.
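In software, a divergence like this typically ends up as a per-variety lookup table. The sketch below encodes the table above under the standard locale tags `pt-PT` and `pt-BR`; the names `TERMS` and `localise` are hypothetical, and real localisation systems use dedicated message catalogues rather than a dictionary in code.

```python
# Terms taken from the table above, keyed by English term and locale tag.
TERMS = {
    "screen": {"pt-PT": "ecrã", "pt-BR": "tela"},
    "mouse": {"pt-PT": "rato", "pt-BR": "mouse"},
    "file": {"pt-PT": "ficheiro", "pt-BR": "arquivo"},
    "user": {"pt-PT": "utilizador", "pt-BR": "usuário"},
    "download": {"pt-PT": "descarregar", "pt-BR": "baixar"},
    "password": {"pt-PT": "palavra-passe", "pt-BR": "senha"},
    "mobile phone": {"pt-PT": "telemóvel", "pt-BR": "celular"},
}


def localise(term: str, locale: str) -> str:
    """Return the term for the given variety, or the English term if unknown."""
    return TERMS.get(term, {}).get(locale, term)


print(localise("file", "pt-PT"))  # ficheiro
print(localise("file", "pt-BR"))  # arquivo
```

Falling back to the English term when no entry exists mirrors how untranslated strings surface in partially localised software.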
The language and the machines
For a computer to handle Portuguese — to correct it, translate it, transcribe it from speech or search it — the language must first be described as data. This work of natural-language processing rests on large corpora and linguistic resources built by academic teams. For Portuguese, projects such as Linguateca and reference corpora such as CETEMPúblico (one hundred and eighty million words from the newspaper Público) underpinned spell-checkers, search engines and translation systems.
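The first step in describing a language "as data" is usually to split text into words and count them. The sketch below does this for a single sample sentence from this article; a real corpus holds millions of words and relies on far more careful tokenisation, so this only illustrates the idea.

```python
import re
from collections import Counter

# Sample sentence from this article, standing in for a corpus.
text = (
    "Carreguei as fotografias para a nuvem e enviei-te "
    "a hiperligação por correio eletrónico."
)

# [^\W\d_]+ matches runs of letters in any script, so accented letters
# stay inside their words instead of splitting them.
tokens = re.findall(r"[^\W\d_]+", text.lower())
counts = Counter(tokens)

print(len(tokens), "tokens,", len(counts), "distinct words")
print(counts["a"])
```

A pattern limited to `[a-z]+` would cut "hiperligação" and "eletrónico" into fragments, which is exactly the kind of English-only assumption that Portuguese resources had to correct.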
For many speakers, spell-checkers were the first encounter with language technology — and they also became a subtle instrument of the norm, deciding, on every keyboard, what “is correct”. Machine translation and speech recognition, now built on neural networks, have reached high performance for Portuguese, though the varieties least represented in data — the African ones above all — remain at a disadvantage relative to the European and Brazilian standards.
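A toy spell-checker shows how the word list itself becomes the norm. The sketch below assumes a tiny lexicon of European Portuguese terms and uses Python's `difflib` for suggestions; the names `LEXICON` and `suggest` are hypothetical, and production checkers rely on full dictionaries and morphological rules rather than plain string similarity.

```python
import difflib

# A deliberately tiny lexicon; whatever is missing from it counts as an error.
LEXICON = [
    "ficheiro", "utilizador", "ecrã",
    "palavra-passe", "telemóvel", "descarregar",
]


def suggest(word: str, lexicon: list[str] = LEXICON) -> list[str]:
    """Return close matches for an unknown word, or [] if the word is known."""
    if word in lexicon:
        return []
    return difflib.get_close_matches(word, lexicon, n=3, cutoff=0.7)


print(suggest("ficheiro"))  # []
print(suggest("ficeiro"))   # ['ficheiro']
```

Because only listed words pass, a legitimate word from another variety that the lexicon omits would be flagged just like a typo.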
Portuguese in science
If on the web Portuguese is a major language, in international scientific communication it is a minor one. The overwhelming majority of research articles are now published in English, and Lusophone researchers, like their peers almost everywhere, write in English to be read and cited. Portuguese remains vigorous in popular science, in teaching and in the social sciences and humanities, but it is in retreat in the exact and life sciences.
Working against that trend are open-access initiatives such as SciELO (Scientific Electronic Library Online), founded in Brazil in 1997, which gives international visibility to journals in Portuguese. The 1990 Orthographic Agreement, by unifying spelling, was also conceived to ease the circulation of texts — scientific and otherwise — within a digital space shared by all the countries of the language.
A territory to be claimed
Portuguese’s place in the digital world is not guaranteed: it depends on keyboards and standards that accommodate it, on a vocabulary that keeps it able to name the new, on tools that process it, and on a political will to use it in science and technology. In that sense it is one of the grounds on which the language’s future vitality is now being decided.
Sources
- Language and the Internet. Cambridge University Press (2006)
- Avaliação conjunta: um novo paradigma no processamento computacional da língua portuguesa. IST Press (2007)
- A Língua Portuguesa na Era Digital. Springer (META-NET White Paper Series) (2012)