Langages, no. 171 (3/2008)
This work investigates the quantitative and qualitative criteria that govern the construction of electronic corpora used to compile or update dictionaries. In particular, it addresses the concepts of balanced and opportunistic corpora. It shows that some interesting linguistic phenomena are absent from the largest balanced corpora currently available. Opportunistic corpora are many times larger, owing to the availability of large quantities of electronic newspaper text. However, several studies, e.g. on gender distribution or on archaisms, show that the results vary considerably depending on the size and sampling of the corpora. Hence, frequency is no longer a reliable criterion, which poses a problem for the objectivity of opportunistic corpora.