Priorities: With limited recording capacity, it is better to use frequency lists so that the most frequent words are recorded first. With unlimited recording capacity, the order matters little, since we assume that all the target words will eventually be recorded. Frequency lists correlate highly between languages.
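As a minimal sketch of this prioritisation, the following Python snippet builds a frequency list from a text and returns the words to record first. The function name `recording_priorities`, the sample sentence and the simple letter-based tokenisation are illustrative assumptions, not part of any existing tool.

```python
import re
from collections import Counter


def recording_priorities(text, limit):
    """Return the `limit` most frequent words of `text`, most frequent first.

    Hypothetical helper: tokenisation is a plain run-of-letters match,
    which is an assumption and is too crude for many languages.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    # Sort by descending count, then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]


sample = "the cat sat on the mat and the dog sat too"
print(recording_priorities(sample, 3))  # ['the', 'sat', 'and']
```

With a real corpus, `limit` would be set to the number of recordings the speaker can realistically make.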
Corpus purpose: For language learning, written transcripts of spoken language, such as film subtitles, are known to be better material (see the SUBTLEX studies, 2007). Other corpora will also let you do good work when providing audio recordings. For lexicographic purposes, as on Wiktionary, rare words are as interesting as frequent words, and the aim is to provide audio for every item.
Consistency: It is best to provide consistent audio data, with the same tone (neutral or enthusiastic) and the same speaker throughout.
Lexicon range for learners: For language learners, and assuming that the most frequent words are learnt first, a minimum vocabulary of 2,000-2,500 base words is required to bring the learner to an autonomous level. Language-teaching academics call this level the “threshold level”. The CEFR (Common European Framework of Reference for Languages: Learning, Teaching, Assessment), the Chinese HSK levels and their pairing with CEFR levels, and some academic research suggest the following relation between lexicon size, CEFR level and competence:
|Lexicon size (*)||CEFR level||Competence|
|600||A1||“Basic user. Breakthrough or beginner”. Survival communication, expressing basic needs.|
|1,200||A2||“Basic user. Waystage or elementary”.|
|2,500||B1||“Independent user. Threshold or intermediate”.|
|5,000||B2||“Independent user. Vantage or upper intermediate”.|
|20,000+||C2||“Mastery or proficiency”. Native speaker after graduating from high school.|
(*) Assuming the most frequent word families are learnt first.
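The table above can be read as a lookup from vocabulary size to CEFR level. The sketch below assumes that each figure is a minimum threshold, which the table does not state explicitly. The name `cefr_level` is hypothetical. Because the table has no C1 row, any size between 5,000 and 19,999 maps to B2 here.

```python
from bisect import bisect_right

# (minimum lexicon size, CEFR level), copied from the table above.
# Treating each size as a lower bound is an assumption of this sketch.
THRESHOLDS = [
    (600, "A1"),
    (1200, "A2"),
    (2500, "B1"),
    (5000, "B2"),
    (20000, "C2"),
]


def cefr_level(known_words):
    """Return the highest CEFR level whose threshold is reached, or None."""
    sizes = [size for size, _ in THRESHOLDS]
    index = bisect_right(sizes, known_words)
    return THRESHOLDS[index - 1][1] if index else None


print(cefr_level(3000))  # B1
print(cefr_level(400))   # None
```

Such a lookup only gives a rough orientation, since it holds only when the most frequent word families are learnt first.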
|CEFR level||Vocabulary range|
|C2||Has a good command of a very broad lexical repertoire including idiomatic expressions and colloquialisms; shows awareness of connotative levels of meaning.|
|C1||Has a good command of a broad lexical repertoire allowing gaps to be readily overcome with circumlocutions; little obvious searching for expressions or avoidance strategies. Good command of idiomatic expressions and colloquialisms.|
|B2||Has a good range of vocabulary for matters connected to his/her field and most general topics. Can vary formulation to avoid frequent repetition, but lexical gaps can still cause hesitation and circumlocution.|
|B1||Has a sufficient vocabulary to express him/herself with some circumlocutions on most topics pertinent to his/her everyday life such as family, hobbies and interests, work, travel, and current events.|
|A2||Has sufficient vocabulary to conduct routine, everyday transactions involving familiar situations and topics. Has a sufficient vocabulary for the expression of basic communicative needs. Has a sufficient vocabulary for coping with simple survival needs.|
|A1||Has a basic vocabulary repertoire of isolated words and phrases related to particular concrete situations.|
|CEFR level||Vocabulary control|
|C2||Consistently correct and appropriate use of vocabulary.|
|C1||Occasional minor slips, but no significant vocabulary errors.|
|B2||Lexical accuracy is generally high, though some confusion and incorrect word choice does occur without hindering communication.|
|B1||Shows good control of elementary vocabulary but major errors still occur when expressing more complex thoughts or handling unfamiliar topics and situations.|
|A2||Can control a narrow repertoire dealing with concrete everyday needs.|
|A1||No descriptor available.|
Users of the Framework may wish to consider and where appropriate state:
• which lexical elements (fixed expressions and single word forms) the learner will need/be equipped/be required to recognise and/or use;
- Paul Nation and David Crabbe (1991), "A Survival Language Learning Syllabus for Foreign Travel", Victoria University of Wellington, New Zealand. Published in System, Vol. 19, No. 3, pp. 191-201.
- "Common European Framework of Reference for Languages: Learning, Teaching, Assessment" (2001) (pdf).
- Marc Brysbaert, Michaël Stevens, Paweł Mandera and Emmanuel Keuleers (2016), "How Many Words Do We Know? Practical Estimates of Vocabulary Size Dependent on Word Definition, the Degree of Language Input and the Participant’s Age", Frontiers in Psychology. https://www.frontiersin.org/articles/10.3389/fpsyg.2016.01116/full