VT 1215 notes

Issues in researching lexical richness

  • Most traditional productive studies (con: they count words as types and tokens)
  • (1990s) word families, example affixes: –able, –er, –ish, –ly, –ness, –th, –y, non-, un-
  • E.g. closely related words: answer, answered, answers, answerable, unanswerable….
  • L1: look for psychological reality
  • L2 …. issues of not being able to provide the 4 major word classes (V, N, Adj, Adv)
  • How to treat errors? Lack of discussion in most studies
    • ignore small spelling mistakes
    • include incorrect derivational forms of words
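
To see why the unit of counting matters, here is a minimal sketch that counts the same text as tokens, types, and word families. The `FAMILY_OF` lookup is a hand-made, hypothetical table built from the "answer" example above; it is not taken from any published family list, and a real study would need a principled one.

```python
import re

# Hypothetical, hand-made family lookup (an assumption for illustration):
# each derived or inflected form maps to its headword.
FAMILY_OF = {
    "answered": "answer",
    "answers": "answer",
    "answerable": "answer",
    "unanswerable": "answer",
}

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def count_units(text):
    """Return (token count, type count, word-family count) for a text."""
    tokens = tokenize(text)
    types = set(tokens)
    # Forms missing from the lookup are treated as their own family.
    families = {FAMILY_OF.get(t, t) for t in types}
    return len(tokens), len(types), len(families)

sample = "She answered. He answers. The answer was unanswerable, not answerable."
print(count_units(sample))
```

Counting by types treats the five "answer" forms as five different words; counting by families treats them as one.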

But receptive studies count words based on their forms (not their meanings!)

Steps involved in measuring lexical richness

  • Step 1: decide on the text to be analyzed
  • Step 2: decide on the unit of counting (psychological reality of words for the participants)
  • Step 3: decide what to do with errors
  • Step 4: decide how to measure lexical richness
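
The four steps can be sketched as one small function. This is only an illustration under stated assumptions: the unit of counting is the word type, the error policy is a hypothetical `SPELLING_FIXES` table of small spelling mistakes, and the measure is the type-token ratio (types divided by tokens). The notes do not prescribe any of these choices.

```python
import re

# Step 3 assumption: a hypothetical table of small spelling mistakes
# and the forms they should be counted as.
SPELLING_FIXES = {"recieve": "receive", "answr": "answer"}

def lexical_richness(text, fix_spelling=True):
    """Return the type-token ratio of a text under one set of choices."""
    # Step 1: the text to be analyzed is passed in as a string.
    # Step 2: the unit of counting is the word type (lowercased form).
    tokens = re.findall(r"[a-z]+", text.lower())
    # Step 3: either normalize the listed misspellings or count them as-is.
    if fix_spelling:
        tokens = [SPELLING_FIXES.get(t, t) for t in tokens]
    # Step 4: measure richness as types / tokens (0.0 for an empty text).
    return len(set(tokens)) / len(tokens) if tokens else 0.0

text = "receive recieve answer answr"
print(lexical_richness(text, fix_spelling=True))
print(lexical_richness(text, fix_spelling=False))
```

The same text gets a different score depending on the error decision in step 3, which is why that decision needs to be reported.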


A critique of a productive study: Crossley and McNamara (2009)

Coh-Metrix (http://cohmetrix.memphis.edu/)

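An online text analysis system developed by the University of Memphis (a tool for measuring lexical richness). The values computed for its various indices (a measure of lexical variation) give insight into:

  • Text cohesion: more than 200 indices of cohesion, difficulty, readability
  • Coherence of the text's mental representation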

It uses these indices:

  1. English readability formulas
  2. Descriptive text statistics
  3. Lexical diversity
  4. Part-of-speech classification
  5. Syntactic analysis
  6. Latent Semantic Analysis (http://en.wikipedia.org/wiki/Latent_semantic_analysis)


  1. standardized indices
  2. fast
  3. no issue of subjectivity
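
As an illustration of what a readability-formula index computes, here is a rough sketch of the Flesch Reading Ease score. The notes do not say which formulas are meant, so this choice is an assumption, and the code is not Coh-Metrix's implementation. The syllable counter is a crude vowel-group heuristic.

```python
import re

def count_syllables(word):
    """Rough estimate: number of vowel groups in the word, at least 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

easy = "The cat sat. The dog ran."
hard = "Incomprehensible terminology complicates understanding."
print(flesch_reading_ease(easy) > flesch_reading_ease(hard))
```

Because the score is computed from counts alone, two people running it on the same text get the same number.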

Research needed

  • Receptive studies of the lexical richness of spoken discourse are lacking
  • Spoken / written differences
  • Esp. learning through incidental reading and listening
  • ……

A critique of a receptive study: Cobb (2007)

the Range problem (http://www.lextutor.ca/range/)

  • use the estimate that would provide the greatest potential to support Krashen’s claim
  • Laufer’s (1989) estimate: 3000 word families
  • minimum threshold of 6 encounters w/ unknown words… sufficient for incidental vocabulary learning?
  • ……
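
The coverage question behind these bullets can be sketched as follows: given a list of known words, what share of a text's tokens is covered, and which unknown words recur often enough to meet a minimum count? This sketch reads the threshold of 6 as a minimum number of encounters, which is an assumption. It matches plain word forms rather than word families, and it is not how the Range program itself works.

```python
import re
from collections import Counter

def coverage_and_encounters(text, known_words, min_encounters=6):
    """Return (coverage, unknown words seen at least min_encounters times)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0, {}
    known = {w.lower() for w in known_words}
    # Count how often each unknown word form appears in the text.
    unknown_counts = Counter(t for t in tokens if t not in known)
    # Coverage is the share of running words (tokens) that are known.
    coverage = 1 - sum(unknown_counts.values()) / len(tokens)
    often_enough = {w: n for w, n in unknown_counts.items() if n >= min_encounters}
    return coverage, often_enough

# "zorb" and "flim" are made-up words standing in for unknown vocabulary.
text = "the zorb sat " * 6 + "the flim sat"
coverage, often_enough = coverage_and_encounters(text, ["the", "sat"])
print(round(coverage, 3), often_enough)
```

Here "zorb" meets the minimum of 6 while "flim" appears only once, so only "zorb" is returned.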

Designing a study of lexical richness

errors – they are yet to be studied


