Lexical Richness

This first digital analysis focuses on a simple indicator of vocabulary variation, the type-token ratio (TTR). The TTR measures the lexical richness of a text; that is, how varied the words are in any given passage.

The TTR is the ratio of the number of ‘types’ to the number of ‘tokens’ in a text. The number of ‘types’ is the number of unique word forms; the number of ‘tokens’ is the total number of words. To give an example: in the sentence ‘This cat is black and this cat is white’ there are 6 types (this / cat / is / black / and / white) for 9 tokens, ignoring capitalisation, which gives a TTR of 6/9, or about 0.67.
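The calculation in the example above can be sketched in a few lines of Python. This is only an illustration: the function name `type_token_ratio` and the simple regular-expression tokenizer are assumptions made for this sketch, and lowercasing is applied so that ‘This’ and ‘this’ count as a single type, as in the example.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return the number of unique word forms divided by the number of words."""
    # Assumption: a token is a run of letters or apostrophes; the text is
    # lowercased so that "This" and "this" are counted as one type.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0  # avoid dividing by zero on an empty passage
    return len(set(tokens)) / len(tokens)


sentence = "This cat is black and this cat is white"
print(round(type_token_ratio(sentence), 2))  # 6 types / 9 tokens -> 0.67
```

A value close to 1 means that almost every word in the passage is different; a lower value means that more words are repeated.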

To explore the TTR of Jane Eyre's language we took chapters as the units of analysis. We then used the TreeTagger tokenizer to calculate the numbers of tokens and types. From the TTR values of each chapter, we constructed this graph, where the x-axis shows the different chapters of Jane Eyre and the y-axis shows the lexical richness.
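A chapter-by-chapter calculation of this kind might look roughly like the following Python sketch. It is not the pipeline used for this study: it replaces the TreeTagger tokenizer with a simple regular expression, and it assumes a plain-text edition in which each chapter begins with a heading line such as ‘CHAPTER I’. The names `CHAPTER_HEADING`, `split_chapters` and `chapter_ttrs` are hypothetical, and the short `sample` string stands in for the full text of the novel.

```python
import re

# Assumption: chapters are introduced by a line like "CHAPTER I" or "CHAPTER 12".
CHAPTER_HEADING = re.compile(r"^CHAPTER\s+[IVXLC\d]+\.?\s*$", re.MULTILINE)


def split_chapters(novel_text: str) -> list[str]:
    """Split the text on chapter headings, dropping anything before the first one."""
    parts = CHAPTER_HEADING.split(novel_text)
    return [part for part in parts[1:] if part.strip()]


def chapter_ttrs(novel_text: str) -> list[tuple[int, float]]:
    """Return (chapter number, TTR) pairs, one per chapter."""
    results = []
    for number, chapter in enumerate(split_chapters(novel_text), start=1):
        # Simple stand-in tokenizer: lowercase runs of letters or apostrophes.
        tokens = re.findall(r"[a-z']+", chapter.lower())
        ttr = len(set(tokens)) / len(tokens) if tokens else 0.0
        results.append((number, ttr))
    return results


# Tiny invented sample, used only to show the shape of the output.
sample = (
    "Front matter\n"
    "CHAPTER I\nThe cat sat. The cat ran.\n"
    "CHAPTER II\nA dog barked loudly.\n"
)
for number, ttr in chapter_ttrs(sample):
    print(number, round(ttr, 2))
```

The resulting list of chapter numbers and TTR values is the kind of data that can then be plotted, with chapters on the x-axis and TTR on the y-axis.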

As you can see, the vocabulary used in Jane Eyre is not uniform across the different chapters. Interestingly, chapters 1 and 36 have the highest TTR, and lexical richness is in fact concentrated in the first two chapters and the last three, i.e., at the beginning and end of the novel.

This next visualisation shows the same information in a different way: the bigger the rectangle, the greater the lexical variety. You will see that the largest rectangles, whose size is directly proportional to the TTR, are those of chapters 1, 2, 36 and 38.

If you look back at the first graph, it is interesting to note that lexical variety tends to decrease towards the middle of the novel. The exceptions are chapters 22 and 23, which form the only lexical peak in the centre of the work. These are the crucial chapters in which Jane returns to Thornfield and sees Rochester again after the death of Mrs Reed, and in which he then makes his marriage proposal.

Text and research by Giovanni Pietro Vitali