Introducing the Linguistic Inquiry and Word Count

by Dr. Ryan Nichols, Philosophy, Cal State Fullerton, Orange County CA

As I write this column there are, remarkably, no YouTube guides for the use of the Linguistic Inquiry and Word Count. This is a shame, since the Linguistic Inquiry and Word Count, or ‘LIWC’ (pronounced ‘luke’) for short, is one of the best textual analysis software tools out there.

LIWC2007 logo represented with some word categories. Source: Author image

LIWC allows users to look under the hood of works of literature. When uploading a text to LIWC, the user will receive an output containing more than 70 columns of data. For example, if I upload this blog post to LIWC, it might return the result that 17.32% of the text falls under LIWC’s cognition category while only 1.2% falls under the religion category, and so on. This is useful information for several reasons illustrated in this and the following post.

LIWC’s design has made it a favorite of psychologists, but it also finds use in marketing, Twitter analysis, mental health diagnostics and much more. Psychologists across the world have developed LIWC dictionaries in their native languages. As of writing, supported languages include Arabic, Chinese, Dutch, English, French, German, Italian, Portuguese, Russian, Serbian, Spanish, and Turkish. LIWC is also very affordable: LIWClite7 is $30 USD, while LIWC2007, the full version, is $90 USD. (Compared to shareware text analysis software, this is not cheap, but proceeds from LIWC go to the University of Texas Department of Psychology to support its work.)

Another key reason for praising LIWC is the quality of its dictionary design. The LIWC2007 dictionary contains 4500 words and word stems. Each is filed into one or more subdictionaries. Each subdictionary represents one of the 55 word categories through which LIWC compiles a text. For example, the word “cried” is part of “five word categories: sadness, negative emotion, overall affect, verb, and past tense verb. Hence, if it is found in the target text, each of these five subdictionary scale scores will be incremented” (Pennebaker et al., 2007, p. 4). What makes this so special is that Professor Jamie Pennebaker and his fellow developers put great effort into psychometrically validating the subdictionaries. As one result of this work, values across LIWC categories have been shown to correlate with Big Five personality traits (Pennebaker & King, 1999; Mehl, Gosling, & Pennebaker, 2006).
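To make the counting procedure concrete, here is a minimal Python sketch of how a dictionary of this kind could score a text. It is not LIWC's actual code or dictionary. Only the “cried” entry and its five categories come from the passage quoted above; the “pray*” stem entry, the trailing-asterisk stem notation as used here, and the function names are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary. Only the "cried" entry is taken from the
# passage quoted above; the "pray*" stem entry is invented for illustration.
TOY_DICTIONARY = {
    "cried": ["sadness", "negative emotion", "overall affect",
              "verb", "past tense verb"],
    "pray*": ["religion"],  # assumed notation: trailing * marks a word stem
}


def categories_for(word, dictionary):
    """Return every category whose dictionary entry matches the word."""
    matched = []
    for entry, categories in dictionary.items():
        if entry.endswith("*"):
            # Stem entry: matches any word that begins with the stem.
            if word.startswith(entry[:-1]):
                matched.extend(categories)
        elif word == entry:
            matched.extend(categories)
    return matched


def category_percentages(text, dictionary):
    """Percentage of all words in the text that fall under each category."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        # One word increments every category it belongs to, once each.
        for category in set(categories_for(word, dictionary)):
            counts[category] += 1
    total = len(words)
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}
```

Calling `category_percentages("She cried and then prayed", TOY_DICTIONARY)` scores a five-word text in which “cried” increments all five of its categories and “prayed” matches the hypothetical religion stem, so each of the six categories comes out at 20.0 percent. The real dictionary is far larger and was validated empirically, which a toy like this cannot reproduce.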

The psychometric validation of LIWC categories is significant because it allows LIWC users to draw justified inferences from word frequencies to the psychological states of authors. Even so, the potential for LIWC’s use in the humanities, and in the study of religion in particular, is largely untapped. CERC is using it for a few projects. In a pilot research project designed to test the application of LIWC to research questions in the humanities, Justin Lynn, Ben Purzycki and I compiled a large corpus of literary texts from three genres, Science Fiction, Fantasy, and Mystery, in order to test the interpretations of humanities scholars about genre. In a research project about contemporary Protestantism, Oliver Gunther, Carson Logan and I compiled about 400 sermons drawn from 12 denominations in order to test whether the language used across the denominations, in particular their use of supernatural agency terms, correlated strongly with known differences in theological orientation and known categories in the sociology of religion.

In two upcoming posts about LIWC we will describe each of these projects in more detail in order to give a sense of the questions a humanist can pursue with the Linguistic Inquiry and Word Count. In the meantime, given the dearth of instructional videos about LIWC, I recorded a video introduction here.