Topics as Word Clouds

Elijah Meeks and Mat Jockers both have used word clouds to visualize topics from topic models. Colour, orientation, relative placement of the words – all of these could be used to convey different dimensions of the data. Below, you’ll find clouds for each of my initial 50 topics generated from the Roman materials in the Portable Antiquities Scheme database (some 100 000 rows, or nearly 1/5 the database, collected together into ‘documents’ where each unitary district authority is the ‘document’ and the text are the descriptions of things found there). The word clouds are generated from the word weights file that MALLET can output. There are 8100 unique tokens when I convert the database into a MALLET file; each one of those is present in each ‘bag of words’ or topic that MALLET generates, but to differing degrees. Thus, word clouds (here generated with Wordle) pull out important information that the word keys document does not. However, given that I optimized the interval whilst generating the topic models, the keys document provides an indication of the strength of the topic in the corpus. I’ve arranged the word clouds scaling them against the size of the strongest topic (topic 22), top-bottom, left-right. I’ll be damned if I can get wordpress to just display each image under the other one. Even stripped my table out, it did!

At any rate, as one churns through the 50 topics, after about the first 11 (depicted below), the topics get progressively more noisy as MALLET attempts to deal with incomplete transcriptions of the epigraphy of the coins, and the frequent notes about the source for the identification of the coins (the work of Guest & Wells). The final topic depicted here, topic 20, directly references a note often left in the database concerning the quality of an individual record; these frequently are in connection with materials that entered the British Museum collection before the Portable Antiquities Scheme got going and hence the information is not up to usual standards.

This exercise then suggests to me that 50 topics is just too much. I’m rerunning¬†everything¬†with 10 topics this time.

Topic 22

Topic 22

Topic 48

Topic 48

Topic 43

Topic 43

Topic 32

Topic 32

Topic 7

Topic 7

Topic 33

Topic 33

Topic 13

Topic 13

Topic 47

Topic 47

Topic 46

Topic 46

Topic 35

Topic 35

Topic 20

Topic 20

About these ads

One thought on “Topics as Word Clouds

  1. Pingback: What’s in a word? | q o k e d y

Comments are closed.