Data as a Kandinsky Painting

I just found this package for R, ‘Kandinsky’. You can read the logic of what it does here.

I’m totally into representing data as art, so I thought I would feed all 900+ annotations that my ‘Crafting Digital History’ class is making across the web through it.

  • Grab all the annotations using Lincoln’s ‘hypothesisr’ package.
  • Turn that into tidy data:
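library(dplyr)
library(tidytext)

# one row per user and word, with n = how often that user wrote it
word_counts <- documents %>%
  group_by(user) %>%
  unnest_tokens(word, text) %>%
  count(user, word, sort = TRUE) %>%
  ungroup()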
  • Feed word_counts into kandinsky.
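Here is a rough end-to-end sketch of those steps on toy data. The `documents` tibble below is a made-up stand-in for the real annotations (only the `user` and `text` column names come from the snippet above). I am also assuming the package exposes a `kandinsky()` function that takes a data frame, so that call is guarded and only runs if the package is installed.

```r
library(dplyr)
library(tidytext)

# Toy stand-in for the real annotations (hypothetical data)
documents <- tibble(
  user = c("student_a", "student_a", "student_b"),
  text = c("Digital history is messy",
           "History is an argument",
           "Data can be art")
)

# Same tidy step as above: one row per user and word, with a count n
word_counts <- documents %>%
  unnest_tokens(word, text) %>%
  count(user, word, sort = TRUE)

# Assumption: kandinsky() accepts a data frame; skip the drawing otherwise
if (requireNamespace("kandinsky", quietly = TRUE)) {
  kandinsky::kandinsky(word_counts)
}
```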

et voilà:

 

Now, let’s visualize the stopwords. I also add some custom stopwords to that list (things like ‘digital’, ‘historian’, etc., given the nature of the course). Ecco:
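Isolating just the stopwords might look something like the sketch below. It assumes the standard list is tidytext’s built-in `stop_words` table, and the toy `word_counts` is invented for illustration; only ‘digital’ and ‘historian’ come from my actual custom list.

```r
library(dplyr)
library(tidytext)

# Course-specific additions appended to tidytext's built-in list
custom_stop_words <- bind_rows(
  stop_words,
  tibble(word = c("digital", "historian"), lexicon = "custom")
)

# Toy word counts standing in for the real annotations (hypothetical)
word_counts <- tibble(
  user = c("student_a", "student_a", "student_a", "student_b"),
  word = c("the", "digital", "archive", "of"),
  n    = c(5L, 3L, 2L, 4L)
)

# Keep only the rows whose word is on the stopword list
stopwords_only <- word_counts %>%
  semi_join(custom_stop_words, by = "word")
```

`semi_join()` filters without adding columns, so `stopwords_only` has the same shape as `word_counts` and can be fed to the painter the same way.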

There is something extraordinarily satisfying about those two images. The first captures the entire universe of possible responses that my students are making. In the second, that purple circle seems to my mind to correspond with the normal stopwords, and the squiggles with my additions. Let us now subtract the second from the first:

Interesting, this visualization of what remains after the stopwords are applied…
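The subtraction itself is, in tidy terms, an anti-join. A minimal sketch, again on invented toy counts and assuming tidytext’s `stop_words` as the list being removed:

```r
library(dplyr)
library(tidytext)

# Toy word counts standing in for the real annotations (hypothetical)
word_counts <- tibble(
  user = c("student_a", "student_a", "student_b", "student_b"),
  word = c("the", "archive", "of", "palimpsest"),
  n    = c(5L, 2L, 4L, 1L)
)

# Drop every row whose word appears on the stopword list
remaining <- word_counts %>%
  anti_join(stop_words, by = "word")
```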

I can also do some other fun things with my annotations, such as term frequency-inverse document frequency (tf-idf) to find out what words tend to characterize which students’ annotations. As a Kandinsky painting:
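For the curious, a small sketch of that weighting using tidytext’s `bind_tf_idf()`, treating each student as one ‘document’. The counts are invented. A word every student uses scores zero, while a word only one student uses floats to the top.

```r
library(dplyr)
library(tidytext)

# Toy word counts standing in for the real annotations (hypothetical)
word_counts <- tibble(
  user = c("student_a", "student_a", "student_b", "student_b"),
  word = c("history", "archive", "history", "network"),
  n    = c(3L, 1L, 2L, 2L)
)

# Arguments are: term column, document column, count column
annotation_tf_idf <- word_counts %>%
  bind_tf_idf(word, user, n) %>%
  arrange(desc(tf_idf))
```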

Let’s paint our feelings – here’s the sentiment of the annotations (‘AFINN’):

And here’s the same data again, but sorted from most positive to most negative:
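A sketch of the scoring and sorting, with one loud caveat: the lexicon below is a tiny hand-made stand-in with invented scores, not the real AFINN list. With the textdata package installed, `tidytext::get_sentiments("afinn")` should give you the real lexicon (in recent versions it has `word` and `value` columns, which is the shape I mimic here).

```r
library(dplyr)

# Hand-made stand-in for AFINN (hypothetical words and scores)
toy_lexicon <- tibble(
  word  = c("love", "useful", "confusing", "broken"),
  value = c(3, 2, -2, -1)
)

# Toy word counts standing in for the real annotations (hypothetical)
word_counts <- tibble(
  user = c("student_a", "student_a", "student_b", "student_b", "student_c"),
  word = c("love", "archive", "confusing", "broken", "useful"),
  n    = c(2L, 4L, 1L, 3L, 1L)
)

# Score each user, then sort from most positive to most negative
sentiment_by_user <- word_counts %>%
  inner_join(toy_lexicon, by = "word") %>%
  group_by(user) %>%
  summarise(sentiment = sum(value * n)) %>%
  arrange(desc(sentiment))
```

Words missing from the lexicon (like ‘archive’ here) simply drop out at the `inner_join()`, so they contribute nothing to the score.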

Finally, let’s finish off with a topic model and then the top terms from the topic model:
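One way to get from word counts to a topic model and its top terms is sketched below, assuming the topicmodels package for the LDA fit. The data are invented, and `k = 2` and the seed are arbitrary choices for the toy example, not the settings behind the images in this post.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Toy word counts standing in for the real annotations (hypothetical)
word_counts <- tibble(
  user = c("student_a", "student_a", "student_b",
           "student_b", "student_c", "student_c"),
  word = c("archive", "map", "network", "graph", "archive", "network"),
  n    = c(4L, 2L, 3L, 2L, 1L, 3L)
)

# One document per user, cast to a document-term matrix
annotation_dtm <- word_counts %>%
  cast_dtm(user, word, n)

# Fit a two-topic LDA model (k and seed are arbitrary here)
annotation_lda <- LDA(annotation_dtm, k = 2, control = list(seed = 1234))

# Per-topic word probabilities, keeping the highest-beta terms in each topic
top_terms <- tidy(annotation_lda, matrix = "beta") %>%
  group_by(topic) %>%
  slice_max(beta, n = 3) %>%
  ungroup()
```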

Data is beautiful.

What does it mean? Well, that might take another post or two. Maybe the meaning would emerge if I also sonified, or 3D printed, this data. If we use the full sensorium…