A couple of months ago, I was listening to RADIOLAB on my Sunday run. If you've not heard the program, I recommend sampling a few of the episodes. The hosts are a bit on the quirky / nerdy side (probably why I enjoy listening). They also cover topics that you won't typically come across in your normal media diet - including a segment on how a reporter for New Scientist magazine ended up applying voltage to her skull (cognitive enhancement) and then getting tested as a sniper at a US Military base. She did very well.
Another program focused on numbers. Favorite numbers. A fellow by the name of Alex Bellos - who blogs for the Guardian and writes math books that also lean quirky - was talking about them. Alex had been looking into the "Why" of favorite numbers - asking people what their favorite number was, and then why. The range (and distribution) of numbers and reasons was interesting. Some people focused on the shape of the digit (like 8), others on the meaning of the number (the trinity of 3), and others on its inherent attributes (like prime 7 or balanced 4).
Alex Bellos' Experiment
Alex decided to have a little fun and put up a little web site to ask people these same questions. Loads of people felt the desire to tell their story, so the response was impressive. Alex quickly had a very tall stack of data from more than 30 thousand people. Simply a number "8" and a WHY "Because it's so round, pretty and feminine looking" for example. Seven (7) was the clear winner, with three (3) and eight (8) taking silver and bronze.
I dabble with data and data viz so it got me curious what this data set looked and felt like, and what kind of stories might be inside of it.
So I did a little research on Alex - including here http://alexbellos.com/ - and when I saw he wrote books with titles like "Here's Looking at Euclid" and "The Grapes of Math", I knew he was my kind of guy, and might be open to an email. Long story short - after a few e-conversations, Alex generously provided me with a piece of his data set to play with. (Thanks Alex!) He has also posted more info here http://pages.bloomsbury.com/favouritenumber with the results of his global survey.
Two Areas of Interest - Word Frequency and Synesthesia
There were two areas I took a look at - the first was the distribution of why words: After scrubbing out many of the unhelpful words (me, I, like, because) - what frequency do the "Meaningful" words have for each of the numbers? Furthermore, if we applied them to a wordcloud - in the shape of the number - what might that look like? Results are below and through this blog. It's a little hacky - I'm still using MS Paint technology from 1989. Old School.
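To make the word-frequency idea concrete, here is a minimal sketch in Python (not the R and Excel actually used for this post). It counts the "meaningful" words per favorite number after dropping unhelpful ones. The stop-word list beyond "me", "I", "like" and "because" is an assumption, as are the function name and all sample rows except the "8" quote from Alex's survey.

```python
import re
from collections import Counter, defaultdict

# Assumed stop-word list: the post only names "me", "I", "like" and "because";
# the rest are added here purely for illustration.
STOP_WORDS = {"me", "i", "like", "because", "it", "it's", "is",
              "the", "a", "and", "so", "of", "my"}


def word_frequencies(responses):
    """Map each favorite number to a Counter of meaningful words in its reasons."""
    freqs = defaultdict(Counter)
    for number, reason in responses:
        # Lowercase, then keep runs of letters/apostrophes; this drops punctuation.
        words = re.findall(r"[a-z']+", reason.lower())
        freqs[number].update(w for w in words if w not in STOP_WORDS)
    return freqs


# Sample rows: the first reason is quoted in the post, the others are made up.
sample = [
    (8, "Because it's so round, pretty and feminine looking"),
    (8, "I like the round shape"),
    (7, "Because it is a lucky prime"),
]

freqs = word_frequencies(sample)
print(freqs[8].most_common(3))
```

The per-number counts are what a wordcloud needs as input: the more often a word shows up for a number, the bigger it gets drawn.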
The second area, which I'll write about in another post, is COLOR and synesthesia. Briefly, I was curious whether the colors mentioned in relation to the numbers had any relationship to known mappings from people with synesthesia (specifically, a number-color connection).
1) TRANSFORM - As always, the data munging and scrubbing took most of the time. I used a combination of Excel (because I'm pretty handy with it) and R (because I'm learning it and forcing myself to use it) to remove the uninteresting words and punctuation, and to handle the weird data slices where people would provide three favorite numbers at once (messy).
2) LOAD - Once I had the data, I filtered it and imported it into R. (I was a bit naughty and filtered it before import. With more time, I'd create a filter subset within R to pick any number.)
3) PROCESS - R has a nice library for wordclouds - AAAAA - and I ran this simple code. It took some tuning, and I made sure I generated several versions - because some versions did not lend themselves well to the carving and shaping stage. For example, if the dominant and largest word fell on the midsection of an eight, it would be lost.
4) SHAPE - I also took a rather rough but effective approach to shaping the word cloud. I found some very large font text and took a screenshot of it. I then inverted the colors - so that the white would remain and the black would obscure anything outside the margins. Then I donned a beret and released my inner artist as I tried to find a good fit for 90% of the information. This involved some trial and error. Lastly, some words inevitably got 'clipped', so I removed these and filled in the worst of the gaps. And VOILA! A rough wordcloud.
Will post this shortly.
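In the meantime, here is a rough sketch of the TRANSFORM and LOAD ideas in Python rather than the R and Excel described above. It covers the "filter subset to pick any number" idea and one possible way of handling the messy multi-number responses. Dropping those rows is an assumption made for illustration (the post does not say how they were handled), and the function names and sample rows are hypothetical.

```python
import re


def parse_single_number(raw):
    """Return the number (as a string) if the answer holds exactly one, else None."""
    found = re.findall(r"\d+(?:\.\d+)?", raw)
    # Assumption: answers with zero or several numbers are treated as unusable.
    if len(found) != 1:
        return None
    return found[0]


def reasons_for(rows, number):
    """Collect the 'why' text for every row whose single favorite number matches."""
    wanted = str(number)
    return [reason for raw, reason in rows if parse_single_number(raw) == wanted]


# Hypothetical rows of (number field as typed, reason).
rows = [
    ("7", "It is prime"),
    ("3, 7 and 8", "Can't choose"),   # messy: three numbers at once
    (" 8 ", "Round and pretty"),
    ("7", "Lucky"),
]

print(reasons_for(rows, 7))
print(reasons_for(rows, 8))
```

Keeping the number as a string avoids rounding surprises for answers like 3.14, at the cost of treating "07" and "7" as different; a real cleanup pass would probably normalize those too.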
Below are the wordclouds - they are first draft / rough versions. I've thought about applying different colors to the different numbers (where there was a strong color bias).
Anyway, there it is - not too fancy, but I had fun doing it.
About this blog
Data Analytics & Visualization Blog - Generating insights from Data since 2013
Created: July 25, 2014