In the spirit of eating my own dog food, I’m publishing the presentation that I’m giving tomorrow at my old faculty to an undergrad class about the things I live and breathe – Twitter, social media and branding. There’s probably not much here for veterans reading this blog, but I was surprised how many good case studies I could draw from your blogs. Kudos to @anjarenko, @anejmehadzic and @jernej 🙂
Continuing my saga of visualizing the Koornk social network, I decided that the obvious next step is to map out who talks to whom and how much. For this task I used the excellent Python library NetworkX, with pygraphviz drawing the pretty pictures at the end.
Just to explain what you’re looking at:
- I downloaded all public conversations from Koornk and filtered them down to the ones that use @ somewhere to reference someone else
- All together, you need to reference or be referenced 60 times to get on the list (70 people out of 1606 made it)
- From those 70 people, if two of them talked more than 40 times, they got a line between each other
- Line thickness is then calculated based on how much they talked to each other
- Circle size around each person tells you their cumulative chatter towards others
Fun statistic: about 22% of all the messages I looked at (N=81990) contained an @ reference.
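The filtering rules above can be sketched roughly like this. It is a minimal, standard-library-only illustration and not the script I actually ran: the function name, the `(author, text)` input shape, the lowercasing of nicknames and the `@(\w+)` pattern are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumption: a nickname is the run of word characters right after "@".
MENTION = re.compile(r"@(\w+)")


def build_conversation_graph(messages, min_activity=60, min_pair=40):
    """Turn (author, text) pairs into node sizes and weighted edges.

    Hypothetical helper for illustration; thresholds follow the post:
    at least `min_activity` references made or received to get on the
    list, and more than `min_pair` references between two people for a line.
    """
    activity = Counter()     # references made + received, per person
    outgoing = Counter()     # references made towards others, per person
    pair_counts = Counter()  # undirected (a, b) pair -> number of references

    for author, text in messages:
        author = author.lower()
        for target in MENTION.findall(text):
            target = target.lower()
            if target == author:
                continue  # ignore people referencing themselves
            activity[author] += 1
            activity[target] += 1
            outgoing[author] += 1
            pair_counts[tuple(sorted((author, target)))] += 1

    people = {p for p, n in activity.items() if n >= min_activity}
    edges = {
        pair: n
        for pair, n in pair_counts.items()
        if n > min_pair and pair[0] in people and pair[1] in people
    }
    node_sizes = {p: outgoing[p] for p in people}  # circle size
    return node_sizes, edges                       # edge value = line thickness
```

The returned edge counts can then be handed to a NetworkX `Graph` via `add_edge(a, b, weight=n)` and drawn from there; how the counts get scaled into actual line widths and circle sizes is a matter of taste.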
It turns out that there’s a smaller group of very vocal people within this view, so we naturally want to see a zoomed version:
- It takes about two days to properly get the hang of the NetworkX library to draw something like that. It doesn’t mean you know anything about graph theory, but at least you can start drawing pretty pictures.
- Pictures are fun, but the next step is probably an interactive Flash diagram that allows you to explore these relationships for yourself
- Throwing around these data structures actually takes a few seconds on a modern PC. Finally something meaningful for it to process.
- I wonder how much work it would be to properly plot something like this for a subset of Twitter relationships if I drink from their fire-hose long enough. Maybe the Gnip guys can fill up a few terabytes of hard drives with the backlog, if they have it, and we start crunching this. (I’m assuming that there’s already a post-graduate student somewhere who’s doing exactly this.)
Creating a good visualization consists of two major parts:
- having a robust visualization technique (wave graph in this example)
- having a good data set that fits the visualization technique
After I got Graphication working yesterday, I quickly realized that my initial data set doesn’t fit this technique out of the box, as it was one stream and not a series of intertwining ones. Looking around, I discovered a perfect one – chatter on Koornk.
How does it look?
What does it mean?
It’s a Wave Graph visualization of who the person in question is talking to. In good old Twitter fashion, Koornk also uses @ to reference people, so you can say: “@jure: foo!” or “I’m drinking coffee with @Miha and @bufo”. In all these cases my script counts the nicknames after @ and aggregates them on a weekly basis. For a nickname to be eligible to get on the list, it has to be mentioned at least twice in a week.
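As a rough sketch of that counting step (not my actual script): the function name, the `(timestamp, text)` input shape, the use of ISO calendar weeks and the lowercasing of nicknames are assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Assumption: a nickname is the run of word characters right after "@".
MENTION = re.compile(r"@(\w+)")


def weekly_mentions(messages, min_per_week=2):
    """Count @nicknames per week for one person's messages.

    `messages` is an iterable of (datetime, text) pairs. Returns a dict
    mapping (ISO year, ISO week) to {nickname: count}, keeping only
    nicknames mentioned at least `min_per_week` times in that week.
    """
    weeks = defaultdict(Counter)
    for timestamp, text in messages:
        iso = timestamp.isocalendar()
        week = (iso[0], iso[1])  # (ISO year, ISO week number)
        weeks[week].update(nick.lower() for nick in MENTION.findall(text))

    return {
        week: {nick: n for nick, n in counts.items() if n >= min_per_week}
        for week, counts in sorted(weeks.items())
    }


example = [
    (datetime(2009, 1, 5), "@jure: foo!"),
    (datetime(2009, 1, 6), "I'm drinking coffee with @Miha and @jure"),
    (datetime(2009, 1, 12), "@Miha hello"),
]
print(weekly_mentions(example))
```

Weeks in which nobody passes the threshold are kept as empty entries, so the time axis of the wave graph stays continuous.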
Any interesting observations?
Looking at these graphs, you can start seeing how easy data mining is and how important it is to protect your online privacy.
An example of that would be @bufo, whose graph looks like this:
You can easily see that he talks a lot to @Miha, @Katja, @jure, @Hirkani and a few others. That instantly gives us some information about his online friends, and since it’s Slovenia we can assume that he probably also knows them in person or that they at least have some things in common.
While that doesn’t seem too revealing (at least to their friends) we have to be aware that this information is now available to anyone willing to crawl the web and connect the dots. There should be at least some targeted advertising in this 🙂
- creating these visualizations is harder than it looks. Mostly because you have to know your data set well to process it correctly.
- it’s CPU intensive. Drawing each of these things takes a good few seconds every time. It’s not a big problem if you’re doing this off-line but there might be an issue of scaling here.
- having a good API to get data from is important. Luckily the Koornk API is good and fast.
- OS X is a pain to use pycairo on, as it keeps crashing my Python. A useful workaround is to have Linux running in a local VMware and run the computing batches there.
- visualization hopefully isn’t a purpose in itself. It’s much more rewarding to teach a community something about itself.
I’ve also generated a gallery of the 65 most chatty people on Koornk if you want to look at more of these pictures (or find yourself).
I was looking at my Facebook feed today when a pattern suddenly occurred to me. Of all the things people post to their account or change, almost the only one that consistently gets at least one comment from someone is when they change their relationship status to a lower level: going from married to single, or declaring that it’s complicated.
A few months ago I speculated that it’s unfortunate that Facebook doesn’t allow you to publicly lose a friend, but luckily enough it allows you to publicly dump someone.
As evidence I present this nice rendering of my Facebook timeline from today.
I’m going to be short today and spare you a lengthy rant about the importance of relationships in human society.