Continuing my saga of visualizing the Koornk social network, I decided that the obvious next step was to map out who talks to whom, and how much. For this task I used the excellent Python library NetworkX, which uses pygraphviz to draw the pretty pictures in the end.
Just to explain what you’re looking at:
I downloaded all public conversations from Koornk and kept only the ones that use @ somewhere to reference someone else
You need to reference others or be referenced a combined total of at least 60 times to get on the list (70 people out of 1606 made it)
From those 70 people, if two of them talked to each other more than 40 times, they got a line between them
Line thickness is then calculated from how much they talked to each other
Circle size around each person tells you their cumulative chatter towards others
Fun statistic: about 22% of all messages looked at (N=81990) contained an @ reference
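The filtering steps above can be sketched roughly like this. It is a minimal sketch, not the script that produced the picture: the input shape (a list of `(author, text)` pairs), the function name `build_talk_graph` and the `@(\w+)` pattern are all assumptions made for illustration, and it uses only the standard library.

```python
import re
from collections import Counter

# Assumption: a nickname is whatever run of word characters follows "@".
MENTION = re.compile(r"@(\w+)")

def build_talk_graph(messages, min_activity=60, min_pair=40):
    """messages: iterable of (author, text) pairs (hypothetical input shape).

    Returns (people, edges, sizes):
      people - nicknames that referenced or were referenced >= min_activity times
      edges  - {(a, b): count} for pairs that talked more than min_pair times
      sizes  - outgoing reference count per person (the circle size)
    """
    activity = Counter()   # references made plus references received
    outgoing = Counter()   # references made only
    pairs = Counter()      # undirected count per pair of people
    for author, text in messages:
        for target in MENTION.findall(text):
            if target == author:
                continue  # ignore people referencing themselves
            activity[author] += 1
            activity[target] += 1
            outgoing[author] += 1
            pairs[frozenset((author, target))] += 1

    people = {name for name, n in activity.items() if n >= min_activity}
    edges = {
        tuple(sorted(pair)): n
        for pair, n in pairs.items()
        if pair <= people and n > min_pair
    }
    sizes = {name: outgoing[name] for name in people}
    return people, edges, sizes
```

Each entry in `edges` could then be handed to NetworkX with `graph.add_edge(a, b, weight=count)`, with the weight driving the line thickness when drawing.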
It turns out that there’s a smaller group of very vocal people within this view, so naturally we want to see a zoomed-in version:
It takes about two days to properly get the hang of the NetworkX library and draw something like this. That doesn’t mean you know anything about graph theory, but at least you can start drawing pretty pictures.
Pictures are fun, but the next step is probably an interactive Flash diagram that allows you to explore these relationships for yourself
Throwing around these data structures actually takes a few seconds on a modern PC. Finally something meaningful for it to process.
I wonder how much work it would be to properly plot something like this for a subset of Twitter relationships, if I drink from their fire-hose long enough. Maybe the Gnip guys could fill up a few terabytes of hard drives with the backlog, if they have it, and we could start crunching it. (I’m assuming that there’s already a post-graduate student somewhere who’s doing exactly this.)
Creating a good visualization consists of two major parts:
having a robust visualization technique (wave graph in this example)
having a good data set that fits the visualization technique
After I got Graphication working yesterday, I quickly realized that my initial data set didn’t fit this technique out of the box, as it was one stream and not a series of intertwining ones. Looking around, I discovered a perfect one – chatter on Koornk.
How does it look?
What does it mean?
It’s a Wave Graph visualization of who the person in question is talking to. In good old Twitter fashion, Koornk also uses @ to reference people, so you can say: “@jure: foo!” or “I’m drinking coffee with @Miha and @bufo”. In all cases my script counts the nicknames after @ and aggregates them on a weekly basis. For a nickname to get on the list, it has to be mentioned at least twice in a week.
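The counting itself is simple enough to sketch. The snippet below is only an illustration of the idea, not the original script: it assumes the messages of one person arrive as `(date, text)` pairs, that weeks are ISO calendar weeks, and that a nickname is any run of word characters after @. The name `weekly_mentions` is made up for this example.

```python
import re
from collections import Counter, defaultdict
from datetime import date

# Assumption: a nickname is whatever run of word characters follows "@".
MENTION = re.compile(r"@(\w+)")

def weekly_mentions(messages, min_per_week=2):
    """messages: iterable of (date, text) pairs written by one person.

    Returns {(iso_year, iso_week): {nickname: count}}, keeping only
    nicknames mentioned at least min_per_week times in that week.
    """
    weeks = defaultdict(Counter)
    for day, text in messages:
        iso = day.isocalendar()
        week = (iso[0], iso[1])  # (ISO year, ISO week number)
        weeks[week].update(MENTION.findall(text))
    return {
        week: {nick: n for nick, n in counts.items() if n >= min_per_week}
        for week, counts in sorted(weeks.items())
    }

# Tiny made-up sample using the two message styles from the text above.
sample = [
    (date(2009, 3, 2), "@jure: foo!"),
    (date(2009, 3, 4), "I'm drinking coffee with @Miha and @jure"),
    (date(2009, 3, 10), "@Miha hi"),
]
per_week = weekly_mentions(sample)
```

In the sample, @jure is mentioned twice in the first week and makes the list, while @Miha appears only once in each week and is dropped.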
Any interesting observations?
Looking at these graphs, you can start to see how easy data mining is and how important it is to protect your online privacy.
An example of that would be @bufo, whose graph looks like this:
You can easily see that he talks a lot to @Miha, @Katja, @jure, @Hirkani and a few others. That instantly gives us some information about his online friends, and since this is Slovenia we can assume that he probably also knows them in person, or that they at least have some things in common.
While that doesn’t seem too revealing (at least to their friends) we have to be aware that this information is now available to anyone willing to crawl the web and connect the dots. There should be at least some targeted advertising in this 🙂
creating these visualizations is harder than it looks, mostly because you have to know your data set well to process it correctly.
it’s CPU intensive. Drawing each of these things takes a good few seconds every time. It’s not a big problem if you’re doing this off-line but there might be an issue of scaling here.
having a good API to get data from is important. Luckily the Koornk API is good and fast.
pycairo is a pain to use on OS X, as it keeps crashing my Python. A useful workaround is to run Linux in a local VMware virtual machine and run the computing batches there.
visualization hopefully isn’t a purpose in itself. It’s much more rewarding to teach a community something about itself.
Saturdays are the days for hacking (after having a great run earlier, of course). This time I’ve been preparing a network data set for a friend. The idea was to take a look at who follows whom on Koornk.
This is what I’ve got after a productive afternoon:
I’ve only followed nodes that can be reached by starting with Matija as the initial node. Taking a quick look at the resulting graph, we can see that there is a well-connected group of people in the middle, with another group that is only partially connected to the first. There are of course many others who are not well connected.
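Collecting only the nodes reachable from one starting person is a plain breadth-first traversal. Here is a rough sketch of that idea; `crawl_follow_graph` and the `get_following` callback are hypothetical names, and in a real run the callback would wrap whatever API call returns the people a user follows.

```python
from collections import deque

def crawl_follow_graph(start, get_following):
    """Breadth-first crawl of a 'who follows whom' graph.

    start         - nickname of the initial node
    get_following - hypothetical callback: nickname -> iterable of nicknames
                    that this person follows (e.g. a thin wrapper around an API)

    Returns (seen, edges): every nickname reachable from start, and the
    directed (follower, followed) edges discovered along the way.
    """
    seen = {start}
    edges = []
    queue = deque([start])
    while queue:
        user = queue.popleft()
        for other in get_following(user):
            edges.append((user, other))
            if other not in seen:  # visit each person only once
                seen.add(other)
                queue.append(other)
    return seen, edges

# Made-up data standing in for the real service.
follows = {
    "Matija": ["a", "b"],
    "a": ["Matija"],
    "b": ["c"],
    "c": [],
    "z": ["Matija"],  # follows Matija, but nobody reachable follows z
}
seen, edges = crawl_follow_graph("Matija", lambda user: follows.get(user, []))
```

Note that "z" never shows up in `seen`: starting from a single node only finds people along outgoing follow links, which is why a full list of nicknames has to come from somewhere else.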
As this is only a first prototype, there are a few more things to do on some other nice Saturday:
figure out a way to get a list of all nicknames (one way would be to just go through the list of messages)
weight the connections based on who’s talking to whom
interpret the data (but since that’s not my homework, I hope the recipient of the data set will want to write a guest blog post about this)
The Slovenian Web 2.0 geek scene got a new toy a few months ago: it’s called Koornk (a better Twitter). Feeling a bit blue from just writing emails lately and not programming anything nice, I decided a few days ago to roll out a one-night hack – Koornk personas that tell you whether you need your umbrella that day.
Usage is simple: just follow @deznikjesenice, @deznikcelje, @deznikmaribor or @deznikljubljana to get a simple notification in the morning. It’s based on what the Slovenian weather service thinks the probabilities for the selected regions are. Their track record is usually pretty good, so I’m confident in the predictions.
I already received a few useful tips on where to get more weather prediction data, so we might soon also start suggesting what kind of coat to wear that day, depending on the temperature forecast.
While building something nice on Koornk (a Slovenian Twitter-like service), I stopped for a few months to learn how to do authenticated POST requests using Python. I found urllib2 way too complicated, but I soon stumbled across a great Yahoo Developer page – Make Yahoo! Web Service REST calls with Python – which also lists an alternative approach using httplib2, which works beautifully.
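For reference, an authenticated POST with httplib2 looks roughly like the sketch below. Treat it as an assumption-laden example rather than working Koornk code: the URL and the `status` form field are placeholders, since the real API details are not given here, and it is written for Python 3 (`urllib.parse`) instead of the Python 2 of the urllib2 era.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the real Koornk API URL is not given in this post.
API_URL = "https://example.invalid/api/update"

def build_post(message):
    """Build the form-encoded body and headers for the POST.

    The "status" field name is an assumption made for illustration.
    """
    body = urlencode({"status": message})
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return body, headers

def post_update(username, password, message, url=API_URL):
    """Send an authenticated POST and return (HTTP status, response body)."""
    import httplib2  # third-party package, imported only when actually sending

    http = httplib2.Http()
    http.add_credentials(username, password)  # used when the server asks for auth
    body, headers = build_post(message)
    response, content = http.request(url, "POST", body=body, headers=headers)
    return response.status, content
```

Splitting the request building from the sending makes it possible to check the encoded body without touching the network.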