Tag Archives: visualization

Visualizing Slovenian coalition agreement

With the election of the new Slovenian prime minister we also got the formal release of a coalition agreement. Since it’s a 72-page document, I was wondering which keywords would stand out. Here is the result:

Pogodba za Slovenijo 2012 - 2015 - word cloud (top 80 words)

While we’re at it, we can also take a look at the coalition agreement that Pozitivna Slovenija prepared. Running it through the same process, we get:

Koalicijska pogodba - Pozitivna Slovenija - 2012 (top 80 words)

 

A few words on how to reproduce this:

  • Grab your favorite OCR software and convert the scanned PDF into .docx
  • From Word, save it as a .txt file
  • Lemmatize the words so that the different grammatical forms of a word collapse into one base form
  • Apply stop-words (in this case mostly: ministrstvo*, vlada, slovenija*, ..)
  • Drop the resulting text into wordle.net
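
The stop-word and counting steps above can be previewed locally before handing the text to wordle.net. This is only a minimal sketch, not the script used for the pictures above. It assumes the text is already lemmatized, and it reads the trailing `*` in the stop-word list as a prefix wildcard, which is my interpretation. The function names are made up for illustration.

```python
import re
from collections import Counter

# Stop-word patterns from the list above; a trailing * is treated
# as "anything starting with this prefix" (an assumption).
STOP_PATTERNS = ["ministrstvo*", "vlada", "slovenija*"]


def is_stop_word(word, patterns=STOP_PATTERNS):
    """Return True if the word matches an exact or prefix stop-word pattern."""
    for pattern in patterns:
        if pattern.endswith("*"):
            if word.startswith(pattern[:-1]):
                return True
        elif word == pattern:
            return True
    return False


def top_words(text, limit=80):
    """Count the remaining words and return the most frequent ones."""
    # \w is Unicode-aware in Python 3, so č, š and ž stay inside words.
    words = re.findall(r"\w+", text.lower())
    counts = Counter(w for w in words if not is_stop_word(w))
    return counts.most_common(limit)
```

Calling `top_words(open("pogodba.txt", encoding="utf-8").read())` on a hypothetical `pogodba.txt` would give the 80 most frequent words with their counts.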

 

Where not to illegally park in Ljubljana

Where in Ljubljana is your car most likely to be towed away? In short: the city centre, the beginning of Vič and around Metelkova.

Click for interactive version

or alternative visualization

Click for interactive version

The source of this data is a page from Javni Holding Ljubljana that publishes your car info and the street it was towed away from. Gašper created a ScraperWiki scraper for it and has been collecting data for the last 3 months (aggregated source, if you want to reuse it).

A heatmap was chosen as the visualization technique because the data itself is very fuzzy (only street names are given, without the street number). It also tells you which neighborhoods to avoid.
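
Since only street names are known, one way to feed a heatmap is to count tows per street and attach that count as a weight to a single point per street. The sketch below assumes each scraped record is a dict with a `street` key and that a street-to-coordinates lookup is available from some geocoder. Both are assumptions, and the coordinates in the usage note are placeholders, not real geocodes.

```python
from collections import Counter


def street_weights(records):
    """Count tows per street and scale the counts to the 0..1 range."""
    counts = Counter(r["street"].strip().lower() for r in records)
    if not counts:
        return {}
    peak = max(counts.values())
    return {street: n / peak for street, n in counts.items()}


def heat_points(weights, coords):
    """Turn weights into (lat, lon, weight) tuples, skipping unknown streets."""
    return [
        (coords[street][0], coords[street][1], weight)
        for street, weight in weights.items()
        if street in coords
    ]
```

For example, `heat_points(street_weights(records), {"metelkova ulica": (46.05, 14.51)})` yields one weighted point for that street, which most heatmap layers can consume directly.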

If you want to help us bring more such mashups into the world, please consider adding other sources of data to si.ckan.net. These pictures are the end result of one such example where data was not hidden behind a telephone or a piece of paper (in a locked filing cabinet in a disused lavatory behind a door that says “Beware of the tiger”).

Thanks go to RTV Slovenia for hosting OpenData Hackday.

Visualizing Slovenian IT tax spending

My latest released project focuses on visualizing Slovenian IT tax spending (139 million euros). The idea is to take otherwise meaningless numbers and display them visually in a way that tells a story of who is spending how much and on what. The data set comes directly from the government in a semi-clean XLS file. The visualization technique I’ve decided on is a treemap, which represents the data as boxes sized relative to each other.
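
A treemap needs the flat spreadsheet rows turned into a hierarchy in which every box carries the amount that decides its area. Here is a minimal sketch of that step, assuming the rows have already been reduced to `(institution, category, amount)` tuples. The `name`, `size` and `children` keys are placeholders and not the field names of any particular toolkit.

```python
def build_tree(rows, root_name="IT spending"):
    """Group (institution, category, amount) rows into a nested treemap tree."""
    institutions = {}
    for institution, category, amount in rows:
        categories = institutions.setdefault(institution, {})
        categories[category] = categories.get(category, 0) + amount

    children = []
    for institution, categories in institutions.items():
        leaves = [{"name": c, "size": a} for c, a in categories.items()]
        children.append({
            "name": institution,
            "size": sum(categories.values()),  # parent box = sum of its leaves
            "children": leaves,
        })

    # Largest spenders first, so the biggest boxes are laid out first.
    children.sort(key=lambda node: node["size"], reverse=True)
    total = sum(node["size"] for node in children)
    return {"name": root_name, "size": total, "children": children}
```

The resulting dict can be serialized with `json.dumps` and mapped onto whatever structure the chosen JavaScript toolkit expects.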

Give it a try for yourself:

Launch the interactive Slovenian IT tax (in Slovenian)

While the visualization itself is nice, there are two points that you have to be careful about when releasing such visualizations to the public:

Transparency of data and data transformations

In my case, the data set came directly from the government. In order to make sure that everyone can check my calculations I’ve included links to their file as well as provided a local copy in case their version changes or disappears.

You’re losing and reinterpreting data with every visualization. That is why it’s important to also include the transformation scripts, so that others can check your work and possibly build on top of it, or at least make sure that you didn’t do anything tricky with the data.

I’ve opted for a GitHub repository where I’ve pushed all the associated files: http://github.com/gandalfar/itproracun. It’s a bit chaotic, but it should be pretty self-explanatory to any Python and JavaScript developer.

Telling the story

Every data visualization is trying to tell a story. It might not be obvious to the visualization author, but it helps to identify this early in the process.

I started with just a simple breakdown based on the institutions:

It’s very noisy and it’s hard to compare different institutions to each other. Initial comments on this were that it’s not shiny enough. After cleaning up the interface, I arrived at the following revision:

It’s much cleaner, and it basically showed that I needed to find an angle on this data. I decided to focus on the ratio between software and services vs. hardware and network equipment. The final version now tells a story of how the police spend a lot of their IT money on network and hardware equipment, while the Tax Office spends much more on software and services.
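
The ratio itself is simple arithmetic once every spending row is tagged with a category. A minimal sketch, assuming rows of `(institution, category, amount)` and the four category labels below, which are my naming and not the columns of the original file:

```python
SOFT = {"software", "services"}
HARD = {"hardware", "network"}


def software_share(rows):
    """Return, per institution, the share of spending on software and services."""
    totals = {}
    for institution, category, amount in rows:
        soft, hard = totals.get(institution, (0, 0))
        if category in SOFT:
            soft += amount
        elif category in HARD:
            hard += amount
        totals[institution] = (soft, hard)

    # Institutions without any categorized spending are left out.
    return {
        institution: soft / (soft + hard)
        for institution, (soft, hard) in totals.items()
        if soft + hard > 0
    }
```

A share close to 0 means mostly hardware and network equipment, and a share close to 1 means mostly software and services, so a single number per institution is enough to drive the color or the split of a box.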

The agenda of this last version of the visualization should be clear to anyone who takes a few moments to study it.

Other lessons learned

A visualization toolkit should be powerful on the one hand, but deliver first results without too much work on the other. JavaScript InfoVis Toolkit does this job very well. There are some interesting tidbits that are not entirely clear from the documentation, but they become obvious once you start thinking about how the rendering works.

The biggest time sink is parsing and cleaning up the data. Don’t expect that the .xls file will make any sense from the programmatic point of view, even though it mostly looks fine when viewed manually. Small parsing errors, moved cells and strange line breaks made parsing this data the biggest challenge.
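
Much of that cleanup comes down to normalizing every cell before trusting it. Below is a rough sketch of two such helpers. It assumes amounts typed as text use the `1.234,56` convention, which is a guess about the file and not something taken from it.

```python
import re


def clean_cell(value):
    """Collapse line breaks and repeated whitespace inside a cell."""
    if value is None:
        return ""
    return re.sub(r"\s+", " ", str(value)).strip()


def parse_amount(value):
    """Return the cell as a float, or None if it does not hold a number."""
    if isinstance(value, (int, float)):
        return float(value)  # already numeric in the spreadsheet
    text = clean_cell(value).replace(" ", "")
    if not text:
        return None
    # Assumed format: "." groups thousands, "," marks decimals.
    text = text.replace(".", "").replace(",", ".")
    try:
        return float(text)
    except ValueError:
        return None
```

Returning `None` instead of raising makes it easy to log the rows that need a manual look rather than silently dropping them.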

Big thanks go to the Slo-Tech community and my brother, who gave valuable feedback during the development.

I hope you’ve enjoyed this visualization. Let me know in the comments what other points of view you’d like to see, as well as your ideas on how to further improve it.

10 Innovative online visualization tools

Here is a quick reference chart of online visualization tools. It’s here mostly for my own reference, as I plan to update it as I discover and test new ones, which are being created on an almost daily basis.

In no particular order:

  1. http://verifiable.com/ – Turn any set of numbers into an explanatory picture with Verifiable.com
  2. http://manyeyes.alphaworks.ibm.com/manyeyes/ – Shared visualization and discovery
  3. http://www.swivel.com/ – Swivel’s mission is to make data useful so people share insights, make great decisions and improve lives.
  4. http://www.icharts.net/ – iCharts is a web services company that is creating a new market for online publishing and transactions around public and private charts.
  5. http://timetric.com/ – Making data useful
  6. http://www.trackngraph.com/ – The easy way to track and graph information
  7. http://widgenie.com/ – The all powerful data visualizer
  8. http://www.trendrr.com/ – Track, compare and share data, free. Identify trends across social graphs and networks, realize the potential of p2p, track engagement metrics, look at what is really happening, real time.
  9. http://www.wordle.net/ – Beautiful word clouds
  10. http://www.gapminder.org/upload-data/motion-chart/ – Gapminder/Google motion charts

Did I miss any? Leave a comment and I’ll update the post.


Koornk network graph with pretty pictures

Continuing my saga of visualizing the Koornk social network, I decided that the obvious next step is to map out who talks to whom and how much. For this task I used the excellent Python library NetworkX, which uses pygraphviz to draw the pretty pictures in the end.

Just to explain what you’re looking at:

  • I downloaded all public conversations from Koornk and filtered them down to the ones that use @ somewhere to reference someone else
  • You need to reference others or be referenced 60 times in total to get on the list (70 people out of 1606 made it)
  • From those 70 people, if two of them talked more than 40 times, they got a line between each other
  • Line thickness is then calculated based on how much they talked to each other
  • Circle size around each person tells you their cumulative chatter towards others
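
The filtering rules above can be sketched in plain Python before anything is drawn. This is not the original script. It assumes the conversations are already reduced to `(author, mentioned_user)` pairs, reads "60 times" as at least 60 and "more than 40 times" as strictly more than 40, and takes the circle size to be a person's total number of references. All of those are my reading of the list.

```python
from collections import Counter


def build_graph(mentions, min_activity=60, min_pair=40):
    """Apply the activity and pair thresholds to (author, target) pairs."""
    activity = Counter()     # references made plus references received
    pair_counts = Counter()  # messages between two people, either direction
    for author, target in mentions:
        if author == target:
            continue  # ignore people referencing themselves
        activity[author] += 1
        activity[target] += 1
        pair_counts[frozenset((author, target))] += 1

    nodes = {user for user, n in activity.items() if n >= min_activity}
    edges = {
        tuple(sorted(pair)): n
        for pair, n in pair_counts.items()
        if n > min_pair and pair <= nodes  # both people must have made the cut
    }
    sizes = {user: activity[user] for user in nodes}
    return nodes, edges, sizes
```

Each entry in `edges` can then be handed to NetworkX with `G.add_edge(a, b, weight=n)`, and the weight can be scaled into a line thickness when drawing.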

Fun statistic: about 22% of all messages looked at (N=81990) contained an @ reference
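
A statistic like this falls out of a simple scan over the message texts. A minimal sketch follows, assuming usernames consist of letters, digits and underscores, which is a guess about Koornk's username rules:

```python
import re

# "@" followed by a username, but not in the middle of a word,
# so e-mail addresses are not counted as references.
MENTION = re.compile(r"(?<!\w)@(\w+)")


def extract_mentions(text):
    """Return the lowercased usernames referenced in one message."""
    return [name.lower() for name in MENTION.findall(text)]


def mention_share(messages):
    """Return the fraction of messages that contain at least one @ reference."""
    if not messages:
        return 0.0
    with_mention = sum(1 for message in messages if extract_mentions(message))
    return with_mention / len(messages)
```

Pairing each message's author with the names from `extract_mentions` gives the raw material for the who-talks-to-whom counts.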

Pretty pictures

Top down view of all the 70 people who made the cut (click for bigger version)

It turns out that there’s a smaller group of very vocal people within this view, so we naturally want to see a zoomed-in version:

Who talks to who on Koornk and how much (click for bigger version)

Lessons learned

  • It takes about two days to properly get the hang of the NetworkX library to draw something like that. It doesn’t mean you know anything about graph theory, but at least you can start drawing pretty pictures.
  • Pictures are fun, but the next step is probably an interactive Flash diagram that allows you to explore these relationships for yourself
  • Throwing around these data structures actually takes a few seconds on a modern PC. Finally something meaningful for it to process.
  • I wonder how much work it would be to properly plot something like this for a subset of Twitter relationships if I drink from their fire-hose long enough. Maybe the Gnip guys can fill up a few terabytes of hard drives with their backlog, if they have it, and we can start crunching it. (I’m assuming that there’s already a post-graduate student somewhere who’s doing exactly this)