Jure Cuhalev – Code, data and visualizations
http://www.jurecuhalev.com/blog

Observations from OKFestival 2014, Berlin
Fri, 18 Jul 2014

In stark contrast to csv,conf (which was very developer oriented), I attended OKFestival over the next two days. With over 1,000 attendees from all corners of the open movement, it really is a huge gathering of open proponents.

Open is the new default

While this isn’t necessarily true in all fields yet, it looks like we’re at the point where a lot of government contracts around the world require the work to be licensed under one of the standardised open licenses (an open source license for code, or Creative Commons for creative works).

Rebuild everything

It seems that we’re still very much in the early days of platform and network building. While there are a few standardised solutions in each field, it is still hard to collaborate on complex pieces of software. I saw quite a few different indexes, CKAN alternatives, as well as proprietary solutions that are in the process of being opened up. I think we could do better as a community, within each field, at figuring out how to collaborate.

Standing on shoulders of giants

A few years ago, we were complaining that governments don’t release data and that it’s hard to get the attention of policy makers. Today a lot of these things are a given, partially because of the education and activism efforts of various hack days, seminars and events. A lot of colleagues from such events went on to consult for or work in government, making it easier for the other side to understand the issues and to find internal (technical) support for them.

Notes from csv,conf 2014
Tue, 15 Jul 2014

Today I attended csv,conf in Berlin, which turned out to be an excellent conference, full of people who gather and transform data on a daily basis.

The CSV (comma-separated values) file format seems like a joke at first – who seriously uses it today, in the age of SQL, NoSQL and every other $random-DB solution? It turns out that almost everybody does at some point – either as input, or as a data interchange format when the other systems aren’t part of your organisation.

Fail quickly and cheaply

A few different people presented their solutions for “testing” CSV files, which might be better described as making sure the files conform to a certain schema. They range from simple checks to full-fledged DSLs that let you specify rules and even compute checksums against referenced files.

What I liked most about this approach is that it lets you quickly verify the sanity of files you receive and give immediate feedback to the party sending them. This ensures that you don’t have to deal with bad data later – either later in time or further down your pipeline.
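To illustrate the schema-checking idea, here is a minimal Python sketch; the schema and column names are made up for the example and don’t come from any of the presented tools:

```python
import csv
import io

# Hypothetical schema for illustration: column name -> validator function.
SCHEMA = {
    "id": lambda v: v.isdigit(),
    "email": lambda v: "@" in v,
    "amount": lambda v: v.replace(".", "", 1).isdigit(),
}

def validate_csv(fileobj, schema):
    """Return (row_number, column, value) tuples for every failing cell."""
    reader = csv.DictReader(fileobj)
    missing = set(schema) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    errors = []
    for i, row in enumerate(reader, start=2):  # row 1 is the header
        for col, check in schema.items():
            if not check(row[col]):
                errors.append((i, col, row[col]))
    return errors

data = "id,email,amount\n1,a@example.com,9.50\ntwo,not-an-email,abc\n"
print(validate_csv(io.StringIO(data), SCHEMA))
```

Running this against a file as soon as it arrives gives the sender a precise list of broken cells instead of a vague “your data doesn’t work” email.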

Embrace the UNIX philosophy – do one thing at a time

Most of the speakers also mentioned that, to keep your sanity, you should build your system as a collection of small dedicated tools that pipe into other dedicated tools. It doesn’t necessarily have to be a Unix pipe – more a chain of steps, where each step converts the data, hands it to the next step or a database, and the processing continues from there.
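The same idea can be sketched in Python without an actual Unix pipe – a chain of small generator functions, each doing exactly one thing:

```python
def parse(lines):
    """Step 1: split raw comma-separated lines into fields."""
    for line in lines:
        yield line.strip().split(",")

def clean(rows):
    """Step 2: normalise whitespace and case in every field."""
    for row in rows:
        yield [field.strip().lower() for field in row]

def keep_valid(rows):
    """Step 3: drop rows that have any empty field."""
    for row in rows:
        if all(row):
            yield row

raw = ["  Alice , BERLIN", "Bob,", "Carol , Ljubljana "]
pipeline = keep_valid(clean(parse(raw)))
print(list(pipeline))  # -> [['alice', 'berlin'], ['carol', 'ljubljana']]
```

Each step is tiny, testable in isolation, and replaceable – swapping `keep_valid` for a stricter filter doesn’t touch the other steps.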

Everybody has the same problems

I think the biggest takeaway for me was that we all have the same issues. We all get messy datasets that are hard to parse and full of strange errors and inconsistencies.

As with other things, there is no silver bullet. We’ll have to build and teach best practices around data – testing, cleaning, what works and what doesn’t – just as we’re doing in modern software development.

Interesting tools and libraries

Impressions from FITC Amsterdam 2014
Thu, 27 Feb 2014

This week I had the pleasure of attending the Future, Innovation, Technology and Creativity (FITC) 2014 conference in Amsterdam. I wanted to catch up on what the creative industries are doing, as I had mostly been visiting developer and security oriented events.

In general, the quality of talks and presenters greatly exceeded my expectations, and I feel lucky that I managed to attend the event. While I’ll try to write a few more blog posts about specific presentations, here are a few general observations.

More about the “Generating Utopia” project.

Open Source DIY technologies are not for geeks anymore

It seems that there are waves of technologies that are first picked up by open source hackers while the wider world doesn’t fully understand them yet. Most of the projects included things that are casually talked about at CCC events and in hackerspaces: 3D printers and 3D scanners built with Kinect, Arduino based DIY controllers, low-tech prototypes with LEDs and smartphones, or simply OpenStreetMap combined with common mashup APIs and Processing.

I think what these platforms have in common is that they’re much more accessible to creative people, and the huge amount of information available online makes them incredibly easy to use. With that, experiments often grow into high quality, commercial grade works. This gets noticed by commercial clients, and suddenly your next project is produced using the same materials and techniques.

Amount of required knowledge and insight is insane ..

Keeping the previous paragraph in mind, it seems that today it’s not enough to know art composition – you’re also expected to know enough coding to build Processing mockups, generate audio with the help of openFrameworks, and add the final touches in Final Cut or as interactive web applications.

You don’t have to be an expert in all of these things, but you do need basic knowledge just to know how to ask for help with all the tools you’re suddenly using.

.. and it’s expanding

Quite a few presenters were already showing experiments and initial ideas built with the Oculus Rift, massive crowdsourcing apps, the new Xbox One Kinect, and voice driven interfaces.

Things that would have been unavailable to most artists 10 years ago are now accessible as easy to use kits for 100 USD or less, and the development environments are cheaper still.

Opportunities are everywhere

For a generation of developers and tinkerers that grew up trying to get Linux to work on random unsupported hardware, this presents so many great opportunities. At this point everyone is thinking about open hardware and software, proprietary solutions and services can’t compete with the GitHub development model anymore, and thanks to Kickstarter and global economies of scale, you just have to pay for the production costs.

I think 2014 really is the year of open everything, and if you work in an environment that spreads these kinds of ideas and tools, you don’t have to do much to get people to listen to you. You just have to show up, present, and teach a workshop. It’s that easy :)

Results of WordPress Ninja Forms entries as JSON
Wed, 29 Jan 2014

Ninja Forms is a rather nifty WordPress plugin for building forms. The main problem I have with it at the moment is that it’s rather a mess in terms of data structure and getting data out of it. You can grab a CSV file, but that doesn’t really help if you want to build a nice front-end.

So here’s a snippet that will dump your current form results in a way that you can further display them with AngularJS or similar.

Slovenian WordPress Developers
Fri, 17 Jan 2014

WordPress is exploding everywhere, including in Slovenia. I get requests for work almost every week, so there seems to be more demand than there are developers. As is usual with these things, developers often can’t market themselves, so here’s a humble effort to make it easier to connect the two groups.

Google Spreadsheet with Slovenian WordPress Developers:

Feel free to add yourself.

How to organise and synchronise production WordPress with local development environment
Sat, 11 Jan 2014

In the last year I’ve either deployed or inherited about 10 new WordPress installations, and managing them became a mess that quickly ate too much of my time. Quite a few of my friends seem to have the same problem – so here’s a quick overview of how to approach it.

Everything I describe here definitely works on OS X or Linux, and probably on Windows, as the tools are all either PHP or Python based.

Keeping up with updates

Clients don’t update their plugins or WordPress itself, and when they do, they won’t read the changelogs closely enough to judge whether an upgrade would break something. I use InfiniteWP for this. It’s a standalone PHP installation that connects to your WordPress sites via the InfiniteWP Client plugin. It’s free, with some commercial add-ons. You can set it up to email you when updates are available, and it supports remote backups of your sites, which will be useful in later stages.

From a security standpoint it’s definitely not optimal, but at the moment, not updating seems the greater risk.


Local development environment

For each client’s site, I keep a local copy running on my computer. Depending on your preferences, you might use something like MAMP or XAMPP, which package MySQL, PHP and the Apache server together. One thing to watch out for is that your local environment runs the same major version of PHP as the server – mismatches are a frequent source of bugs (my local PHP would support newer syntax than the one on the server).

For each site, I set up a local alias – http://sitename.local/ – to ensure that I don’t accidentally change things on production.

The things I develop myself, usually a theme and an extra plugin, I store in git to keep revision history and feature branches.

I have yet to find a good way to version third-party plugins, so for now the tactic is to keep up with their latest versions, use as few of them as possible, and only pick ones from developers with release blogs and sane release practices.

Synchronising production to local environment (manually)

Sometimes I don’t have shell access to the server – in that case I use either InfiniteWP to generate a database dump (from the InfiniteWP dashboard) or UpdraftPlus from within the WordPress dashboard.

Locally, I then use wp-cli to reset the local database:
wp db reset
and import the new one:
wp db import sitename_db.sql

wp-cli supports substituting local paths in the database, but it’s usually not needed. What I do instead is modify my local wp-config.php to contain:

define('WP_HOME','http://sitename.local/');
define('WP_SITEURL','http://sitename.local/');

This allows me to use a copy of the production database without WordPress redirecting my logins to the production URL.

For the contents of wp-content/uploads I usually don’t bother, as I can easily fix things without seeing the images in the last few blog posts.

Synchronising production to local environment (automated)

For sites where I have shell access and can install wp-cli on the server, I have Ansible scripts (more on that later) that run:
wp db dump
on the server and then copy the result to my dev environment, where it is imported using the wp db reset and wp db import combination.
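Outside Ansible, the same sequence can be sketched as a small Python helper that builds the commands to run; the host name, paths and dump file name here are hypothetical placeholders, not my actual setup:

```python
# Hypothetical host and paths, for illustration only.
def sync_commands(host, remote_path, dump="sitename.sql"):
    """Build the shell commands that pull a production DB into the local copy."""
    return [
        # dump the database on the server with wp-cli
        f"ssh {host} 'cd {remote_path} && wp db dump /tmp/{dump}'",
        # copy the dump down to the dev machine
        f"scp {host}:/tmp/{dump} ./{dump}",
        # reset the local database and import the fresh dump
        "wp db reset --yes",
        f"wp db import {dump}",
    ]

for cmd in sync_commands("server1", "/home/username/sitename"):
    print(cmd)
```

Feeding the list to subprocess (or just pasting it into a shell) gives the same one-minute sync described below the Ansible examples.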

This means I can sync production to my local environment in less than a minute, making it a no-brainer to test and tweak things locally instead of on production.

Applying changes to production

For themes and custom plugins on sites where I only have FTP access, I use git-ftp, which lets me push to an FTP server with git ftp push. It keeps track of which revision is on the server and uploads only the difference. It does mean that you never change things on the server directly but always go through a git commit first (which I consider a good thing).

For environments with shell access, you can just ssh in and use git on the other side to pull in the changes. It works, but it’s a couple of additional steps.

Lately, I’ve been automating these tasks with Ansible playbooks, which allow me to write simple scripts like:

---
- hosts: server1
  sudo: no
  tasks:
    - name: update theme
      git: repo=git@server:themename.git dest=/home/username/sitename/wp-content/themes/themename

or to grab a database dump:

---
- hosts: server
  tasks:
    - name: wp db dump
      command: /home/username/.wp-cli/bin/wp db dump /home/username/tmp/sitename.sql chdir=/home/username/sitename
    - name: copy db to ~/dbdumps/
      local_action: command scp servername:tmp/sitename.sql /home/username/dbdumps/sitename.sql
      sudo: no

These can then easily be extended – or a separate playbook file can drop the local database and import the new copy. To run the playbooks you just use ansible-playbook dbdump.yml and similar, and you get a full report of what’s happening.

For bigger and more complex setups you would extend this to support rollbacks and different revision models, but that’s beyond the scope of my current WordPress projects.

Observations

Scripting these tasks always seemed like something not worth doing, as they were just a couple of shell commands or clicks away. But as the number of projects grew, it became annoying and much harder to remember the specifics of each server setup – passwords, phpMyAdmin locations and the like.

With everything fully scripted, I can now get a request from a client, automatically sync whatever state their WordPress is in within a minute, and see why the theme broke on a specific article. It saves me a crazy amount of time.

At the moment I try to script anything I see myself typing into a shell more than 3 times, and so far it has been worth it every time, as these scripts suddenly become reusable across different projects.

PSI Directive (Directive on the re-use of public sector information)
Thu, 24 Oct 2013

Today I had a chance to attend the LAPSI 2.0 project conference (The European Thematic Network on Legal Aspects of Public Sector Information). Wikipedia has a good definition of the directive:

Directive 2003/98/EC on the re-use of public sector information, otherwise known as the PSI Directive, is an EU directive that encourages EU member states to make as much public sector information available for re-use as possible.

Speakers dealt mostly with EU-level policy discussion, either on specifics of the directive or on issues in this area in individual member states. What follows are a few notes that I hope will help me remember the event in 2 years’ time. I haven’t read the directive yet, and since I’m not a lawyer, my conclusions are probably wrong.

Getting the data

Historically speaking, Slovenia has had one of the most progressive Freedom of Information Acts, coupled with a very proactive Office of the Information Commissioner. This meant that filing an FOIA request with a PSB (public sector body) was often the most efficient way to get access to the data they gather or produce.

While this works fairly well in some areas, it has its limits. PSI addresses that by further encouraging PSBs to make data and information openly available. It also makes it harder to charge for or limit access, by requiring institutions to explain why access is limited, together with business and cost calculations.

I can’t find the source in the published texts, but part of the discussion also revolved around the directive applying to libraries and cultural works. This will present both a challenge for existing archives and an opportunity for new ways to disseminate this content.

What’s the point?

Economy. There is a huge body of work and case studies showing that once you open this data up to the greater public, it provides an exponential return on investment through new services and uses. The fewer the limits, the more potential can be realised.

For me, it’s often hard to see a use for a lot of the data we find online, or it would require a disproportionate development investment to make it useful. On the other hand, most of this data and content was already paid for with public money, so the EU is betting that just opening everything up will have a huge economic impact.

When?

“Soon”. The way I understood it, we’ll see implementations in EU member states sometime in 2015. But because of the direction and the work going on in this area, it should already be possible to use these arguments and approaches within existing laws and in individual agreements with institutions.

Additional Resources
Culture successfully raises funds from the EU
Wed, 23 Oct 2013

Today, Cultural Contact Point Slovenia (based at SCCA-Ljubljana), Media Desk Slovenia, and Culture.si/Ljudmila Art and Science Laboratory released visualisations of the last decade of successful fundraising from EU programmes.


Process

This is the first public release of such data in Slovenia, and the result of 6 months of intensive work on data reconciliation, methodology and, finally, visualisation.

I helped mostly with data reconciliation, so I can speak about the tools we used. The basic tool was a Google Spreadsheet, used as a database that everyone could contribute to; it helped us keep the data in sync. It also allowed basic pivot-table based visualisations. It worked mostly OK, and the ability to write scripts for it helped a lot. Finally, the data was moved into Semantic MediaWiki and visualised using d3.js.

Lessons learned

  • Google Spreadsheets don’t scale. After you reach about 1,000 rows with 30 columns, they become almost unusably slow.
  • This dataset is complex enough that it would benefit from automated checks – automated re-importing into a real database plus basic reports (unique institutions, basic pivot tables). This would help with the encoding and whitespace issues that the Spreadsheet doesn’t handle.
  • Google Spreadsheets have really good tools for pivot tables, but they’re a pain to manage if data ranges change. This can probably be automated further, but I haven’t figured out how yet.
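A minimal sketch of what such automated checks could look like in Python; the column names (“institution”, “amount”) are hypothetical, not the actual dataset’s:

```python
# Hypothetical columns for illustration; real checks would cover encoding too.
def report(rows):
    """Basic sanity report over a list of dict records: flags stray
    whitespace and duplicate institutions (case-insensitively)."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        for key, value in row.items():
            if value != value.strip():
                problems.append((i, key, "whitespace"))
        name = row["institution"].strip().lower()
        if name in seen:
            problems.append((i, "institution", "duplicate"))
        seen.add(name)
    return problems

rows = [
    {"institution": "SCCA-Ljubljana", "amount": "1000"},
    {"institution": " scca-ljubljana", "amount": "500 "},
]
print(report(rows))
```

Run after every re-import, a report like this catches exactly the whitespace and duplicate-institution issues that a spreadsheet silently tolerates.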
Notes on Open Data from OKCon 2013
Fri, 20 Sep 2013

It’s popular these days to work in the Open Data, Big Data or similar spaces. It feels just like the times of Web 2.0 mashups, but instead of simple Google Maps based tools, we’re now creating powerful visualisations that often feel like an end in themselves.

In a way, I expected OKCon, the Open Knowledge Foundation’s conference, to be about makers – people who build useful parts of the open data ecosystem – and to provide in-depth case studies. Instead we got a mix of representatives from governments, large international NGOs, small NGOs, and various developers and software providers. To me these worlds felt just too far apart.

Data Portals

Data portals are here, and while we haven’t seen a large-scale deployment from the government in Slovenia, it’s likely that we will have something in this regard in the next few years. Everyone else has already deployed their first version, and the more advanced (public) institutions are already on their second or third attempt. Just as with social media a few years ago, we’re now seeing the first case studies that show the economic and political advantages of providing such sites.

The self-hosted CKAN platform seems to be a popular tool for such efforts (or that might just be conference bias, since it’s developed by OKFN).

Budgets and contracts

A lot of effort is being expended on representing budgets, tenders, company ownership and similar data. In this regard, Slovenia’s Supervizor looks like something from the far future compared to what other countries or projects have achieved so far. We could contribute a lot back to the international community if we produced case studies on the benefits (or lack thereof) of such a system.

At the same time, building visualisations around local budgets no longer feels productive. I think we should just upload sanitised data to a portal like OpenSpending and focus our efforts on projects with more impact.

Maps

Just as with mashups, everyone loves geospatial representations. The more colours and points of interest, the better. The only problem is that they’re often useless for the people who actually need to use them. While not openly expressed during the presentations, discussions during the breaks often revolved around how bloated and useless these representations were, and how a good text or table based report would often work so much better.

At some point, the community will have to embrace modern product development methodology – user stories, user testing, iterative development and so on. Right now it feels like a lot of these tools are either too generic, or sub-contracted and developed through a waterfall model.

Having said that, I’ve seen great examples of how to do things right: landmatrix.org

Tools

While NGOs might be building things the old-fashioned way, their developers certainly aren’t. Tools and platforms are openly licensed, published on GitHub, and often tied into various continuous integration environments.

  • https://github.com/ostap/comp – automatically exposes your local CSV, XML and similar files as a JSON endpoint through a standalone Go based server. Developed by the mingle.io team.
  • Drake – for building workflows around data
  • Pandas – Python based data analysis

Conclusion

Software development is already hard for teams of seasoned veterans working on projects inside the tech industry. It’s almost impossibly hard for both large and small NGOs, since there just isn’t enough talent available. Additionally, these organisations often don’t want to coordinate efforts (even basic data sharing) with each other, or even internally, making projects even less likely to succeed.

I think we’ll continue to see a lot of badly executed projects in this area until modern, tech-driven groups like OKFN and the Sunlight Foundation manage to raise the bar.

Visualizing Slovenian coalition agreement
Sat, 28 Jan 2012

With the election of the new Slovenian prime minister, we also got the formal release of the coalition agreement. Since it’s a 72-page document, I wondered which keywords would stand out. Here is the result:

Pogodba za Slovenijo 2012 - 2015 - word cloud (top 80 words)

While we’re at it, we can also take a look at the coalition agreement that Pozitivna Slovenija prepared. Running it through the same process, we get:

Koalicijska pogodba - Pozitivna Slovenija - 2012 (top 80 words)

A few words on how to reproduce this:

  • Grab your favourite OCR software and convert the scanned PDF into .docx
  • From Word, save it as a .txt file
  • Lemmatize the words to normalise away the grammatical variation
  • Apply stop words (in this case mostly: ministrstvo*, vlada, slovenija*, ..)
  • Drop the resulting text into wordle.net
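For the counting step at the heart of this process (leaving the OCR and lemmatization stages aside), a minimal Python sketch with a toy stop-word list could look like:

```python
from collections import Counter
import re

# Toy stop-word list taken from the post; a real run would lemmatize first.
STOP_WORDS = {"vlada", "ministrstvo", "slovenija"}

def top_words(text, n=5):
    """Return the n most common words in text, excluding stop words."""
    # crude Slovenian-ish tokenizer: lowercase letters incl. č, š, ž
    words = re.findall(r"[a-zčšž]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

text = "Vlada bo sprejela proračun. Proračun in vlada. Slovenija raste."
print(top_words(text, 2))  # -> [('proračun', 2), ('bo', 1)]
```

The resulting counts are what a tool like wordle.net turns into the cloud – the stop words just keep the trivially frequent terms from drowning everything out.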
