XML, LOL

Notes from csv,conf 2014

Today I attended csv,conf in Berlin, which turned out to be an excellent conference full of people who gather and transform data on a daily basis.

The CSV (comma-separated values) file format seems like a joke at first – who seriously uses that today, in the age of SQL, NoSQL and every other $random-DB solution? It turns out almost everybody does at some point – either as input or as a data interchange format when the other systems are not part of your organisation.

Fail quickly and cheaply

A few different people presented their solutions for “testing” CSV files, which might be better described as making sure they conform to a certain schema. These range from simple checks to full-fledged DSLs that let you specify rules and even run checksums against referenced files.

What I liked most about this is that it lets you very quickly verify the sanity of the files you receive and give immediate feedback to the party sending them. This ensures that you don’t have to deal with bad data later – whether later in time or further down your pipeline.
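To make the idea concrete, here is a trivial sketch of such a check – not one of the presented tools, and the column numbers in the usage line are hypothetical:

<?php
// Sketch of a "fail quickly" CSV check: every row must have the expected
// number of columns and the required columns must be non-empty.
function check_csv($path, $expected_columns, $required = array()) {
    $errors = array();
    $handle = fopen($path, 'r');
    $line = 0;
    while (($row = fgetcsv($handle)) !== false) {
        $line++;
        if (count($row) !== $expected_columns) {
            $errors[] = "line $line: expected $expected_columns columns, got " . count($row);
            continue;
        }
        foreach ($required as $column) {
            if (trim($row[$column]) === '') {
                $errors[] = "line $line: required column $column is empty";
            }
        }
    }
    fclose($handle);
    return $errors; // an empty array means the file passed
}

// Reject a delivery immediately instead of debugging it weeks later:
print_r(check_csv('incoming.csv', 5, array(0, 2)));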

Embrace the UNIX philosophy – do one thing at a time

Most of the speakers also mentioned that in order to keep your sanity, you should build your system as a collection of small dedicated tools that pipe into other dedicated tools. It doesn’t necessarily have to be a Unix pipe – more a collection of steps where each one converts the data and hands it to the next step or database, which then does its own processing.
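csvkit is one toolkit built in exactly this spirit – every command does one small job and speaks CSV on stdin/stdout, so the steps chain naturally. A hypothetical cleanup (the file and column names here are made up):

# select three columns, keep rows with a plausible email, pretty-print
csvcut -c name,email,amount invoices.csv \
  | csvgrep -c email -r "@" \
  | csvlook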

Everybody has the same problems

I think the biggest takeaway for me was that we all have the same issues. We all get messy datasets that are hard to parse and full of strange errors and inconsistencies.

As with other things, there is no silver bullet. We’ll have to build and teach best practices around data – testing, cleaning, what works and what doesn’t – just as we do in modern software development.

Interesting tools and libraries

Impressions from FITC Amsterdam 2014

This week I had the pleasure of attending the Future, Innovation, Technology and Creativity (FITC) 2014 conference in Amsterdam. I wanted to catch up on what the creative industries are doing, as I had mostly been visiting developer- and security-oriented events.

In general, the quality of talks and presenters greatly exceeded my expectations and I feel lucky that I managed to attend. While I’ll try to write a few more blog posts about specific presentations, here are some general observations.

More about the “Generating Utopia” project.

Open Source DIY technologies are not just for geeks anymore

It seems that there are waves of technologies that are first picked up by open source hackers while the rest of the world doesn’t fully understand them yet. Most of the presented projects involved things that are casually talked about at CCC events and in hackerspaces: 3D printers and 3D scanners using Kinect, Arduino-based DIY controllers, low-tech prototypes with LEDs and smartphones, or just OpenStreetMap with common mashup APIs and Processing.

I think what these platforms have in common is that they’re much more accessible to creative people, and the huge amount of information available online makes them incredibly easy to use. With that, experiments often grow into high-quality, commercial-grade works. This gets noticed by commercial clients, and suddenly your next project is produced using the same materials and techniques.

The amount of required knowledge and insight is insane ..

Keeping in mind the previous paragraph, it seems that today it’s not enough to know how to do art composition – you’re also required to know enough coding to build Processing mockups, generate audio with the help of OpenFrameworks, and add the final touches in Final Cut and interactive web applications.

You don’t have to be an expert in all of these things, but you do need basic knowledge of each just to know how to ask for help with all the tools you’re suddenly using.

.. and it’s expanding

Quite a few presenters were already showing experiments and early ideas built with the Oculus Rift, massive crowdsourcing apps, the new Xbox One Kinect and voice-driven interfaces.

Things that would have been unavailable to most artists 10 years ago are now accessible as easy-to-use kits for 100 USD or less – and the development environments are even cheaper.

Opportunities are everywhere

For a generation of developers and tinkerers that grew up trying to get Linux to work on random unsupported hardware, this presents so many great opportunities. At this point everyone is thinking about open hardware and software, proprietary solutions and services can’t compete with the GitHub development model anymore, and thanks to Kickstarter and global economies of scale you essentially just have to pay for the production costs.

I think 2014 is really the year of open everything, and if you’re working in an environment that spreads these kinds of ideas and tools, you don’t have to do much to get people to listen to you. You just have to show up, present, and teach a workshop. It’s that easy :)

WordPress Ninja Forms entries as JSON

Ninja Forms is a rather nifty WordPress plugin for forms. The main problem I have with it at the moment is that it’s rather a mess in terms of data structure and getting data out of it. One can grab a CSV file, but that doesn’t really help if you want to build a nice front-end.

So here’s a snippet that dumps your current form results in a way that lets you display them with AngularJS or similar.
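The sketch below assumes Ninja Forms 2.7+, which stores submissions as the nf_sub custom post type, with the form referenced in _form_id meta and the individual answers in _field_{id} meta keys – verify that against your installation first. The endpoint name dump_form_entries is just an example:

<?php
// Registers an admin-ajax endpoint that returns all entries of one form as
// JSON. Drop the nopriv action (or add a capability check) if the entries
// shouldn't be public.
add_action( 'wp_ajax_dump_form_entries', 'dump_form_entries' );
add_action( 'wp_ajax_nopriv_dump_form_entries', 'dump_form_entries' );

function dump_form_entries() {
    $subs = get_posts( array(
        'post_type'   => 'nf_sub',   // Ninja Forms submission post type (assumption, see above)
        'post_status' => 'any',
        'numberposts' => -1,
        'meta_key'    => '_form_id',
        'meta_value'  => intval( $_GET['form_id'] ),
    ) );

    $entries = array();
    foreach ( $subs as $sub ) {
        $fields = array();
        foreach ( get_post_meta( $sub->ID ) as $key => $values ) {
            if ( strpos( $key, '_field_' ) === 0 ) {   // keep only the answers
                $fields[ substr( $key, 7 ) ] = maybe_unserialize( $values[0] );
            }
        }
        $entries[] = array(
            'id'     => $sub->ID,
            'date'   => $sub->post_date,
            'fields' => $fields,
        );
    }

    wp_send_json( $entries );  // sets JSON headers, echoes and exits
}

Fetch it at /wp-admin/admin-ajax.php?action=dump_form_entries&form_id=1 and point your AngularJS $http call at that URL.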

How to organise and synchronise a production WordPress with a local development environment

In the last year I’ve either deployed or inherited about 10 new WordPress installations, and managing them became a mess that quickly ate too much of my time. It seems that quite a few of my friends have the same problem – so here’s a quick overview of how to approach it.

Everything I describe here definitely works on OS X or Linux, and probably on Windows, as the tools are all either PHP- or Python-based.

Keeping up with updates

Clients don’t update their plugins or WordPress itself, and when they do, they don’t read the changelogs carefully enough to judge whether an upgrade would break something. I use InfiniteWP for this. It’s a standalone PHP installation that connects to your WordPress sites via the InfiniteWP Client plugin. It’s free, with some commercial add-ons. You can set it up to email you when there are new updates, and it supports remote backups of your sites, which will be useful in later stages.

From a security standpoint, such a central dashboard is definitely not optimal, but at the moment not updating seems the greater risk.


Local development environment

For each client’s site, I would have a local copy running on my computer. Depending on your preferences you might use something like MAMP or XAMPP, which package MySQL, PHP and the Apache server together. One thing to watch out for is running your local development under the same major version of PHP as production, as mismatches are a frequent source of bugs (my local PHP would support newer syntax than the one on the server).

For each site, I would have a local alias – http://sitename.local/ – to ensure that I don’t accidentally change things on production.
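Such an alias is just a line in /etc/hosts (plus a matching virtual host in your MAMP/XAMPP configuration):

127.0.0.1    sitename.local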

The things I develop myself – usually a theme and an extra plugin – I would store in git to keep revision history and feature branches.

I have yet to find a good way to version third-party plugins, so for now the tactic is to keep up with their latest versions, use as few plugins as possible, and only use ones from developers that have release blogs and sane release practices.

Synchronising production to local environment (manually)

Sometimes I don’t have shell access to the server – in that case I would use either InfiniteWP to generate a database dump (from the InfiniteWP dashboard) or UpdraftPlus from within the WordPress dashboard.

Locally, I would then use wp-cli to reset the local database:
wp db reset
and import the new one:
wp db import sitename_db.sql

wp-cli supports local path and URL substitutions, but they’re usually not needed. What I would do instead is modify my local wp-config.php to have:

define('WP_HOME','http://sitename.local/');
define('WP_SITEURL','http://sitename.local/');

This allows me to use a copy of the production database without WordPress redirecting my logins to the production URL.

The contents of wp-content/uploads I usually don’t bother with, as I can easily fix things without seeing the images in the last few blog posts.

Synchronising production to local environment (automated)

For the sites where I have shell access and can install wp-cli on the server, I have Ansible scripts (more on that later) that run:
wp db dump
on the server, then copy the dump to my dev environment, where it gets imported using the wp db reset and wp db import combination.

This means that I can sync production to my local environment in less than a minute, making it a no-brainer to test and tweak things locally instead of on production.

Applying changes to production

For themes and custom plugins on sites where I only have FTP access, I’m using git-ftp, which lets me push to an FTP server using git ftp push. It keeps track of which revision is on the server and uploads only the difference. It does mean that you can never change things on the server directly, but have to go through committing to git first (which I consider a good thing).
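Setup is a one-time git config per repository; the URL and credentials below are placeholders:

git config git-ftp.url "ftp://ftp.example.com/wp-content/themes/themename"
git config git-ftp.user "username"
git config git-ftp.password "secret"
git ftp init   # first upload, pushes everything and records the revision
git ftp push   # afterwards, uploads only files changed since the last push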

For environments with shell access you can just ssh and then use git on the other side to pull in changes. It works, but it’s a couple of additional steps.

Lately, I’ve been automating these tasks with Ansible playbooks that allow me to have simple scripts like:

---
- hosts: server1
  sudo: no
  tasks:
    - name: update theme
      git: repo=git@server:themename.git dest=/home/username/sitename/wp-content/themes/themename

or to grab a database dump:

---
- hosts: server
  tasks:
    - name: wp db dump
      command: /home/username/.wp-cli/bin/wp db dump /home/username/tmp/sitename.sql chdir=/home/username/sitename
    - name: copy db to ~/dbdumps/
      local_action: command scp servername:tmp/sitename.sql /home/username/dbdumps/sitename.sql
      sudo: no

These can then be easily extended, or a separate playbook file can drop the local database and import the new copy. To run these playbooks you just use ansible-playbook dbdump.yml and similar, and you get a full report of what’s happening.
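That local half might look like this – a sketch assuming wp-cli is installed on the dev machine and that the paths match your setup:

---
- hosts: localhost
  connection: local
  sudo: no
  tasks:
    - name: drop and recreate local database
      command: wp db reset --yes chdir=/home/username/dev/sitename
    - name: import fresh production dump
      command: wp db import /home/username/dbdumps/sitename.sql chdir=/home/username/dev/sitename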

For bigger and more complex setups you would extend this to support rollbacks and different revision models, but that’s beyond the scope of my current WordPress projects.

Observations

Scripting these tasks always seemed like something not worth doing, as they were just a couple of shell commands or clicks away. But as the number of projects grew, it became annoying and much harder to remember the specifics of each server setup – passwords, phpMyAdmin locations and similar.

With things fully scripted, I can now get a request from a client, automatically sync whatever state their WordPress is in at the moment in just a minute, and see why the theme broke on a specific article. It saves me a crazy amount of time.

At the moment I’m trying to script anything that I see myself typing into a shell more than three times, and so far it has been worth it every time, as these scripts suddenly become reusable across different projects.