Category Archives: Tech

Average Play Time duration in Google Analytics

I’ve recently had a challenge on a website with video and audio content. I wanted to log events into Google Analytics in a way that would give me the following stats for each piece of media:

  • Average playtime
  • Distribution of playtime
  • Total playtime


I was greatly inspired by this article on Video Tracking (The Right Way) for Google Analytics, as it lays out a vision of a good dashboard for a multimedia-heavy web property. The main issue is that it was never clear to me how much of it is aspirational and how much is actually possible to implement.

I recently found Enhanced Google Analytics Tracking for Video Publishers, which describes this process in more detail. It covers two different approaches: the heartbeat approach and the milestone approach.

The heartbeat approach is where the client (website) sends an event every X minutes to indicate that the user is still watching our material.
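A minimal sketch of the heartbeat pattern. The interval, the event name, and the injected `send` callback (which would wrap a `gtag()` or `ga()` call in production) are all assumptions for illustration:

```javascript
// Start sending a heartbeat every intervalMs while the media is playing.
// `send` is injected so the caller can wrap gtag(), ga(), or anything else.
function startHeartbeat(send, intervalMs = 60000) {
  const id = setInterval(() => send({ event: 'media_heartbeat' }), intervalMs);
  return () => clearInterval(id); // call the returned function on pause/ended
}
```

The caller is responsible for stopping the heartbeat on pause and ended events so we don’t report time the user isn’t actually watching.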

The milestone approach is where we split our video into milestones, such as every 10% of its length, and send an event back to Analytics when the user reaches each one.
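The milestone logic can be sketched as a pure function plus a tracker that fires each milestone exactly once. The 10% step and the `send` callback are assumptions; in production `send` would wrap a `gtag()` event call and `onTimeUpdate` would be wired to the media element’s `timeupdate` event:

```javascript
// Map a playback position (in seconds) to the highest 10% milestone reached.
function milestoneFor(currentTime, duration) {
  const pct = Math.floor((currentTime / duration) * 10) * 10;
  return Math.min(pct, 100);
}

// Fire each milestone exactly once, even if the user seeks around.
function makeMilestoneTracker(duration, send) {
  const sent = new Set();
  return function onTimeUpdate(currentTime) {
    const reached = milestoneFor(currentTime, duration);
    for (let pct = 10; pct <= reached; pct += 10) {
      if (!sent.has(pct)) {
        sent.add(pct);
        send(pct); // e.g. gtag('event', 'video_progress', { value: pct })
      }
    }
  };
}
```

Deduplicating on the client keeps repeated `timeupdate` events (which fire several times per second) from flooding Analytics with duplicates.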

Both the milestone and heartbeat approaches share an issue: you can potentially lose the last chunk of information if the user navigates away from the web page before you send that last event, for one of many reasons (see below). This also means that you can’t send a precise total view time at the end of a viewing session, as you can’t be sure you’ll have a chance to transmit the information.

Problem: It’s hard or impossible to send accurate events from mobile devices

It would be nice if we could send an event when the user navigates away from our website. It turns out that it’s not something you can reliably do. There are multiple reasons:

  • The unload event is so unreliable that MDN recommends not even bothering with it on mobile. With more than 50% of traffic coming from mobile devices these days, this introduces a serious measurement error.
  • Mobile devices allow playing audio, and sometimes video, on the lock screen or in Picture-in-Picture mode. In such cases you sometimes don’t get any JavaScript calls back to the website, so you can’t log events to your analytics.

Problem: Google Analytics is append-only with very limited statistics

In a perfect world, we’d send heartbeat events for a specific user’s video session and just append the running total view time to each heartbeat event. Then, in the next step, we could filter these sessions so that only the last heartbeat value is read, ignoring the ones that came before. I couldn’t figure out how to do this with the free version of Google Analytics, and I have a hunch it’s not possible. I think the sheer complexity of such a calculation is simply outside the scope of Google Analytics.

This means we can’t get accurate numbers, and the best we can do is go with the milestone approximation.

How does it look in practice?

I’ve implemented two approaches: 10% segments and 5-minute chunks. This way I can measure the overall completion rate of videos across the site and, at the same time, get a feel for the amount of time users spend viewing the content.
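The 5-minute chunks can be derived from the playback position the same way as the 10% segments. A sketch of the mapping; the exact label format sent as the event action is an assumption:

```javascript
// Label a playback position with its 5-minute chunk, e.g. 930 s -> "15-20 min".
function chunkLabel(seconds) {
  const chunk = Math.floor(seconds / 300); // 300 seconds per 5-minute chunk
  return `${chunk * 5}-${(chunk + 1) * 5} min`;
}
```

Sending this label as the event action makes the per-chunk distribution show up directly in the standard Events report.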

These are two views of the same video recording. As you can see, it has a good completion rate, with about 50% of people getting to the end and most of them reaching the approximate total running time of the video (44 minutes). Some of them even went back and rewatched parts!

One interesting side effect of adding these analytics is that our average session duration went up drastically. I think it’s easy to under-measure things if your website isn’t a typical e-commerce or marketing-funnel style page.

What I wish I could do better inside Google Analytics

If you look at the screenshots closely, you can see that the report shows the percentage share of each individual event action. What I’d like is a way to link these events together into a series, so I could build proper funnels.

Building these ‘per video’ reports requires a lot of clicking each time we publish a new video. It would be good to have a way to automatically generate a report for each video in Google Analytics.

I’d also like better averages and statistics around Event Values.

Lessons learned and looking forward

Implemented is always better than perfect. Even with such a rough measurement approach, we’ve started seeing user patterns that were hidden from us before.

The free version of Google Analytics is amazing, but at some point we’ll have to look into a different technology. It will probably be supplemental, and understanding the limits of the existing setup will be invaluable when evaluating use cases for whatever we choose.

In the future, I want to look into Matomo with its Media Analytics plugin. It seems to offer some of this more advanced functionality.

Gathering data is the easy part. The bigger challenge is distilling it into a format that offers actionable insights for the content team.

Conditional Cache Mixin for Django DRF

For a project I’m working on, I’m adding a conditional header to bypass the cache when an X-No-Cache header is present. In my case, this allows an external system to bypass the cache when certain conditions are met.

I’ve modified code from Django REST Framework extensions to allow for such behaviour. There might be a better way to do it, but at the moment the flow of the code is clear to me. It needs drf-extensions installed, as it’s just an additional mixin that offloads the caching to the cache_response decorator.

from rest_framework_extensions.cache.decorators import cache_response
from rest_framework_extensions.settings import extensions_api_settings

class BaseCacheResponseMixin(object):
    object_cache_key_func = extensions_api_settings.DEFAULT_OBJECT_CACHE_KEY_FUNC
    list_cache_key_func = extensions_api_settings.DEFAULT_LIST_CACHE_KEY_FUNC

class ConditionalListCacheResponseMixin(BaseCacheResponseMixin):
    @cache_response(key_func="list_cache_key_func")
    def _cached_list(self, request, *args, **kwargs):
        return super().list(request, *args, **kwargs)

    def list(self, request, *args, **kwargs):
        # Bypass the cache entirely when the client sends X-No-Cache: 1
        if request.META.get("HTTP_X_NO_CACHE") == "1":
            return super().list(request, *args, **kwargs)
        return self._cached_list(request, *args, **kwargs)

class ConditionalRetrieveCacheResponseMixin(BaseCacheResponseMixin):
    @cache_response(key_func="object_cache_key_func")
    def _cached_retrieve(self, request, *args, **kwargs):
        return super().retrieve(request, *args, **kwargs)

    def retrieve(self, request, *args, **kwargs):
        if request.META.get("HTTP_X_NO_CACHE") == "1":
            return super().retrieve(request, *args, **kwargs)
        return self._cached_retrieve(request, *args, **kwargs)

class ConditionalCacheResponseMixin(
    ConditionalRetrieveCacheResponseMixin, ConditionalListCacheResponseMixin
):
    pass

Automated form testing in WordPress

I’m developing a Gravity Forms based workflow that starts with a long form. Once the form is submitted, it creates a custom post in the backend so that the editorial team can review it.

Over the years, I’ve discovered that the most annoying and time-consuming part of such development is filling in the form every time you make a change. This matters especially because a number of behind-the-scenes actions trigger only on form submit. I decided to automate it this time.

My approach uses two major components:

Step 1: Record steps

I’ve used Cypress Recorder to fill the form and got a result that looked something like this:

describe('Event Form Submission', () => {
  it('Fills in a form', () => {
    cy.get('#input_2_4').type('Example Event Title');
    cy.get('#input_2_5').type('Example Event Description');
    cy.get('#input_2_6').click({force: true});
    cy.get('#input_2_6').type('June 20 - 25, 2022');
    cy.get('#input_2_16').click({force: true});
    cy.get('#input_2_7').click({force: true});
    cy.get('#input_2_7').type('Frankfurt, Germany');
    cy.get('#input_2_8').click({force: true});
    cy.get('#input_2_8').type('Test Research Limited');

    // .. more of similar to above ..

    cy.get('#gform_submit_button_2').click({force: true});
    cy.url().should('contains', 'http://example.test/event-submit-test/');
  });
});

Step 2: Configure Cypress and install add-ons

By default, the Recorder doesn’t handle scrolling the window, and that makes Cypress unhappy. So I had to add {force: true} to some click() calls so Cypress doesn’t fail on elements outside the viewport.

My form also handles file uploads and has a TinyMCE textarea, so I had to install two add-ons: cypress-file-upload and @foreachbe/cypress-tinymce. They’re both very simple to use:

// file upload – the selector and filename here are placeholders
cy.get('#input_2_9').attachFile('example.pdf');

// tinyMCE modification
cy.setTinyMceContent('input_2_11', 'This is the new content');

Step 3: Enjoy automated form filling while developing backend code

Lessons learned

I originally asked my question in a local developers group on Facebook. I got some good tips that led me down this path. Instead of trying to hack something together with Chrome extensions, I finally took the time to learn an end-to-end testing framework in a non-React environment. My previous experience with such tools was Selenium, and I’m happy to say the whole experience was much better this time.

Taking the time to set up my dev workflow saved me many hours, and I’m happy that I didn’t blindly start with development.

How I sped up WordPress 4x on mobile

With the new Chrome release, we got even more developer tools. The newest version has a “Capture screenshots” feature: it records your page load and displays how the page looks as it’s downloaded to your browser.

After watching Paul Irish comment on some large media sites, I started wondering – how is Val 202 doing? It gets a decent amount of traffic on mobile devices. It also gets a lot of traffic from Facebook and Twitter, meaning those visitors probably don’t have our assets cached.

First test – 3G, no-cache: 9.34 seconds until the title is displayed!

Ok, that’s clearly bad. It also means that potential readers will probably abandon the page load and go watch kittens that load from a faster domain.

Looking through all the assets that our theme loads, I see a bunch of potential problems:

  • Plugins that we’re starting to deprecate, but that still load resources
  • External assets and iframes that we don’t even display on mobile
  • Disqus comments, which we could hide
  • Images further down the page, which we could lazy load
  • All our assets are loaded from our own domain, so we hit the limit on how many resources a browser downloads in parallel

I try disabling as many of the above as I can, just to see if fixing them is worth the development time. Here’s my second measurement (3G, no-cache): 5.65 seconds until content appears.


I got the page to load in about half the time. Better, but not good enough.

As I cut even more things, I disable TypeKit, and with it the time to content falls dramatically. Aha!

Reading TypeKit’s documentation reveals that it waits 3 seconds by default to ensure fonts load and there is no flash of unstyled content. But on mobile, we could decide we’re fine with the flash, as long as we show our readers content as soon as possible.

Third measurement (3G, no-cache), with async TypeKit: 2.5 seconds until content appears


Still not the best, but it’s 4x faster than the current version.

For now, I’ll try loading TypeKit in async mode for devices with a smaller window width:
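A sketch of how that decision could look. The breakpoint and the kit id are placeholders, not our production values; the config shape follows TypeKit’s advanced embed options (`kitId`, `scriptTimeout`, `async`):

```javascript
// Breakpoint below which we accept a flash of unstyled text.
// The exact value is an assumption, not a measured cutoff.
const MOBILE_BREAKPOINT = 768;

function typekitConfig(viewportWidth) {
  return {
    kitId: 'xxxxxxx',       // placeholder – your TypeKit kit id
    scriptTimeout: 3000,    // TypeKit's default blocking timeout
    async: viewportWidth < MOBILE_BREAKPOINT, // don't block rendering on mobile
  };
}
```

This object would then be passed to TypeKit’s advanced embed snippet in place of its default configuration.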

While this approach is not optimal, it gives us a quick win, while we work on streamlining the rest of the frontend code.


WordPress is great for quickly iterating and running content experiments. The problem is supporting this in the long run.

It’s also easy to leave behind elements of previous experiments in the code – custom fonts, icons and whole scripts. It’s good to take a step back and reevaluate our code in terms of new usage patterns and best practices.

WordPress Analytics Engine


I think I’m obsessed with numbers. They give me a feeling of control. Page views, trends, visitor counts and more. Not measuring things makes me sad. If we have historical data, we can check whether our changes worked. Are certain topics more popular? Which stories do better on Twitter compared to Facebook? An infinite number of questions.

I’d like to build a better analytics engine for us. I’ll explain my constraints and how I’d approach it. The primary plan is that someone can say – “just use X”. If that fails, I can still build it.

Problem definition

We have multiple WordPress installations with overlapping authors. They create blog posts that are shared to Facebook and Twitter. Each post can include multiple embeds that they produce – SoundCloud, YouTube and similar. For each blog, we can also query the Google Analytics API and get stats on sessions, page views and time on site.

There are two primary limitations of these external data sources. Firstly, we’re rate limited – so we can only query them about once a day – per post URL. Secondly, we mostly get aggregated data.

Because we mostly get running totals – Facebook, for example, only gives us the total number of likes – we need to subtract the previous day’s value from each new reading. That way we get the number of likes gained in that day.
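The subtraction step is trivial but worth pinning down; the clamp at zero is my addition, to guard against the remote counter being corrected downward between readings:

```javascript
// Convert a running total (e.g. total Facebook likes) into a daily figure.
function dailyDelta(todayTotal, yesterdayTotal) {
  // Clamp at zero in case the remote counter is corrected downward.
  return Math.max(todayTotal - yesterdayTotal, 0);
}
```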


Having all the information in one place would allow us a couple of things:

  • Weekly reports for authors – sending them encouragements on how their stories did
  • Information for content editors, what got most attention that week
  • Identify old content that suddenly got interest
  • Get information on the success of embedded content (SoundCloud, YouTube)
  • Develop customised indicators – authors with most viewed YouTube videos

Potential Solution?

When researching this topic, I found a software stack that almost fits: the ELK stack. Logstash handles data ingestion, Elasticsearch provides storage and search, and Kibana supports displaying the data in many different ways.

The other approach would be to just code it in any web framework. But that seems like a huge duplication of work.

Technical Questions

Would the ELK stack work? Can Logstash provide an input filter that will automatically normalise the data for me? Is it the right technology stack at all?

Is there anything that solves this in a much better way?

Content Questions

Is it worth building this at all? Is it a good idea to attach numbers to (journalists’) work? Did I miss any questions that would be worth exploring?

Would you use such a service?