When it comes to web analytics tools, Google Analytics and Adobe Analytics are generally regarded as the two strongest, and Google Analytics in particular is used on many sites, largely because a free version is available.
However, there are many other web analytics tools. Here I would like to look at some that are developed as open source, and at a few of the techniques used in their source code.
Benefits of Open Source Web Analytics Tools
Of course, in terms of analysis features, there are few areas where open source tools are superior to Google Analytics and Adobe Analytics. The advantage of open source is that the source code is public, so you can freely read it and modify it.
When using Google Analytics in particular, I sometimes wish I could change how a certain behavior is specified. An open source web analytics tool makes that possible (although it requires the skills to modify the source code).
Because the source code can be read freely, another big advantage is being able to study the detailed techniques it uses.
Examples of open source web analytics tools
Matomo
I have looked at several open source web analytics tools, and I would say that Matomo is the most user-friendly of them. The interface is easy to understand, and many of the things you can do with Google Analytics are also available in Matomo, probably because its developers are well aware of Google Analytics in terms of functionality.
It is particularly nice that integration with Google Ads (viewing ad metrics in Matomo) has already been implemented. In addition, because it is an independent open source project, it integrates not only with Google Ads but also with non-Google products such as Bing and Facebook Ads and Bing Search Console, which surprised me.
If you want to switch from Google Analytics or Adobe Analytics to an open source web analytics tool, Matomo will probably be the first option that comes to mind (although few sites need to make such a switch).
GoatCounter
GoatCounter is a tool that analyzes traffic without using cookies to identify users. Google Analytics tracks a wide range of information, which allows more detailed analysis. GoatCounter, by contrast, is characterized by its respect for user privacy and keeps tracking to the minimum necessary.
In terms of UI and functionality it is inferior to Google Analytics, but given GDPR in Europe and CCPA in California, USA, some may consider this level of functionality sufficient.
What we learned from open source
Exclude bots from measurement
Google Analytics also has a bot traffic exclusion feature. It filters on the user agent, based on the “International Spiders and Bots List” provided by the IAB (Interactive Advertising Bureau, an internet advertising industry group). However, the “International Spiders and Bots List” requires a very expensive paid license, so it is not practical to use it with measurement tools other than Google Analytics.
GoatCounter's source code on GitHub (zgoat/goatcounter/public/count.js) appears to detect bots by checking for global variables specific to headless browsers and automation tools such as “PhantomJS”, “Nightmare”, “WebDriver”, and “Selenium”.
Specifically, it uses code like the following:
// backend, but these properties can't be fetched from there.
var is_bot = function() {
    // Headless browsers are probably a bot.
    if (w.callPhantom || w._phantom || w.phantom)
        return 150
    if (w.__nightmare)
        return 151
    if (d.__selenium_unwrapped || d.__webdriver_evaluate || d.__driver_evaluate)
        return 152
    if (navigator.webdriver)
        return 153
    return 0
}
If you drive a headless browser programmatically, it is easy to change the user agent, so detection based on the user agent alone can misclassify it. Checking global variables unique to headless browsers, as in the source code above, should reduce such misses.
If you are running a website that receives a lot of bot traffic, it may be a good idea to create a custom JavaScript variable like the one above in Google Tag Manager, use it to detect bots, and fire your tags conditionally based on the result.
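The idea above can be sketched as a small standalone function. This is only an illustration modeled on the GoatCounter snippet, not GoatCounter's or Google Tag Manager's own code: the name detectHeadlessBrowser and its three parameters are assumptions, introduced so that the check can be tried with plain objects outside a browser.

```javascript
// Hypothetical helper modeled on the GoatCounter check quoted above.
// win, doc, and nav stand in for window, document, and navigator.
function detectHeadlessBrowser(win, doc, nav) {
  // PhantomJS exposes these globals on window.
  if (win.callPhantom || win._phantom || win.phantom) return 150;
  // Nightmare exposes window.__nightmare.
  if (win.__nightmare) return 151;
  // Selenium / WebDriver markers left on document.
  if (doc.__selenium_unwrapped || doc.__webdriver_evaluate || doc.__driver_evaluate) return 152;
  // navigator.webdriver is true in WebDriver-controlled browsers.
  if (nav.webdriver) return 153;
  // 0 means no headless-browser marker was found.
  return 0;
}

// Plain objects stand in for the browser globals here.
console.log(detectHeadlessBrowser({}, {}, { webdriver: true })); // 153
console.log(detectHeadlessBrowser({ __nightmare: {} }, {}, {})); // 151
console.log(detectHeadlessBrowser({}, {}, {}));                  // 0
```

In a Google Tag Manager Custom JavaScript variable, the same checks would have to be written inline in a single anonymous function that returns a value, reading window, document, and navigator directly. A trigger condition could then compare the variable against 0 before firing measurement tags.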
URL and referrer canonicalization
Matomo's source code includes the following function (line breaks and spacing have been adjusted).
/*
 * Fix-up URL when page rendered from search engine cache or translated page
 */
function urlFixup(hostName, href, referrer) {
    if (!hostName) {
        hostName = '';
    }
    if (!href) {
        href = '';
    }
    if (hostName === 'translate.googleusercontent.com') { // Google
        if (referrer === '') {
            referrer = href;
        }
        href = getUrlParameter(href, 'u');
        hostName = getHostName(href);
    } else if (hostName === 'cc.bingj.com' || // Bing
               hostName === 'webcache.googleusercontent.com' || // Google
               hostName.slice(0, 5) === '74.6.') { // Yahoo (via Inktomi 74.6.0.0/16)
        href = documentAlias.links[0].href;
        hostName = getHostName(href);
    }
    return [hostName, href, referrer];
}
This code corrects the URL and referrer when the page is displayed from a search engine cache or as a machine-translated page. For example, when the page is displayed through Google's translation service, the original page URL is taken from the “u” parameter of the translation URL, and if there is no referrer, the translation URL itself is used as the referrer. When the page is displayed from the cache of Bing, Google, or Yahoo, the URL of the first link in the document (documentAlias.links[0].href) is used as the current page URL instead of the cache URL.
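To make the translated-page branch concrete, here is a minimal runnable sketch. It is not Matomo's implementation: getHostName and getUrlParameter below are hypothetical stand-ins built on the standard URL API, fixupTranslatedUrl is an assumed name, the sample URL is made up, and the cache branch is omitted because it depends on the live document.

```javascript
// Hypothetical stand-ins for Matomo's helpers, built on the standard URL API.
function getHostName(url) {
  try { return new URL(url).hostname; } catch (e) { return ''; }
}

function getUrlParameter(url, name) {
  try { return new URL(url).searchParams.get(name) || ''; } catch (e) { return ''; }
}

// Sketch of the Google Translate branch only.
function fixupTranslatedUrl(hostName, href, referrer) {
  if (hostName === 'translate.googleusercontent.com') {
    // With no referrer, treat the translation URL as the referrer.
    if (referrer === '') referrer = href;
    // The original page URL is carried in the "u" query parameter.
    href = getUrlParameter(href, 'u');
    hostName = getHostName(href);
  }
  return [hostName, href, referrer];
}

// Made-up translation URL for illustration.
const translated =
  'https://translate.googleusercontent.com/translate_c?u=https%3A%2F%2Fexample.com%2Fpage';
const [host, url, ref] =
  fixupTranslatedUrl('translate.googleusercontent.com', translated, '');

console.log(host);               // example.com
console.log(url);                // https://example.com/page
console.log(ref === translated); // true
```

The effect is that the page view is recorded against the site's own URL, while the translation service shows up as the referrer rather than disappearing.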
Summary
There are many other things that can be learned from the source code of open source web analytics tools besides those introduced here. The bot detection introduced first was particularly eye-opening. If your site provides a lot of data and you suspect that you are receiving a lot of bot access from third-party developers, consider implementing the above logic as a variable in Google Tag Manager or a similar tool and using it to control measurement in Google Analytics and other tools.