Sunday, May 19, 2024
HomeTechnology How to reduce query costs during development?

[GA4] How to reduce query costs during development?

One of the big advantages of Google Analytics 4 is that you can use BigQuery even in the free version.

However, as there is a keyword “cloud bankruptcy”, if you use it incorrectly, you will be charged a large amount of usage fee.

thumbnail 3

In this article, I would like to introduce how to estimate the BigQuery fee in GA4 and how to avoid a large bill.


BigQuery Pricing Calculator

BigQuery’s pricing structure is very complicated, but basically you can think of it as having two types of costs: “storage cost” and “query cost” (strictly speaking, storage costs are for active storage and long-term storage). Unit prices change, and there are on-demand and flat-rate plans for query costs.)

Storage cost is literally the cost charged according to the size of data held in BigQuery. There is a free tier of 10GB of storage per month, and anything over 10GB will be charged at a cost of $0.01-$0.02 per GB. The challenge with estimating storage costs is the ever-increasing amount of data managed by BigQuery. Depending on the site traffic, GA4 data for one day may exceed 1GB. Therefore, the focus is on when to estimate the cost .

Query cost is the cost charged according to the data size read by the SQL query executed by BigQuery. Even if you have a lot of data stored in BigQuery, query costs can be kept low if your SQL queries only read a small amount of data. There is a free tier up to 1TB per month, with $5 per TB charged for read data sizes over 1TB. The difficulty in estimating query cost is that it cannot be estimated without implementing the query, and it depends on how often that query is executed .

How to estimate BigQuery pricing in GA4

Query cost is difficult to estimate because it depends on the SQL query to be implemented, but storage cost is relatively easy to estimate.

Storage costs are charged according to the size of data stored in BigQuery as described above. In the case of GA4 data, the data size is basically consumed in proportion to the number of events accumulated in GA4.

When I checked the data exported to BigQuery at the GA4 property that SEM Technology is involved with, it consumed 6-7MB of data size per 10,000 events .

Using this number, for example, if your site has 5 million events per month, it will be 500 x 6-7MB, and your monthly storage size (increment) will be estimated at 3-3.5GB.

After one year, one year’s worth of data will be accumulated, so for the time being, in order to calculate the estimate after one year, you should assume that you will consume about 36 to 40 GB of storage. . Up to 10GB is free, and anything beyond that costs $0.01 to $0.02 per GB, so assuming 40GB, it will cost about $0.3 to $0.6/month.

How to avoid large bills

So far, we’ve looked at BigQuery pricing and estimated storage costs for GA4. Let’s take a look at what you can do to avoid big bills with BigQuery. Storage costs cannot be saved unless the amount of data is reduced, so the discussion below is about how to reduce query costs.

Let’s see.

Reduce the table/period to be read

Reducing the loading period is the first thing to consider. For daily monitoring, it is sufficient to target data for the past 30 to 90 days (or include data for the same month of the previous year for comparison with the previous year). If you do not narrow down the table and period to be read, the retention period of the target data may be short at the beginning of use, and the query cost may be low, but the query cost will gradually increase.

In BigQuery’s SQL,

SELECT
  *
FROM
  `project.analytics_0123456789.events_*`
WHERE
  _TABLE_PREFIX BETWEEN '20201001' AND '20201231'

As shown, the date part of the table name to be read is “*” and the WHERE clause is used to narrow down the value of TABLE PREFIX . TABLE PREFIX reflects the part specified as “*” in the table name as it is.

If you don’t filter by TABLE PREFIX, all table data will be read.

Also, regarding the period narrowed down by TABLE PREFIX, the sample query above uses a fixed period, but in actual operation, you can use the CURRENT DATE() function and the DATE SUB() function to narrow down the past three months. You will have to

Fewer table columns to load

You can reduce the query cost of BigQuery by reducing the number of “read columns” specified in the SELECT clause. GA4 BigQuery has over 100 columns in total. It is unlikely that you will ever need all of these 100+ columns. In fact, a typical analysis would require less than half of these columns.

Specifying dozens of columns in the SELECT clause can be cumbersome, but it’s a good idea to explicitly specify the columns, especially when implementing production queries.

Also, as shown in the screen below, if the standard GA4 BigQuery is used as the data source from Data Studio without using custom SQL, all columns will be read regardless of what columns are used in the dashboard. you have to be careful.

Use of sampling table

Actually, this article was written just to introduce this method. The method presented here is the one that is primarily expected to be used when developing custom SQL. When we need to develop complex custom SQL, we often run SQL queries for implementation and testing. In some cases, the query is executed nearly 100 times. Also, in the case of use during development, reducing the number of columns in the table to be read may reduce development efficiency, and it is often not realistic, and one of the methods of reducing query costs will soon cease to exist.

So I’ll show you how to prepare a sampled table of the original table for custom SQL development.

The idea is that if the original data was 1 million rows per day, it is very unlikely that you would need to use all 1 million rows of data to develop custom SQL. Therefore, if you sample this table at 1% and save about 10,000 rows of data as a completely separate table and develop custom SQL using this sampled table, you can test with the original 1% data. , can reduce the query cost by a factor of 100.

The specific method is introduced in the following Youtube video, so if you are interested, please take a look.

summary

This time, I explained the pricing aspect that many people are concerned about for BigQuery integration, which is one of the main features of GA4. I’ve written a lot about it, but BigQuery’s fees are basically set cheaply, so unless you have a very large site, if you consider a monthly budget of 10,000 to 20,000 yen, I don’t think you’ll exceed it ( Rather, there are many cases where the budget is set at 10,000 to 20,000 yen, but when you start using it, it will be less than 1,000 yen.)

By all means, if you are just starting to use GA4, let’s get your hands on the main feature, BigQuery integration, and make use of it.

RELATED ARTICLES

Leave a reply

Please enter your comment!
Please enter your name here

Recent Posts

Most Popular

Recent Comments