Technical Terminology for Marketers! What is a “Hash”?

by Yasir Aslam

I often see people in web marketing who are unfamiliar with the technical side using technical terms with only a vague understanding of them. This article summarizes what you need to know about one of those terms: “hash” (hashing, hash value).

“Hash” is a completely different term from the “hashtag” that comes up on Twitter. If you are not sure that you understand “hash” properly, please read on.


A rough understanding of hashes

Before understanding hashing strictly, let’s first get a rough understanding of what a hash is. Roughly speaking, hashing refers to

obtaining a character string from the original data according to a certain calculation method.

The method of obtaining this character string is called a “hash function” or “hash algorithm”, and the obtained character string is called a “hash value”.

For example, if the string “https://sem-technology.info/” is hashed using the hash algorithm “SHA-256”, the result is the string

78968D9F3313D32EA206C1317883E98D9CABEC0063F5CD3B743C0DC22696CE53

At first glance this looks like a random string of alphanumeric characters, but it is actually derived by calculation.
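As a minimal sketch of the example above, the following Node.js snippet hashes the same URL with SHA-256 using the built-in `crypto` module. The helper name `sha256Hex` is my own, and the snippet assumes the input is encoded as UTF-8. Note that `digest('hex')` returns lowercase letters, so the result is converted to uppercase only to match the notation used in this article.

```javascript
const crypto = require('crypto');

// Hypothetical helper: returns the SHA-256 hash of a string as 64 hexadecimal characters.
function sha256Hex(text) {
  return crypto.createHash('sha256').update(text, 'utf8').digest('hex');
}

const hashValue = sha256Hex('https://sem-technology.info/');

// Uppercase only for display, to match the notation used in the article.
console.log(hashValue.toUpperCase());
```

Whatever the length of the input, the SHA-256 hash value is always 64 hexadecimal characters (256 bits).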

The SHA-256 hashing algorithm is too complicated to explain in this article, so let’s consider a much simpler hash function. For example,

the number of circles (○) in the string

that is, the number of closed loops in letters such as “o” and “e”, is also a hash function (albeit one of zero practical use).

First, let’s look at the concrete calculation results of this hash function.

Original string    Memo                         Hash value
SEM Technology     one each for e, o, o, g      4
Google             one each for o, o, g, e      4
Facebook           one each for a, e, b, o, o   5
Japan              two for a                    2

This calculation is easy to follow, so the rest of the explanation uses this hash function.
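The toy hash function can be sketched in a few lines of JavaScript. The name `circleHash` and the set of letters are my own assumptions: only the lowercase letters that appear in the memo column of the table (a, b, e, g, o) are counted, one circle each, so that the results match the table.

```javascript
// Assumption: only these lowercase letters count as containing one circle each,
// matching the memo column of the table in the article.
const CIRCLE_LETTERS = new Set(['a', 'b', 'e', 'g', 'o']);

// Toy hash function: counts the characters of the input that contain a circle.
function circleHash(text) {
  let count = 0;
  for (const ch of text) {
    if (CIRCLE_LETTERS.has(ch)) count += 1;
  }
  return count;
}

for (const original of ['SEM Technology', 'Google', 'Facebook', 'Japan']) {
  console.log(original, '->', circleHash(original));
}
```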

Relationship between original string and hash value

The original string and the generated hash value have the following relationship.

The same string gives the same result

As a matter of course, given the same hash algorithm and the same original string, the resulting hash value is exactly the same no matter who performs the calculation. This is easy to see with the “number of circles in a character string” example above: whatever the string is, the calculation method is fixed, so the result is the same no matter who calculates it.

An actual hash value looks like a random sequence of characters, but it is not a random value that changes each time it is calculated. It is a value that follows a fixed calculation method (algorithm).
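This property is easy to check. The sketch below, which assumes Node.js and uses a helper name of my own (`sha256Hex`), hashes the same string twice and then hashes a string that differs only in the case of one letter.

```javascript
const crypto = require('crypto');

// Hypothetical helper: SHA-256 hash of a UTF-8 string, as hexadecimal text.
function sha256Hex(text) {
  return crypto.createHash('sha256').update(text, 'utf8').digest('hex');
}

const first = sha256Hex('Google');
const second = sha256Hex('Google');
const different = sha256Hex('google'); // only the first letter differs

console.log(first === second);    // true: same input, same hash value
console.log(first === different); // false: a different input gives a different hash value
```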

Irreversible transformation

The most important point in the relationship between the original character string and the hash value is that hashing is an “irreversible conversion”. Any string can be converted to a hash value, but the original string cannot be derived from the hash value. This “irreversible conversion” property is the reason hashes are used in web marketing.

This is also easy to understand with the “number of circles in a string” hash function. Even if you know the hash value is “4”, you cannot tell from this “4” what the original string was. This is the important property of hashes.

Hash value collision

As is already clear from the example of the hash function “the number of circles in a string”, the hash value may be the same even if the original strings are completely different. This is called a hash value collision. In the “number of circles in a character string” example, collisions occur easily, but this is due to the poor quality of that hash function. With commonly used hash functions, hash values do not collide so easily.
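The difference can be sketched as follows, assuming Node.js. `circleHash` is my own rendering of the toy function (counting only the letters a, b, e, g and o, an assumption chosen to match the table earlier in the article), and `sha256Hex` is a hypothetical helper around the built-in `crypto` module.

```javascript
const crypto = require('crypto');

// Assumption: these are the only letters counted as containing a circle.
const CIRCLE_LETTERS = new Set(['a', 'b', 'e', 'g', 'o']);

// Toy hash function from the article: the number of circles in the string.
function circleHash(text) {
  return [...text].filter((ch) => CIRCLE_LETTERS.has(ch)).length;
}

// Hypothetical helper: SHA-256 hash of a UTF-8 string, as hexadecimal text.
function sha256Hex(text) {
  return crypto.createHash('sha256').update(text, 'utf8').digest('hex');
}

const a = 'SEM Technology';
const b = 'Google';

console.log(circleHash(a) === circleHash(b)); // true: both are 4, so the toy function collides
console.log(sha256Hex(a) === sha256Hex(b));   // false: the SHA-256 values differ
```

The collision also illustrates irreversibility: a hash value of 4 could have come from either string, or from countless others.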

Examples of typical hash functions (hash algorithms)

Typical hash algorithms include

  • MD5
  • SHA-1
  • SHA-256

and so on. “MD5” was widely used in the past, but methods for producing the same hash value from different inputs have been found, so MD5 is now rarely used because of this vulnerability. SHA-1 is also regarded as vulnerable.

If you need the hash value in some situation in the future, be careful if MD5 or SHA-1 is specified as the hash algorithm.
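To see how the three algorithms differ in output, here is a small sketch assuming Node.js, whose built-in `crypto` module normally accepts the algorithm names `md5`, `sha1` and `sha256` (some restricted builds may disable MD5). The helper name `hashHex` is my own.

```javascript
const crypto = require('crypto');

// Hypothetical helper: hashes a UTF-8 string with the named algorithm.
function hashHex(algorithm, text) {
  return crypto.createHash(algorithm).update(text, 'utf8').digest('hex');
}

for (const algorithm of ['md5', 'sha1', 'sha256']) {
  const value = hashHex(algorithm, 'Google');
  // Each hexadecimal character represents 4 bits of the hash value.
  console.log(algorithm, value.length * 4, 'bits:', value);
}
```

The hash values are 128, 160 and 256 bits long respectively, which is one quick way to tell which algorithm produced a given value.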

Usage of hash

Web marketing use

For web marketing applications, data about website visitors and CRM customers is hashed using a predetermined hashing algorithm before being sent to vendors. The vendor hashes all the data in its own customer database in the same way and searches for entries that match the received data.
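The matching flow described here can be sketched as follows, assuming Node.js. All names and sample addresses are hypothetical, and the normalization step (trimming whitespace and lowercasing before hashing) is an assumption: both sides must prepare the data identically or the hash values will not match, so check the vendor's own formatting requirements.

```javascript
const crypto = require('crypto');

// Assumption: both sides normalize the address the same way before hashing.
function hashEmail(email) {
  const normalized = email.trim().toLowerCase();
  return crypto.createHash('sha256').update(normalized, 'utf8').digest('hex');
}

// Advertiser side: only hash values are sent to the vendor.
const crmEmails = ['Alice@example.com ', 'bob@example.com']; // hypothetical CRM data
const uploadedHashes = crmEmails.map(hashEmail);

// Vendor side: hash every record in its own database and index it by hash value.
const vendorUsers = [
  { id: 'u1', email: 'alice@example.com' },
  { id: 'u2', email: 'carol@example.com' },
];
const vendorIndex = new Map(vendorUsers.map((user) => [hashEmail(user.email), user.id]));

// Matching: the vendor looks up each received hash value in its own index.
const matchedUserIds = uploadedHashes
  .filter((hash) => vendorIndex.has(hash))
  .map((hash) => vendorIndex.get(hash));

console.log(matchedUserIds); // [ 'u1' ]
```

The vendor learns that one of its users is also a customer of the advertiser, but an address it does not already hold ("bob@example.com" here) arrives only as a hash value.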

Two points are important here:

  • Since the data itself is hashed, the risk of personal information leaking is low even if the hashed data is leaked.
  • The original value cannot be restored from the hash value, but if the business operator and the vendor hold the same data, the two parties can still match their records against each other.

In Google Ads, this mechanism is used for:

  • Creating audiences with Customer Match
  • Conversion measurement with “enhanced conversions”

The diagram below shows how hashing works in Customer Match.

[Figure: how hashing works in Customer Match]

Also, many web analytics tools like Google Analytics prohibit sending “personal information” as custom data in their terms of service. Even in such a case, if the data is properly hashed, it cannot be restored to the original character string, so it can be interpreted as not constituting personal information.

For example, it is also possible to build a mechanism that integrates online and offline data: hashed personal information is sent to Google Analytics when an inquiry is made on the website, and is later matched against the personal information of closed deals in the CRM.

However, there is a good chance that people will have different views on legal interpretations, such as whether or not hashed data of personal information is personal information. Please be careful when doing so, as it may not be compatible with the existing privacy policy.

Other uses

In areas other than web marketing, hashes are often used for “password management” and “file identity assurance”. Especially for password management, hashing is used in most cases.

The “password” entered when registering for a membership service is not stored as-is in the service provider’s database. Only the hash value obtained by applying some hashing algorithm to the entered password is stored. When a member logs in, a hash value is calculated from the entered password, and the password is verified by checking whether that hash value matches the one in the database.
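The register-then-verify flow can be sketched as below, assuming Node.js. One deliberate difference from the description: instead of a plain general-purpose hash, the sketch uses a random salt and `scrypt`, a function designed for passwords that ships with Node's `crypto` module. That choice, the function names and the record layout are my assumptions, not something this article specifies.

```javascript
const crypto = require('crypto');

// Registration: store only a random salt and the derived hash, never the password itself.
function createPasswordRecord(password) {
  const salt = crypto.randomBytes(16);
  const hash = crypto.scryptSync(password, salt, 32);
  return { salt: salt.toString('hex'), hash: hash.toString('hex') };
}

// Login: derive a hash from the entered password with the stored salt and compare.
function verifyPassword(password, record) {
  const salt = Buffer.from(record.salt, 'hex');
  const expected = Buffer.from(record.hash, 'hex');
  const actual = crypto.scryptSync(password, salt, expected.length);
  // Constant-time comparison of two equal-length buffers.
  return crypto.timingSafeEqual(actual, expected);
}

const record = createPasswordRecord('correct horse'); // this is what the database would hold
console.log(verifyPassword('correct horse', record)); // true
console.log(verifyPassword('wrong guess', record));   // false
```

Because the database record contains no password, the service has nothing it could send back to a user who forgets it.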

When you forget your password, almost no service will “send you the saved password”; most services invalidate the existing password and have you set a new one. This is because

  • The original password itself is not stored in the database
  • The original string cannot be computed from the hashed string

Conversely, a service that sends you your original password when you forget it is a service with a high risk of password leaks.

How to calculate a hash value

SHA-256, the hashing algorithm in common use today, is too complicated to explain in this blog. For how to calculate SHA-256 in JavaScript, please refer to “Qiita: SHA-256 implementation”.

However, implementing it is quite complicated, so only those who are technically confident should implement it themselves. If you want to use hashed values for a Google Ads Customer Match audience list or similar, you can use the Excel macro from the SEM Insight blog post “Hash email address (SHA256 without salt) macro for Google AdWords Customer Match” (I haven’t actually tried it myself, so please use it at your own risk).

In the case of Google Tag Manager, I think it is easier to ask the production company to output the value hashed with the SHA-256 algorithm when writing the email address etc. to the data layer variable. In my case, I have the email address and its hash value output together in a form like this:

dataLayer.push({
  event: 'userinfo.update',
  userInfo: {
    // Original email address
    email: '[email protected]',
    // SHA-256 hash value of the email address (hexadecimal)
    email_sha256: '946b7e2df623066a75432527fc94f5ee165fab0fae407d8cbe1fb1f0c262937a'
  }
});
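If the hash value has to be calculated in the page itself, one possible approach is the browser's Web Crypto API (`crypto.subtle.digest`), which is available on pages served over HTTPS. The sketch below is an assumption about how this could be wired up, not the setup described above: the function names are mine, the trim-and-lowercase step is an assumed normalization, and the fallback on the first line exists only so the sketch also runs in Node.js.

```javascript
// Web Crypto in the browser; fall back to Node's implementation outside a browser.
const subtle = (globalThis.crypto && globalThis.crypto.subtle) || require('crypto').webcrypto.subtle;

// Hypothetical helper: SHA-256 hash of a string as hexadecimal text.
async function sha256Hex(text) {
  const bytes = new TextEncoder().encode(text);
  const digest = await subtle.digest('SHA-256', bytes);
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, '0')).join('');
}

// Pushes the email address and its hash value in the same shape as the example above.
async function pushUserInfo(email) {
  globalThis.dataLayer = globalThis.dataLayer || [];
  // Assumption: the address is trimmed and lowercased before hashing.
  const emailSha256 = await sha256Hex(email.trim().toLowerCase());
  globalThis.dataLayer.push({
    event: 'userinfo.update',
    userInfo: { email: email, email_sha256: emailSha256 },
  });
  return emailSha256;
}

pushUserInfo('user@example.com').then((hash) => console.log(hash));
```

Because `digest` returns a promise, the push happens asynchronously, so any tag that reads `email_sha256` should fire on the `userinfo.update` event rather than on page load.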

Also, in the case of server-side GTM, there is a community template variable called “sha256 Hasher”, so I think it would be good to use it.

Summary

In this article, I have summarized the technical term “hash”, which web marketers hear often these days. Hashing is must-know knowledge for marketers going forward. I hope this article has helped you understand it.
