Hash functions are one-way functions, meaning they take some input data and generate a fixed-size output that resembles encrypted data but cannot be reversed to obtain the original input. The output is directly related to the input, but there is no way to derive the input from the output.
Hash Functions for Data Integrity
One common application of hash functions is ensuring data integrity. The sender can generate a hash of the data and transmit it along with the message. The receiver can then compute the hash on the received data and compare it with the provided hash value. If they match, the data was not tampered with; if they differ, the data has been altered.
For example, if we want to transmit the message:
“1Hundredwire has some cool content!”
Using SHA-1 as the hash algorithm, we get the hash:
c6d34e365f90275e5b7f610bdeafa94f15839
If the receiver gets both the message and its hash, they can compute the hash themselves and verify that the message has not been modified.
Now, if we change just one bit in the original message—let’s replace the last character with a period:
“1Hundredwire has some cool content.”
The new SHA-1 hash becomes:
309b2f80fbe280f779b73c5985ae3a879945edd
This new hash looks completely different from the original. This property is known as the avalanche effect, where even a small change in the input drastically changes the output. This use case is known as a Message Integrity Code (MIC).
The basic block diagram of a Hash function is simple: data is input, processed, and then output as a digest message.
data:image/s3,"s3://crabby-images/5144f/5144f2c4b1faa31af80fbb2c66f69f3196c2b3cd" alt=""
Another important application of hash functions is in Pseudorandom Functions (PRFs), which take an input and produce an output that appears random but is deterministic for the same input. PRFs are widely used in the IEEE 802.11 standard for various cryptographic operations.
One example is the derivation of cryptographic keys from passphrases using PBKDF2 (Password-Based Key Derivation Function 2). PBKDF2 internally uses a PRF (typically HMAC with a hash function like SHA-1 or SHA-256) to generate strong cryptographic keys from weak inputs like user passwords.
SHA-1
I decided to cover SHA-1 since it is used in the process of generating a key from a passphrase. It is also involved in creating encryption and decryption keys for WPA-2.
However, SHA-1 is no longer a recommended hash function. WPA-3 already uses stronger hash functions, which I will cover in another post.
SHA-1 processes data as a hash function, taking an input and producing a digest. The data is processed in 512-bit blocks.
The output is called the digest and has 160-bits (20-byte).
Padding
The data receives an extra bit and has its length encoded in 64 bits.
Let’s say the input data in ASCII is 1Hundredwire.
When converted to UTF-8, its hexadecimal representation is 0x3148756e6472656477697265.
This input has a length of 96 bits (24 hex characters).
The binary representation of the input is:
00110001 01001000 01110101 01101110 01100100 01110010 01100101 01100100 01110111 01101001 01110010 01100101
Adding an extra bit:
00110001 01001000 01110101 01101110 01100100 01110010 01100101 01100100 01110111 01101001 01110010 01100101 1
Now, we need to encode the length (96) in hex using 16 bytes:
0000000000000060
Its binary equivalent is:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 01100000
SHA-1 processes data in 512-bit blocks.
- The original message is 96 bits.
- Adding an extra bit brings it to 97 bits.
- Encoding the length adds 64 bits, making the total 161 bits.
- To reach 512 bits, we pad the remaining 351 bits with zeros.
The final block, structured in 16-byte groups, is:
00110001010010000111010101101110 01100100011100100110010101100100 01110111011010010111001001100101 10000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000001100000
Input Message Extra bit Encoded Length
In hex we have:
148756e647265647769726580000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000060
What happens if the input message is already 512 bits long? In this case, an additional 512-bit block will be created to accommodate the necessary padding and length encoding.
What if the message is 515 bits long? The same process applies: the extra 1 bit will be added to the first block, and the final 64 bits will store the encoded length. The remaining space in between will be padded with zeros.
The core of the SHA-1 algorithm is based on a loop of 80 cycles per 512-bit block.
During each round, the algorithm updates the values of five buffers: A, B, C, D, and E. Each block starts with these buffers initialized to constants defined in the specification: H0, H1, H2, H3, and H4.
H0 = 0x67452301
H1 = 0xEFCDAB89
H2 = 0x98BADCFE
H3 = 0x10325476
H4 = 0xC3D2E1F0
Each block is split into 16 words (each word is 32 bits). The rounds will use the values of these words, labeled W0 through W15.
During the process, additional words will be created, from W16 to W70. These extra words are generated through specific operations applied to the initial 16 words.
for t in range(0,80):
TEMP = S(5,A) + f(t,B,C,D) + E + int(W[t],16) + K(t) & 0xFFFFFFFF
E = D
D = C
C = S(30,B)
B = A
A = TEMP
Main Loop
The code above demonstrates what happens in each cycle. We can observe that TEMP depends on Wt, meaning that one word per cycle is needed.
S Function
The S(number_of_shifts, word) function performs left-shift operations on the given word. For example, if the input is:'11000000000000000000000000000001'
the output will be:'10000000000000000000000000000011'
In this case, the first bit is wrapped around, and every other bit moves one position to the left.
F Function
The F(t, b, c, d) function receives the round number t and three words (b, c, and d). Depending on the round number t, it performs different operations.
def f(t, b, c, d):
if (0 <= t <= 19):
return (b & c) | ((~b) & d)
if (20 <= t <= 39):
return b ^ c ^ d
if (40 <= t <= 59):
return (b & c) ^ (b & d) | (c & d)
if (60 <= t <= 79):
return b ^ c ^ d
K Function
The K(t) function simply returns a constant value depending on the round number t.
def K(t):
if (0 <= t <= 19):
return int('5A827999',16)
if (20 <= t <= 39):
return int('6ED9EBA1',16)
if (40 <= t <= 59):
return int('8F1BBCDC',16)
if (60 <= t <= 79):
return int('CA62C1D6',16)
After processing each block, the resulting values are added to the original buffer.
H0 = (H0 + A) & 0xFFFFFFFF
H1 = (H1 + B) & 0xFFFFFFFF
H2 = (H2 + C) & 0xFFFFFFFF
H3 = (H3 + D) & 0xFFFFFFFF
H4 = (H4 + E) & 0xFFFFFFFF
The final output is the concatenation of the values from each H0:
digest = H0 | H1 | H2 | H3 | H4
Creating New Words
The 520 bits provide us with 16 words: W0 to W15.
To expand this, we create new words by performing an XOR operation on specific previous words and applying a left-shift to the result, as shown in the code below:
W = split_string_into_chunks(block)
for t in range(16,80):
w = S(1, (int(W[t-3],16) ^ int(W[t-8],16) ^ int(W[t-14],16) ^ int(W[t-16],16)))
Example
Let’s calculate the first round together so you can get a sense of how it works.
Our previous 512 bits are split into 32-bit blocks, so the first word is:
W0 = 00110001010010000111010101101110 (0x3148756e)
First, we need to calculate TEMP, which will later be assigned to A.
TEMP = S(5,A) + f(t,B,C,D) + E + int(W[t],16) + K(t) & 0xFFFFFFFF
On the first round A, B, C, D and E are the initial values.
S(5, 67452301)
In Hexadecimal:
0xE8A4602C
So, S(5, 67452301) = 0xE8A4602C
Binary: 01100111 01000101 00100011 00000001
After shifting 5 bits to the left:
11101000 10100100 01100000 00101100
f(t,B,C,D)
Since t is 0 we will peform: (b & c) | ((~b) & d)
B = b = 0xEFCDAB89
C = c = 0x98BADCFE
D = d = 0x10325476
data:image/s3,"s3://crabby-images/26396/263969f99e2cdeb755248c4617ebffb903ebbde7" alt=""
data:image/s3,"s3://crabby-images/e0bba/e0bbaa9d3309307ddbfd3fc0822898bf9ea95cf1" alt=""
data:image/s3,"s3://crabby-images/cf046/cf04616df797c386967a34184f7ff3ae1b03256c" alt=""
Which results in: 0x98badcfe
K[t]
This function will output 0x5A827999
Now we have everything necessary to calculate TEMP.
TEMP = S(5,A) + f(t,B,C,D) + E + int(W[t],16) + K(t) & 0xFFFFFFFF
TEMP = 0xe8a4602c + 0x98badcfe + 0xc3d2e1f0 + 0x3148756e + 0x5A827999 & 0xFFFFFFFF
TEMP = 0xd0fd0e21
By checking the main loop code, we will have the following assignments:
E = D
D = C
C = S(30,B)
B = A
A = TEMP
It is only left to calculate S(30,B).
S(30,B)
B in Binary is: 11101111110011011010101110001001
Left-shifting the first 30 bits we have: 01111011111100110110101011100010
In hex it is: 0x7bf36ae2
So on the first round we have:
A = 0xd0fd0e21, B = 0x67452301, C= 0x7bf36ae2, D=0x98badcfe, E=0x10325476
This process is repeated until the loop from 0 to 79 finishes, now using the updated values of A, B, C, D, and E.
At the end of block processing we update the buffers:
H0 = (H0 + A) & 0xFFFFFFFF
H1 = (H1 + B) & 0xFFFFFFFF
H2 = (H2 + C) & 0xFFFFFFFF
H3 = (H3 + D) & 0xFFFFFFFF
H4 = (H4 + E) & 0xFFFFFFFF
If there are no more blocks to process, the final result is the digest: H0 | H1 | H2 | H3 | H4.
In our case, with the input “1Hundredwire”, the output is: 56027592bcf72da97f115276251e9c30a428e3da.
Final thoughts
Hash is usually thought of as an integrity check, but it has plenty of use cases, from hashmaps to pseudorandom functions. In the context of 802.11i, we will work a lot with hash functions during the key derivation process to generate keys for secure communication. But first, in another post, we will dive into HMAC, which adds authentication to the hash.
I hope you find it helpful.
Stay tuned,
Diego Capassi
References
National Institute of Standards and Technology. (1995). FIPS PUB 180-1: Secure hash standard (SHS). U.S. Department of Commerce. Retrieved from https://nvlpubs.nist.gov/nistpubs/Legacy/FIPS/fipspub180-1.pdf
"HMAC: Keyed-Hashing for Message Authentication," RFC 3174 (https://datatracker.ietf.org/doc/html/rfc3174)