Consider the problem where you need to know for sure that information you are looking at has not been altered since it was first created. A simple example that kids can relate to is a report card from school. The model for report cards is that the school gives a kid his/her report card, they bring it home and their parents sign it to show that they read the report card. An obvious problem that the school has is to be certain that the report card isn’t changed while the kid has it.
In cryptography, this problem is referred to as ‘data integrity.’ Cryptographic systems solve this problem by creating a representation of the information that is unique. This representation is referred to as a ‘hash’ (or checksum). A good way to think of a hash is as a digital fingerprint that looks totally different for each set of data that it represents. Functions that are used to create hashes from information are called hash functions. There are a couple of important points about hash functions:
- The result of the hash function MUST be unique for each message that it processes. That is, you should never be able to get the same result from a hash function if you feed in two different messages.
- The result of the hash function MUST not look anything like the message that it processed.
- It MUST be impossible, given the result of a hash function, to determine what the original message was.
- The result of the hash function is usually smaller than the message that it processed.
If you wanted to exchange a message with another person and be sure that the message hasn’t been changed along the way, you could use a hash function as follows:
- Create a message.
- Feed that message into a hash function.
- Send the message to the recipient along with the hash.
- The recipient feeds the message into the same hash function used by the sender.
- The recipient compares their hash with the sender’s hash. If they are identical, the recipient is sure that the message wasn’t changed.
Of course this exchange isn’t secure as the hash could be changed in transit as well, but it illustrates how such a system works. I’ll explain how to make this exchange secure in a future post. To find out more about the basics of hash functions, read more.
So, how exactly do hash algorithms work? They work by mixing and mashing the messages being hashed. The mixing and mashing is done by dividing a message into a set of chunks. These chunks are then mixed together using math operations (this step varies greatly depending on the hash function being used).
Here’s an example of a very simple hash function. In this function, note that letters are represented using numbers (ASCII codes, so A=65, B=66, C=67 … Z=90). Note that when using ASCII characters to represent numeric values, you have to have numeric values in the range of 65 to 90 in order to be able to print them as characters.
- Divide your message into sets of 8 letters. If the number of letters in your message is not evenly divisible by 8, add as many ‘i’s to your message as needed to make it divisible by 8 (this is called ‘padding’).
- Replace the letters with numbers (ASCII values which can be found here).
- Now, arrange the sets of 8 numbers in a vertical column; ex. if your message has 24 letters, you should have 3 rows of 8 numbers arranged in a column.
- Starting at the top row, add the first number in the first 2 rows together. Since this number is going to be larger than 90, we need to do a little more math to make it fall in the range of 65 – 90. So we will take this sum and divide it by 26. This will ensure that the number is in the range of 0 – 26. Next, we’ll take this remainder value and add 65 to it. This way, we can be sure the final value will be in the range of 65 – 90. This final value then becomes the first value in a new row. Repeat this step for all of the characters in the first two rows. (For readers of my previous post on clock arithmetic, this calculation is just modular arithmetic, specifically we are calculating the value mod 26).
- The new row that you’ve created now replaces the first 2 rows. Repeat the previous step until you’ve added all the rows together.
- The final row that results from doing these operations represents the hash value for the message.
Here’s an example where I calculate the hash of the value “walter goulet is here”
The message “WALTER GOULET IS HERE” looks like this encoded in ASCII (the value 32 is a space character.)
87 65 76 84 69 82 32 71 79 85 76 69 84 32 73 83 32 72 69 82 69
Note that there are only 21 characters in this message, so we need to add 3 “I”s to the end for padding. The ASCII code for ‘I’ is 73. So, the padded message looks like this:
87 65 76 84 69 82 32 71 79 85 76 69 84 32 73 83 32 72 69 82 69 73 73 73
Next, we break the message into sets of 8 numbers:
- 87 65 76 84 69 82 32 71
- 79 85 76 69 84 32 73 83
- 32 72 69 82 69 73 73 73
Now, we add rows 1 and 2 together by adding each number in the same position together, dividing it by 26 to get the remainder, and add the value ’65′ to the result. Let’s perform this operation on rows 1 and 2:
87 + 79 = 166; 166 / 26 = 6 remainder 10; 10 + 65 = 75
65 + 85 = 150; 150 / 26 = 5 remainder 20; 20 + 65 = 85
and so on for each number in rows 1 and 2. The new row, after repeating this for each number, becomes
75 85 87 88 88 75 66 89
The new row is then added to row number 3 above using the same process.
- 75 85 87 88 88 75 66 89
- 32 72 69 82 69 73 73 73
After repeating this operation, the new row is:
68 66 65 79 66 83 74 71
After replacing these numeric values with their letter values, the final result is:
D B A O B S J G
This value is our hash value for the message “WALTER GOULET IS HERE”.
As an exercise, try changing a letter in the original message. You’ll see that when you do, the final hash output will change as well. Note that this toy hash function isn’t nearly as strong as real hash functions used in security technologies, but the basic ideas are the same
For interested readers, here’s a Ruby program implementing the hash function described above. Play around with different messages to see how the hashing works and to spot potential problems from this simplistic hash function.
#!/usr/bin/ruby -wmsgstr = String.new(ARGV[0])msgarr = Array.newarrindex = 0msgstr.upcase!# Pad the input message string if it's length is not divisible by 8if((numchar = msgstr.length.modulo(8)) != 0)padchars = 8 - numcharpadchars.times domsgstr = msgstr + "I"endend# Break the message into 8 character (or byte) wordsstrsize = msgstr.lengthwhile(strsize > 0)if(strsize.modulo(8) == 0)msgarr[arrindex] = msgstr.slice!(0..7)arrindex = arrindex + 1endstrsize = strsize - 1end# Add the rowswhile(msgarr.length > 1)row1 = msgarr.shiftrow2 = msgarr.first0.upto(7) do |x|row2[x] = ((row1[x] + row2[x]) % 26) + 65endendprint msgarr.firstprint "n"
Tags: cryptography