Hashing

Hashing is a very useful process for transforming any set of data into a series of numbers. More mathematically speaking, it maps a large set of objects to a much smaller set of keys. Once an object has been mapped to a key, you can check whether two objects are “equal” by checking equivalence of their hashes. In general, however, you can’t transform a hash back into the original object.

Hashing Functions

A good hashing function will map a set of objects to hashes evenly with as few collisions as possible. This means that on average, for a large number of input objects, we should get each of the possible hashes an equal number of times. A hashing function must also be deterministic, meaning that if you apply a hash function on a given object twice, you will get the same hash both times.

As an example, suppose you wanted a hash function that mapped all strings of characters (A-Z) to the integers $0$ through $25$. One possible hash function is as follows.

Let $g(c)$ map character $c$ to an integer so that $A$ maps to 0, $B$ maps to 1, etc.
Then our hash function can be (as a function of the string $s$):
$$hash(s) = \sum\limits_{c \in s} g(c) \pmod{26}$$

Hashes in Algorithms

For algorithms, hashing can be used to index objects like strings and trees.

Suppose you wanted to find whether a given string was contained within a set of strings. A naive search would be $O(n)$, while a binary search (assuming the set of strings is ordered) would be $O(\log n)$. However, without sacrificing much space, we can get $O(1)$ behavior for sufficiently small sets of strings by using a hash table. First, hash all the strings in the set, then create a hash table which maps a hash to a string. When we query whether a string is contained in the set, we hash it, then check the hash map to see if its contained in the values mapped to by the hash. As long as we keep collisions to a minimum, we can get $O(1)$ behavior.

Another common problem involves finding the equivalence of two strings very quickly. If we can pre-process a set of strings, we can determine the equivalence of two strings from that set in $O(1)$ time using hashing such as the Rabin-Karp hash.

Hashes in Cryptography

Hashing is also very useful in cryptography. Suppose we wanted to check whether an user has inputted the correct password. We could store a list of the passwords for each user in some database and check whether a given user’s password matches up. But if someone could gain access to the database, they’d have access to each person’s passwords. A much more secure way to store passwords is to hash them first. Then, even if someone cracks the database, they won’t be able to determine the user’s original password.

SHA-256 Hash Function (slightly modified)

I made an SHA-256 Hashing implementation for my password program. Here’s one that uses the same exact algorithm, but in JavaScript (feel free to look at the source code):

Input Text:

SHA Hash Result: