Monday, November 23, 2009

Hashing

What is hashing?
Hashing can be thought of as a mapping of a set of values(S) to a set of discrete points(k1-kn) on a line (one dimension).

Generally the range of S is much larger than n, but count(S) is lesser than n. As a result most hash functions will try to distribute elements of S uniformly among the available keys. But there are bound to be collisions as in most situations perfect hashing is not possible.

Perfect hashing is possible when you know S in advance (a rare scenario). In this case you can simply have k1=1, k2=2 and so on.

Hashing in 2-D, 3-D... n-D
With hashing in one dimension collisions are inevitable. But if you think in 2 dimensions, then you are trying to map S to discrete values in a rectangle ( 32 bit squares in practical implementations). What this really means is that you generate 2 keys for the same element Si. The chances of collision decrease significantly in this case. For instance two 32 bits hashcodes used together are going to be much better than a single 64 bit hashcode.

In a similar way 3 hashcodes can be thought of as mapping S to a cuboid (32 bit cubes), and so on till n dimensions.

No comments:

Post a Comment