Introduction to Hashing - BunksAllowed

BunksAllowed is an effort to facilitate Self Learning process through the provision of quality tutorials.

Random Posts

Introduction to Hashing

Share This

We have learned different search techniques where search time is basically dependent on the number of elements. In the case of an unsorted sequential array, the complexity of searching is O(n), whereas in a sorted array it is O(log n).

Hence, if there exists any technique by which searching time can be reduced, that will help us to work with a huge volume of data. The hashing technique helps us to reduce this access time.

In this tutorial, we will discuss what is hashing and how it works? Let us start with some terminologies used in hashing.


Hash Table


A Hash Table is a data structure that is used to store data in an associative manner. It uses an array as a storage and hash technique to generate an index of where an element is to be stored or where they are to be accessed.

The primary idea behind a hash table is to establish a mapping between the set of all possible keys and positions in the array using a hash function.


Hash Function


A hash function accepts a key and returns its hash coding, hash value. Keys vary in type, but hash coding is always integers as they represent indexes.

Since both computing a hash value and indexing into an array can be performed in constant time. The beauty of hashing is that we can use it to perform constant time searches.

When a hash function can guarantee that no two keys will generate the same hash coding, the resulting hash table is said to be directly addressed. But in practice, it is rarely possible. Typically, the number of entries in a hash table is small relative to the universe of possible keys. Consequently, most hash functions map some keys to the same position in the table. When two keys map to the same position, they collide. A good hash function minimizes the collision.


Different Hash Functions


A function is applied to the key to produce an integer that can be used as an index in a hash table. The intent is that elements will be relatively randomly and uniformly distributed.


The perfect Hash function is a function applied to all the members of the set of items to be stored in a hash table, producing a unique set of integers within some suitable range. Such a function does not create any collision. There is no magic formula for the creation of the hash function. It can be any mathematical transformation that produces a relatively random and unique distribution of values within the address space of the storage. However, Finding a perfect hash function that works for a given data can be extremely time-consuming and very often just impossible. Therefore, we must deal with collisions and learn to handle them. So we will learn the creation of some hash functions now.

Division Method: In this method, we map a key k into one of the m slots by taking the remainder of k divided by m. So the hash function is

h(k)=k mod m+1

Multiplication Method: The multiplication method for creating hash functions operates in two steps: First, we multiply the key k by a constant A in the range 0 < A < 1 and extract the fractional part of kA. Then we multiply this value by m and take the floor of the result. The hash function is:

h(k)=⌊m(k mod 1)⌋

Midsquare Method: In the mid-square method, we square the key after getting the number we take some digits from the middle of that number as an address. Let's take some 4-digit numbers as a key:

1337 1273 1391 1026

Now square these keys:

1787569 1620529 1934881 1052676

Now take the 3rd and 4th digits from each number and that will be the hash address of these keys. Let us assume here table size is 1000. So the hash address for these keys will be 75, 05, 48, 26.

Folding Method: One easiest way to compute the key is to break the key into pieces, add them, and get the hash address. Let us take some digits keys:

82394561 87139465 83567271 85943228

Now chop them into pieces of 3, 2, and 3 digits and add them. So the address will be

82394561 = 823 + 94 + 561 = 1478
87139465 = 871 + 39 + 465 = 1375
83567271 = 835 + 67 + 271 = 1173
85943228 = 859 + 43 + 228 = 1130

Now truncate them up to the digit based on the size of the hash table. Let the size of the hash table is 1000. So we will truncate here the higher digit of the number. Now the hash address of the keys will be as:

H(82394561)= 478, H(87139465)= 375, H(83567271)= 173, H(85943228)= 130

Collision Resolution Techniques


Suppose we want to add a new record R with key k to our file F, but suppose the memory location address H(k) is already occupied. This situation is called a collision. That is a collision occurs when more than one keys map to the same hash value in the hash table. The following ways to resolve the collisions:

    1. Collision resolution by open addressing
    2. Collision resolution by separate chaining

The performance of these methods depends on the load factor (λ = n/m). This is the ratio of the number n of keys in k to the number m of hash addresses.

The efficiency of a hash function with a collision resolution procedure is measured by the average number of probes(key comparisons)needed to find the location of the record with a given key k. The efficiency depends mainly on the load factor λ and the following conditions of search.

Happy Exploring!

No comments:

Post a Comment