Searching and Hashing

It is important to understand what search is and when it is appropriate. This page explains sequential and binary search, and their implementation. There is also the matter of hashing as a storage and search technique. In so doing, we introduce the unordered map and how to implement a map abstract data type using hashing. Read Sections 6.1-6.5. You may also find Sections 6.6-6.11 useful for practice.

5. Hashing

5.3. Implementing the Map Abstract Data Type

One of the most useful C++ data structures is the map. Recall that a map is an associative data type where you can store key–data pairs. The key is used to look up the associated data value.

The map abstract data type is defined as follows. The structure is an unordered collection of associations between a key and a data value. The keys in a map are all unique so that there is a one-to-one relationship between a key and a value. The operations are given below.

Table 1: Map Operations

Map Operation

Description

m.Map()

Create a new, empty map. It returns an empty map collection

m.put(key,val)

Add a new key-value pair to the map. If the key is already in the map then replace the old value with the new value

m.get(key)

Given a key, return the value stored in the map or None otherwise

del m[key]

Delete the key-value pair from the map

len(m)

Return the number of key-value pairs stored in the map

in

Return True for a statement of the form key in map, if the given is in the map and False otherwise

One of the great benefits of a map is the fact that given a key, we can look up the associated data value very quickly. In order to provide this fast look up capability, we need an implementation that supports an efficient search. We could use a list with sequential or binary search but it would be even better to use a hash table as described above since looking up an item in a hash table can approach O(1) performance.

In Listing 2 we use two lists to create a HashTable class that implements the Map abstract data type. One list, called slots, will hold the key items and a parallel list, called data, will hold the data values. When we look up a key, the corresponding position in the data list will hold the associated data value. We will treat the key list as a hash table using the ideas presented earlier. Note that the initial size for the hash table has been chosen to be 11. Although this is arbitrary, it is important that the size be a prime number so that the collision resolution algorithm can be as efficient as possible.

Listing 2

class HashTable{
    public:
    static const int size=11;
    int slots[size];
    string data[size];

In Listing 3, hashfunction implements the simple remainder method. The collision resolution technique is linear probing with a "plus 1" rehash function. The put function assumes that there will eventually be an empty slot unless the key is already present in the self.slots. It computes the original hash value and if that slot is not empty, iterates the rehash function until an empty slot occurs. If a nonempty slot already contains the key, the old data value is replaced with the new data value. Dealing with the situation where there are no empty slots left is an exercise.

Listing 3

def put(self,key,data):
    hashvalue = self.hashfunction(key,len(self.slots))

    if self.slots[hashvalue] == None:
        self.slots[hashvalue] = key
        self.data[hashvalue] = data
    else:
        if self.slots[hashvalue] == key:
            self.data[hashvalue] = data  #replace
        else:
            nextslot = self.rehash(hashvalue,len(self.slots))
            while self.slots[nextslot] != None and self.slots[nextslot] != key:
                nextslot = self.rehash(nextslot,len(self.slots))

        if self.slots[nextslot] == None:
            self.slots[nextslot] = key
            self.data[nextslot] = data
        else:
            self.data[nextslot] = data #replace

def hashfunction(self,key,size):
    return key%size

def rehash(self,oldhash,size):
    return (oldhash+1)%size
int hashfunction(int key) {
    return key%size;
}

int rehash(int oldhash) {
    return (oldhash+1)%size;
}

void put(int key, string val){
    int hashvalue = hashfunction(key);
    int count = 0;

    if (data[hashvalue]=="") {
        slots[hashvalue] = key;
        data[hashvalue] = val;
    } else {
        if (slots[hashvalue] == key) {
            data[hashvalue] = val;
        } else {
            int nextslot = rehash(hashvalue);

            while (data[nextslot]!="" && slots[nextslot] != key) {
                nextslot = rehash(nextslot);

                count++;
                if (count>size) {
                    cout<<"TABLE FULL"<<endl;
                    return;
                }
            }
            if (data[nextslot]=="") {
                slots[nextslot]=key;
                data[nextslot]=val;
            } else {
                data[nextslot] = val;
            }
        }
    }
}

Likewise, the get function (see Listing 4) begins by computing the initial hash value. If the value is not in the initial slot, rehash is used to locate the next possible position. Notice that line 15 guarantees that the search will terminate by checking to make sure that we have not returned to the initial slot. If that happens, we have exhausted all possible slots and the item must not be present.

The final methods of the HashTable class provide additional map functionality. We overload the __getitem__ and __setitem__ methods to allow access using``[]``. This means that once a HashTable has been created, the familiar index operator will be available. We leave the remaining methods as exercises.

Listing 4

def get(self,key):
    startslot = self.hashfunction(key,len(self.slots))

    data = None
    stop = False
    found = False
    position = startslot
    while self.slots[position] != None and not found and not stop:
        if self.slots[position] == key:
            found = True
            data = self.data[position]
        else:
            position=self.rehash(position,len(self.slots))
            if position == startslot:
                stop = True
    return data

def __getitem__(self,key):
    return self.get(key)

def __setitem__(self,key,data):
    self.put(key,data)
string get(int key) {
    int startslot = hashfunction(key);

    string val;
    bool stop=false;
    bool found=false;
    int position = startslot;
    while(data[position]!="" && !found && !stop) {
        if (slots[position]==key) {
            found = true;
            val = data[position];
        } else {
            position=rehash(position);
            if (position==startslot) {
                stop=true;
            }
        }

    }
    return val;
}

The following session shows the HashTable class in action. First we will create a hash table and store some items with integer keys and string data values.

int main() {
    HashTable h;

    h.put(54, "cat");
    h.put(26, "dog");
    h.put(93, "lion");
    h.put(17, "tiger");
    h.put(77, "bird");
    h.put(31, "cow");
    h.put(44, "goat");
    h.put(55, "pig");
    h.put(20, "chicken");
    cout<<h<<endl;

    return 0;
}

>> Output:
  77: bird
  44: goat
  55: pig
  20: chicken
  26: dog
  93: lion
  17: tiger
  0:
  0:
  31: cow
  54: cat

Next we will access and modify some items in the hash table. Note that the value for the key 20 is being replaced.

...
h.put(20,"chicken");
h.put(17,"tiger");
h.put(20,"duck");
cout<<h<<endl;
...

>> Output:
77: bird
44: goat
55: pig
20: duck
26: dog
93: lion
17: tiger
65535:
0:
31: cow
54: cat

The complete hash table example can be found in ActiveCode 1.

Title for the C++ Window (complete_hash_cpp)

Complete Hash Table Example (complete_hash_py)