It is important to understand what search is and when it is appropriate. This page explains sequential and binary search, and their implementation. There is also the matter of hashing as a storage and search technique. In so doing, we introduce the unordered map and how to implement a map abstract data type using hashing. Read Sections 6.1-6.5. You may also find Sections 6.6-6.11 useful for practice.
5. Hashing
5.3. Implementing the Map Abstract Data Type
One of the most useful C++ data structures is the map. Recall that a map is an associative data type where you can store key–data pairs. The key is used to look up the associated data value.
The map abstract data type is defined as follows. The structure is an unordered collection of associations between a key and a data value. The keys in a map are all unique so that there is a one-to-one relationship between a key and a value. The operations are given below.
Table 1: Map Operations
Map Operation |
Description |
---|---|
|
Create a new, empty map. It returns an empty map collection |
|
Add a new key-value pair to the map. If the key is already in the map then replace the old value with the new value |
|
Given a key, return the value stored in the map or |
|
Delete the key-value pair from the map |
|
Return the number of key-value pairs stored in the map |
|
Return |
One of the great benefits of a map is the fact that given a key, we can look up the associated data value very quickly. In order to provide this fast look up capability, we need an implementation that supports an efficient search. We could use a list with sequential or binary search but it would be even better to use a hash table as described above since looking up an item in a hash table can approach performance.
In Listing 2 we use two lists to create a
HashTable
class that implements the Map abstract data type. One list, called slots
, will hold
the key items and a parallel list, called data
, will hold the data values. When we look up a key, the corresponding position in the data list will hold the associated data
value. We will treat the key list as a hash table using the ideas presented earlier. Note that the initial size for the hash table has been chosen to be 11. Although this is arbitrary, it is important that the size be a prime number so that the
collision resolution algorithm can be as efficient as possible.
Listing 2
class HashTable{
public:
static const int size=11;
int slots[size];
string data[size];
In Listing 3, hashfunction
implements the simple remainder method. The collision resolution technique is linear probing with a "plus 1" rehash
function. The put
function assumes that there will eventually be an empty slot unless the key is already present in the self.slots
.
It computes the original hash value and if that slot is not empty, iterates the
rehash
function until an empty slot occurs. If a nonempty slot already contains the key, the old data value is replaced with the new data value. Dealing with the situation
where there are no empty slots left is an exercise.
Listing 3
def put(self,key,data):
hashvalue = self.hashfunction(key,len(self.slots))
if self.slots[hashvalue] == None:
self.slots[hashvalue] = key
self.data[hashvalue] = data
else:
if self.slots[hashvalue] == key:
self.data[hashvalue] = data #replace
else:
nextslot = self.rehash(hashvalue,len(self.slots))
while self.slots[nextslot] != None and self.slots[nextslot] != key:
nextslot = self.rehash(nextslot,len(self.slots))
if self.slots[nextslot] == None:
self.slots[nextslot] = key
self.data[nextslot] = data
else:
self.data[nextslot] = data #replace
def hashfunction(self,key,size):
return key%size
def rehash(self,oldhash,size):
return (oldhash+1)%size
int hashfunction(int key) {
return key%size;
}
int rehash(int oldhash) {
return (oldhash+1)%size;
}
void put(int key, string val){
int hashvalue = hashfunction(key);
int count = 0;
if (data[hashvalue]=="") {
slots[hashvalue] = key;
data[hashvalue] = val;
} else {
if (slots[hashvalue] == key) {
data[hashvalue] = val;
} else {
int nextslot = rehash(hashvalue);
while (data[nextslot]!="" && slots[nextslot] != key) {
nextslot = rehash(nextslot);
count++;
if (count>size) {
cout<<"TABLE FULL"<<endl;
return;
}
}
if (data[nextslot]=="") {
slots[nextslot]=key;
data[nextslot]=val;
} else {
data[nextslot] = val;
}
}
}
}
Likewise, the get
function (see Listing 4) begins by computing the initial hash value. If the value is not in the initial slot, rehash
is used to locate the next possible position. Notice that line 15 guarantees that the search will terminate by checking to make sure that we have not returned to the initial slot. If that happens, we have exhausted all possible slots and the item
must not be present.
The final methods of the HashTable
class provide additional map functionality. We overload the __getitem__ and __setitem__ methods to allow access using``[]``. This means that
once a HashTable
has been created, the familiar index operator will be available. We leave the remaining methods as exercises.
Listing 4
def get(self,key):
startslot = self.hashfunction(key,len(self.slots))
data = None
stop = False
found = False
position = startslot
while self.slots[position] != None and not found and not stop:
if self.slots[position] == key:
found = True
data = self.data[position]
else:
position=self.rehash(position,len(self.slots))
if position == startslot:
stop = True
return data
def __getitem__(self,key):
return self.get(key)
def __setitem__(self,key,data):
self.put(key,data)
string get(int key) {
int startslot = hashfunction(key);
string val;
bool stop=false;
bool found=false;
int position = startslot;
while(data[position]!="" && !found && !stop) {
if (slots[position]==key) {
found = true;
val = data[position];
} else {
position=rehash(position);
if (position==startslot) {
stop=true;
}
}
}
return val;
}
The following session shows the HashTable
class in action. First we will create a hash table and store some items with integer keys and string data values.
int main() {
HashTable h;
h.put(54, "cat");
h.put(26, "dog");
h.put(93, "lion");
h.put(17, "tiger");
h.put(77, "bird");
h.put(31, "cow");
h.put(44, "goat");
h.put(55, "pig");
h.put(20, "chicken");
cout<<h<<endl;
return 0;
}
>> Output:
77: bird
44: goat
55: pig
20: chicken
26: dog
93: lion
17: tiger
0:
0:
31: cow
54: cat
Next we will access and modify some items in the hash table. Note that the value for the key 20 is being replaced.
...
h.put(20,"chicken");
h.put(17,"tiger");
h.put(20,"duck");
cout<<h<<endl;
...
>> Output:
77: bird
44: goat
55: pig
20: duck
26: dog
93: lion
17: tiger
65535:
0:
31: cow
54: cat
The complete hash table example can be found in ActiveCode 1.
xxxxxxxxxx
using namespace std;
class HashTable{
public:
static const int size=11; // initial size of hash table is prime to help with collision resolution
int slots[size]; // list to hold key items
string data[size]; // list to hold data values
int hashfunction(int key) { // implements remainder method
return key%size;
}
// Computes original hashvalue, and if slot is
// not empty iterates until empty slot is found
int rehash(int oldhash) {
return (oldhash+1)%size;
}
// Function that assumes there will eventually be
// an empty slot unless the key is alread present in the slot
void put(int key, string val){
int hashvalue = hashfunction(key);
Title for the C++ Window (complete_hash_cpp)
xxxxxxxxxx
class HashTable:
def __init__(self):
self.size = 11 # initial size of hash table is prime to help with collision resolution
self.slots = [None] * self.size # list to hold key items
self.data = [None] * self.size # list to hold data values
def put(self,key,data):
hashvalue = self.hashfunction(key,len(self.slots))
if self.slots[hashvalue] == None:
self.slots[hashvalue] = key
self.data[hashvalue] = data
else:
if self.slots[hashvalue] == key:
self.data[hashvalue] = data #replace
else:
nextslot = self.rehash(hashvalue,len(self.slots))
while self.slots[nextslot] != None and \
self.slots[nextslot] != key:
nextslot = self.rehash(nextslot,len(self.slots))
if self.slots[nextslot] == None:
self.slots[nextslot]=key
self.data[nextslot]=data
Complete Hash Table Example (complete_hash_py)