Searching and Hashing

It is important to understand what search is and when it is appropriate. This page explains sequential and binary search, and their implementation. There is also the matter of hashing as a storage and search technique. In so doing, we introduce the unordered map and how to implement a map abstract data type using hashing. Read Sections 6.1-6.5. You may also find Sections 6.6-6.11 useful for practice.

4. The Binary Search

4.1. Analysis of Binary Search

To analyze the binary search algorithm, we need to recall that each comparison eliminates about half of the remaining items from consideration. What is the maximum number of comparisons this algorithm will require to check the entire list? If we start with n items, about items will be left after the first comparison. After the second comparison, there will be about n/4. Then n/8, n/16, and so on. How many times can we split the list? Table 3 helps us to see the answer.


When we split the list enough times, we end up with a list that has just one item. Either that is the item we are looking for or it is not. Either way, we are done. The number of comparisons necessary to get to this point is i where =1. Solving for i gives us  . The maximum number of comparisons is logarithmic with respect to the number of items in the list. Therefore, the binary search is .

One additional analysis issue needs to be addressed. In the recursive solution shown above, the recursive call,

binarySearch(alist[:midpoint],item)

uses the slice operator to create the left half of the list that is then passed to the next invocation (similarly for the right half as well). The analysis that we did above assumed that the slice operator takes constant time. This means that the binary search using slice will not perform in strict logarithmic time. Luckily this can be remedied by passing the list along with the starting and ending indices. The indices can be calculated as we did in Listing 3. This is especially relevant in C++, where we are initializing a new vector for each split of our list. To truly optimize this algorithm, we could use an array and manually keep track of start and end indices of our array. Below is an example of such an implementation.

#include <iostream>
using namespace std;

//Checks to see if item is in a vector
//retruns true or false (1 or 0)
//using binary Search and
//uses start and end indices
bool binarySearch(int arr[], int item, int start, int end) {
if (end >= start) {
int mid = start + (end - start) / 2;
if (arr[mid] == item)
return true;
if (arr[mid] > item)
return binarySearch(arr, item, start, mid - 1);
else {
return binarySearch(arr, item, mid + 1, end);
}
}

return false;
}

bool binarySearchHelper(int arr[], int size, int item) {
return binarySearch(arr, item, 0, size);
}

int main(void) {
int arr[] = {0, 1, 2, 8, 13, 17, 19, 32, 42};
int arrLength = sizeof(arr) / sizeof(arr[0]);

cout << binarySearchHelper(arr, arrLength, 3) << endl;
cout << binarySearchHelper(arr, arrLength, 13) << endl;

return 0;
}

Optimized Binary Search (binary_search_cpp_array)

Even though a binary search is generally better than a sequential search, it is important to note that for small values of n, the additional cost of sorting is probably not worth it. In fact, we should always consider whether it is cost effective to take on the extra work of sorting to gain searching benefits. If we can sort once and then search many times, the cost of the sort is not so significant. However, for large lists, sorting even once can be so expensive that simply performing a sequential search from the start may be the best choice.