### Unit 6: Searching and Sorting

As a computer programmer, you will need to know how to search and sort data. This will require you to leverage what you have learned in a number of different Computer Science areas, drawing from your introduction to data structures and algorithms in particular. In this unit, we will identify the importance of searching and sorting, learn a number of popular searching and sorting algorithms, and determine how to analyze and appropriately apply them. By the end of this unit, you will recognize instances in which you need a searching or sorting algorithm and be able to apply one efficiently.

**Completing this unit should take you approximately 9 hours.**

Upon successful completion of this unit, you will be able to:

- demonstrate an understanding of the basic components of search algorithms and their various implementations;
- demonstrate an understanding of the differences between various search algorithms, including list search and tree search;
- differentiate various sorting algorithms; and
- demonstrate an understanding of the basic techniques involved in complexity analysis.

### 6.1: Search Algorithms

Why study searching and sorting? There are two reasons for doing so.

First, searching and sorting are tasks that occur frequently and are needed in programming.

Secondly, the requirements for these two 'needed' functions are simple and clear, their designs (that is, algorithms) are well developed, they are implemented in every language and are available in many code libraries, and their performance (time and space) are well understood.

Divide and Conquer algorithms, such as List and Tre Search, and Merge and Insertion Sort, make good examples for demonstrating the concepts of this course: decomposition, abstraction, modularization, hierarchy.

### 6.1.1: List Search

Sorting usually makes use of search. Watch these lectures on linear and binary search, and note how the use of sorting can also improve search performance.

Assume we have a collection of data objects, such as telephone numbers, and that we need to find a particular phone number in that collection. We will need a data structure for storing the objects. One such data structure is a list. A list is a generic object and can be used for any type, a type built into our programming language or a programmer defined object. For example, we can have a list of integers or a list of telephone numbers.

A list is composed of elements and has functions or methods that apply to a list, in particular, insert and remove, which add or delete elements of the list, respectively. Some languages may also have a find function. However, if our language has no such function we will need to write it. This resource discusses the implementation of a program to search a list to find a particular element in the list. Please glance at the 'list' of External Links at the bottom of the page; the elements or nodes of the list are grouped by language: Python, Java, C++ , and so on.

### 6.1.2: Tree Search

Read this page. In the previous unit of our course we studied recursive algorithms. Recursion is a concept that also applies to data. Here we look at recursive data structures - lists, trees, and sets. A list is a structure that consists of elements linked together. If an element is linked to more than one element, the structure is a tree. If each element is linked to two (sub) elements, it is called a binary tree. Trees can be implemented using lists, as shown in the resource for this unit. Several examples of the wide applicability of lists are presented. A link points to all the remaining links, i.e. the rest of the list or the rest of the tree; thus, a link points to a list or to a tree - this is data recursion.

The efficiency of the programming process includes both running time and size of data. The reading resource discusses the latter for recursive lists and trees.

Lastly, why read the last section on sets? Sets are another recursive data structure and the last section 2.7.6, indicates their connection with trees, namely, a set data type can be implemented in several different ways using a list or a tree data type. Thus, the programming process includes implementation decisions, in addition, to design or algorithm decisions. Each of these types of decisions is constrained by the features of the programming language used. The decision choices, such as which data structure to use, will impact efficiency and effectiveness of the program's satisfaction of the program's requirements.

The use of a tree structure involves traversing or stepping through the elements or nodes of the tree. This page shows how to traverse a binary tree, which we can extend to trees having more than two descendants at each node. Many problems can be modeled by a tree. For example, in chess, the first move can be represented by the root or starting node of a tree; the next move by an opponent player, by the descendent nodes of the root. This decomposition can continue for many levels. Thus, a level in the tree hierarchy represents the possible moves available to one player; and the next level, the possible moves of the opponent player. Each level represents the choices available to a given player. Traversing the tree involves: from a given start node a player looks-ahead at its descendent nodes (the possible moves), from each of these descendant nodes the player looks-ahead at their descendants (possible responding moves of the opponent player), and so on, continuing to look ahead (planning) to cover as many levels as feasible. Based on the look-ahead information (which gets better the further the look-ahead goes), the player chooses a descendant from the given start node.

Watch this lecture, which illustrates the role of searching in helping to solve many problems. Searching a tree involves traversing the tree and making a decision at each node as we traverse or step through the tree. Most problems involve making decisions. If we can put a value on the outcome of a decision and if we search for a decision that has a 'best' value, then our decision process would be a search process.

The lecture points out two common techniques for traversing all of the elements of a tree: breadth first and depth first search. A traversal technique involves deciding which descendant (sub) element to look at next. Note that the starting large decision has been decomposed into a series of decisions as to which descendent element to look at on the next level down in the tree hierarchy.

### 6.2: Sorting Algorithms

Notice that the structure of this course is a tree. The following subunits in the Unit 6 sub-tree describe sort algorithms.

Whereas searching takes a data structure as input and outputs an element of the data structure, sorting is more complex, in that it takes a data structure as input and returns a data structure of the same type, but with the elements rearranged. There are a few search algorithms: linear for a list, depth or breadth traversal for a tree, and binary search. There are several sorting algorithms; unit 6.2 presents a number of them.

### 6.2.1: Merge and Insertion Sort

You watched these videos earlier, but take some time to review the merge sort discussion at the beginning of the first lecture. Again, focus on recognizing abstraction, decomposition, and composition. Rewatch the second lecture, focusing on the bubble and selection sorts.

The following lectures step through the programming life-cycle process for sorting. The programming process comprises requirements, design, implementation of the design in a programming language, and verification/validation. The last lecture adds maintenance, extending the process to a programming life-cycle process. Please realize that the states of the process overlap and do not have to be performed sequentially.

### 6.2.2: Quick Sort

This lecture explains the details of the working of quick sort, which is on average 3 times faster than merge sort. The lecture has 3 parts: the first 20 minutes approximately, or first third, gives the explanation -- watch that part of the lecture. You should revisit the second third, or second and third parts of the lecture when you are in unit 6.2.5, later in our course.

### 6.2.3: Radix Sort

The radix sort does not compare values of the elements to be sorted; it uses the digits (in some radix, e.g. base 2) of integer keys and sorts the keys by sorting, first on the first digit (either the least significant or most significant digit), then on the next significant digit, and so on, up to the last digit. This decomposes the problem into n smaller sorting problems, namely, sorting, all the values that have the same digit in the same radix position of the key. Read this article on the Radix sort. Carefully study the discussion on efficiency and note that the complexity depends on the assumptions made regarding the primitive steps and the data structures used in a program.

### 6.2.4: Analysis

Understanding the complexity of an algorithm helps us decide whether or not we should use it as the design of a program to solve a problem. Complexity is usually measured in terms of the average number of steps in the computation of a program. The steps can be used to estimate an average bound, lower bound, and upper bound of the amount of time and for the amount of storage space needed for the computation. The lecture explains Big O notation and concept and, using recurrence relations, develops the Big O value for several types of computations. The steps of interest are the primitive steps of an algorithm and the operations that are intrinsic to the data structure used in the program implementation of the algorithm.

Big O estimates are usually associated with the algorithm -- its primitive operations and data structure. Algorithms are grouped into classes associated with linear, n log n, quadratic, and exponential Big O bounds: O(n), O(n times log n), O(n**2), and O(2**n), respectively. For a given value of n, the lecturer gives the values of n, n x log n, n**2, and 2**n, to show the dramatic growth as n grows. If a problems grows by a factor of n an algorithm that grows too fast, e.g. has quadratic or exponential growth, would not be a good choice for the design of a program to solve it.

Note that the details of complexity analysis require a strong mathematical background, and you should focus on the main ideas or 'big picture' first before delving into the details.

The video gives a concise definition of Big O, which is popular for bounding the running time for search and sort algorithms. Additionally, if you feel comfortable with your math background, you should watch the second and third parts of the video.