### Unit 7: Searching and Sorting Algorithms

In this unit, students will learn to apply searching and sorting algorithms to arrays. Students will also learn how to conduct worst-case and best-case analysis for these algorithms, determine their efficiency, and assign them generalized Big O notations.

**Completing this unit should take approximately 7 hours.**

Upon successful completion of this unit, you will be able to:

- understand common search methods.
- define linear, binary, and fibonacci search.
- illustrate how linear, binary, and fibonacci search would be implemented using arrays.
- illustrate how linear, binary, and fibonacci search would be implemented using linked lists.
- compare linear, binary, and fibonacci search in terms of their memory and runtime efficiency.

- understand common sort methods.
- describe insertion, quick, and merge sort.
- compare insertion, quick, and merge sorting in terms of their overall memory and runtime efficiency.
- chart the efficiency of insertion, quick, and merge sorting for small, medium, and large data loads.

- plan efficient and effective search and sort algorithms.
- apply Big-O analysis to discern the efficiency of searching and sorting algorithms relative to data volumes and categories.
- state the worst, average, and best time complexity for common search and sort algorithms.

- judge searching and sorting algorithms relative to their application efficiency.
- determine the efficiency of common sorting and searching algorithms in terms of their computer-resource utilization.
- illustrate means of empirically demonstrating that your estimate of resource utilization is correct for a given application.

- rate search and sort algorithms relative to an application's needs.
- recommend and justify the use of a specific search or sort algorithm according to expected data sizes.

- understand common search methods.

### 7.1: Peak Finding via Vector Search

This lecture introduces search algorithms. It also reviews what we have learned about time-complexity. These lectures use Python for illustration, but we will present code in C++. The point of the lectures is to understand the background of the various approaches; don't focus on Python specifically. This contrasts with other teaching approaches that start with the syntax of a particular language.

We begin with this particular lecture, since it introduces concepts that are important to understanding search and sort, graphs and hashing, and the scalability of various approaches. Though we will look at some other lectures in this series, the entire lecture series covers more than this course's purview, so we will not directly engage with all of them. However, you may wish to check out the course page for the original course at MIT, if you'd like more information on what is covered here.

"Complexity" refers to ALL resources consumed by an algorithm, although most people speak only of runtime (processing time, time to run successfully to completion). Refer to our earlier discussion on time/memory tradeoffs. You must consider the overall cost of running an algorithm. For instance, memory is expensive in cloud computing, so you have to be careful about saving time by using algorithms that are memory-intensive. You also have to consider other resources such as network utilization as data is transferred from one place to another. As you work, continue to think in this manner.

Searching, sorting, graphs, and hashing, using arrays or other data structures, are typically discussed in the context of recursion. While these are "academically elegant" and arguably easier to explain, recursion is memory inefficient. Generally, it assumes that large amounts of memory are available. However, that is rarely the case in industrial applications, especially with embedded systems. Additionally, the memory cost for cloud-based systems is high. Constant expansion of memory use with recursive approaches can eat up a lot of money very quickly. Thus, recursion will not be emphasized here.

### 7.2: Models of Computation and Document Distance

Here we emphasize the fact that a computer program is not an algorithm. Rather, a computer program is just one way of expressing an algorithm. Computer languages are just a means of expressing in syntax what we want the computer to do. Understanding this important nuance is essential as you delve into the algorithms we explore in this unit. For instance, sorting is not a matter of computer programming but a matter of algorithm development. It is true that ultimately, the computer has to be made to carry out the task at hand. However, we must start with an algorithm (what we want the computer to do), not with a computer program (how we make a computer do something). Otherwise, the computer becomes a limitation instead of an aid. This lecture also discusses document distance, which will be important for several examples in later lectures.

### 7.3: Why Sort? Insertion Sort and Merge Sort

Why bother with sorting data? This lecture discusses that question and then goes into two different sorting methods. You will notice that many search algorithms assume sorted data, especially with the large data volumes that are common today.

Watch this lecture to learn about the divide-and-conquer approach to sorting and its application to merge sorts. This approach is especially worthwhile when multiple computing cores are available, either with or without shared memory. Each divided component can be exercised independent of the other components. The merge step brings it all together. Divide-and-conquer applies to more than just sorting methods, but we focus on that in this lecture.

### 7.3.1: Big-O Analysis of Merge and Insertion Sort

Read this efficiency analysis of insertion sort. The pseudocode is easily reformatted for C/C++. Efficiency analysis is important when selecting solution methods appropriate for the application and situation.

This page presents an efficiency analysis of merge sort. It will help you choose the sorting algorithm appropriate to your project's requirements and situation. For instance, if data arrives over time, insertion sort using linked lists may well be the best choice. You certainly do not want to re-sort the entire collected data just to add one element. Most demonstrations of sorting methods assume that a large block of data already exists and that the task is to sort that data block. The case of data arriving over time is rarely considered but that is often the case in modern industrial and commercial applications. The "Internet of Things" (IoT) is a prime example, as is ongoing retail sales.

### 7.3.2: Merge and Insertion Sort in C/C++

Since the lectures in this unit use Python, review these examples of Insertion and Merge sorts in C++.

### 7.4: Linear Search

Linear search is the most basic of search methods. Watch these videos to see how this simple search method works and how it is programmed in C/C++. You may wish to view the original link to find other useful material.

### 7.5: Fibonacci Search

Fibonacci search expands on linear search in that the steps are greater than one. The comments in this C++ implementation explains.

### 7.6: Binary Search, and Bubble and Selection Sorts

Watch this lecture about binary search, bubble sort, and selection sort methods. You need to be aware of various approaches to solving sorting requirements.

Since the lectures in this unit use Python for illustration, review these examples in C++.

### 7.7: Quicksort

Watch this lecture until 18:44 for an explanation of Quicksort. If you watch through the end, you will be led through a deep analysis that derives Big-O for this algorithm.

This recursive implementation of quicksort in C++ closely mirrors the one discussed in the lecture, which used Python.

This is a non-recursive version of Quicksort for C/C++. The variable "recursive" is used to show where recursion would normally take place. We present this since the lectures in this unit use Python for illustration.

### Unit 7 Assessment

Take this assessment to see how well you understood this unit.

- This assessment
**does not count towards your grade**. It is just for practice! - You will see the correct answers when you submit your answers. Use this to help you study for the final exam!
- You can take this assessment as many times as you want, whenever you want.

- This assessment