As a computer programmer, you need to know how to search and sort data. This will require you to leverage what you have learned in a number of different computer science areas, drawing from your earlier study of data structures and algorithms. In this
unit, we will identify the importance of searching and sorting, learn a number of popular searching and sorting algorithms, and determine how to analyze and appropriately apply them. By the end of this unit, you will recognize instances in which you
need a searching or sorting algorithm and be able to apply one efficiently.
Completing this unit should take you approximately 8 hours.
Why study searching and sorting? There are two reasons.
First, searching and sorting are tasks that occur frequently in programming.
Second, the requirements for these two capabilities are simple and clear. Their designs and implementations are well developed, they are implemented in nearly every language and available in many code libraries, and their performance (time and space) is well understood.
Search and sort algorithms, such as list and tree search and merge and insertion sort, make good examples for demonstrating the concepts of this course: decomposition, abstraction, modularization, and hierarchy. Several of them, binary search and merge sort in particular, are divide-and-conquer algorithms.
Sorting often makes use of search. Watch these lectures on linear and binary search, and note how sorting can, in some cases, also improve search performance. The lectures mention Python code, but you can ignore that part: they explain the basics of each algorithm well enough that the brief exposure to code is only ancillary. Another point: a lot of time is spent discussing list representation. Do not ignore that part of the discussion. How a list is represented makes a huge difference in search and sort performance.
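The difference between the two searches can be sketched in a few lines. This is a minimal illustration, not the lecturers' code: the function names and the probe counter are assumptions added here to make the cost visible, and the binary search assumes the list is already sorted.

```python
def linear_search(items, target):
    """Scan left to right; return (index, probes), or (-1, probes) if absent."""
    probes = 0
    for i, value in enumerate(items):
        probes += 1
        if value == target:
            return i, probes
    return -1, probes


def binary_search(sorted_items, target):
    """Halve the search range on every probe; the input must be sorted."""
    lo, hi = 0, len(sorted_items) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_items[mid] == target:
            return mid, probes
        if sorted_items[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return -1, probes


data = list(range(0, 2000, 2))  # 1000 even numbers, already sorted
print(linear_search(data, 1998))
print(binary_search(data, 1998))
```

On this 1000-element list, the linear search examines every element to reach the last one, while the binary search needs at most about ten probes. That gap is what sorting the data first buys.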
Assume we have a collection of data objects, such as telephone numbers, and that we need to find a particular phone number in that collection. We will need a data structure for storing the objects. One such data structure is a list. A list is a generic object and can hold elements of any type, whether a type built into our programming language or a programmer-defined type. For example, we can have a list of integers or a list of telephone numbers.
A list is composed of elements and has functions or methods that apply to a list, in particular, insert and remove, which add or delete elements of the list, respectively. Some languages may also have a find function. However, if our language has no such function we will need to write it. This resource discusses the implementation of a program to search a list to find a particular element in the list. Please glance at the 'list' of External Links at the bottom of the page; the elements or nodes of the list are grouped by language: Python, Java, C++, and so on. C++ is used in this article.
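As a rough sketch of the operations named above, here is a small singly linked list with insert, remove, and find. The article itself uses C++; Python is swapped in here only for brevity, and the class and method names (`Node`, `LinkedList`, `phone_book`) are hypothetical, not taken from the article.

```python
class Node:
    """One element of the list plus a link to the next element."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedList:
    """Hypothetical minimal list type with insert, remove, and find."""

    def __init__(self):
        self.head = None

    def insert(self, value):
        # Add at the front: constant time, no traversal needed.
        self.head = Node(value, self.head)

    def remove(self, value):
        # Unlink the first node holding value; return True if one was removed.
        prev, node = None, self.head
        while node is not None:
            if node.value == value:
                if prev is None:
                    self.head = node.next
                else:
                    prev.next = node.next
                return True
            prev, node = node, node.next
        return False

    def find(self, value):
        # Walk the links until the value is found or the list ends.
        node = self.head
        while node is not None:
            if node.value == value:
                return node
            node = node.next
        return None


phone_book = LinkedList()
for number in ["555-0100", "555-0101", "555-0102"]:
    phone_book.insert(number)
```

Calling `phone_book.find("555-0101")` returns the matching node, or `None` when the number is not in the list. Note that `find` is a linear search: it may have to visit every node.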
Read this page. In the previous unit of our course we studied recursive algorithms. Recursion is a concept that also applies to data. Here we look at recursive data structures: lists, trees, and sets. A list is a structure that consists of elements linked together. If an element can be linked to more than one following element, the structure is a tree. If each element is linked to at most two sub-elements, it is called a binary tree. Trees can be implemented using lists, as shown in the resource for this unit. Several examples of the wide applicability of lists are presented. Each link points to everything that follows it, i.e. the rest of the list or a subtree; thus, a link points to a list or to a tree. This is data recursion.
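Data recursion can be illustrated with a small sketch. The tuple representation below is an assumption chosen for brevity, not the representation used on the page: a list is either empty or an element paired with another list, and a binary tree is either empty or a value with two subtrees. Functions over such data follow the same recursive shape as the data.

```python
# A list is either None (empty) or a pair (first, rest), where rest is a list.
def length(lst):
    """Count elements by recursing on the rest of the list."""
    return 0 if lst is None else 1 + length(lst[1])


# A binary tree is either None (empty) or a triple (left, value, right),
# where left and right are themselves binary trees.
def size(tree):
    """Count nodes by recursing on both subtrees."""
    if tree is None:
        return 0
    left, _, right = tree
    return size(left) + 1 + size(right)


numbers = (1, (2, (3, None)))                  # the list 1, 2, 3
tree = ((None, 1, None), 2, (None, 3, None))   # root 2 with children 1 and 3

print(length(numbers), size(tree))
```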
Program efficiency involves both running time and the size of the data. This page discusses the latter for recursive lists and trees.
Lastly, why read the last section on sets? Sets are another recursive data structure, and the last section, 2.7.6, indicates their connection with trees: a set data type can be implemented in several different ways using a list or a tree data type. Thus, the programming process includes implementation decisions in addition to design or algorithm decisions. Each of these types of decisions is constrained by the features of the programming language used. Choices such as which data structure to use affect how efficiently and effectively the program satisfies its requirements.
Note: You will notice an unusual use of C++ here. What the author is doing is showing how to pass a fixed-value data-structure as a calling argument.
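The point about implementing a set with either a list or a tree can be sketched as two classes that offer the same interface. Both classes and their method names are hypothetical, written in Python rather than the page's C++, and the tree version is a plain unbalanced binary search tree that assumes the elements can be compared with `<`.

```python
class ListSet:
    """Set backed by a plain list: membership is a linear scan."""

    def __init__(self):
        self._items = []

    def add(self, x):
        if x not in self._items:   # keep elements unique
            self._items.append(x)

    def contains(self, x):
        return x in self._items


class TreeSet:
    """Set backed by an unbalanced binary search tree.

    Each node is a three-element list: [left, value, right].
    """

    def __init__(self):
        self._root = None

    def add(self, x):
        if self._root is None:
            self._root = [None, x, None]
            return
        node = self._root
        while True:
            if x == node[1]:
                return                      # already present
            side = 0 if x < node[1] else 2  # go left or right
            if node[side] is None:
                node[side] = [None, x, None]
                return
            node = node[side]

    def contains(self, x):
        node = self._root
        while node is not None:
            if x == node[1]:
                return True
            node = node[0] if x < node[1] else node[2]
        return False
```

Code that uses only `add` and `contains` works with either class, so the choice between them is an implementation decision driven by efficiency rather than by the algorithm that uses the set.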
Using a tree structure involves traversing, or stepping through, the elements or nodes of the tree. This page shows how to traverse a binary tree, which we can extend to trees having more than two descendants at each node. Many problems can be modeled by a tree. For example, in chess, the first move can be represented by the root or starting node of a tree, and the opponent's next move by the descendant nodes of the root. This decomposition can continue for many levels. Thus, a level in the tree hierarchy represents the possible moves available to one player, and the next level represents the possible moves of the opponent. Traversing the tree works as follows: from a given start node, a player looks ahead at its descendant nodes (the possible moves); from each of these descendant nodes, the player looks ahead at their descendants (the possible responding moves of the opponent); and so on, continuing to look ahead (planning) to cover as many levels as feasible. Based on the look-ahead information (which gets better the further the look-ahead goes), the player chooses a descendant of the given start node.
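Two common ways of stepping through a binary tree can be sketched as follows. This is an illustrative sketch with hypothetical names (`TreeNode`, `inorder`, `level_order`), not the code from the page: the first function goes deep before going wide, and the second visits one level at a time, which matches the level-by-level look-ahead described above.

```python
from collections import deque


class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def inorder(node):
    """Depth-first: left subtree, then the node, then the right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)


def level_order(root):
    """Breadth-first: visit every node on one level before the next level."""
    order = []
    queue = deque([root] if root is not None else [])
    while queue:
        node = queue.popleft()
        order.append(node.value)
        for child in (node.left, node.right):
            if child is not None:
                queue.append(child)
    return order


root = TreeNode(4,
                TreeNode(2, TreeNode(1), TreeNode(3)),
                TreeNode(6, TreeNode(5), TreeNode(7)))

print(inorder(root))      # [1, 2, 3, 4, 5, 6, 7]
print(level_order(root))  # [4, 2, 6, 1, 3, 5, 7]
```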
It is important to understand what search is and when it is appropriate. This page explains sequential and binary search, and their implementation. There is also the matter of hashing as a storage and search technique. In so doing, we introduce the unordered map and how to implement a map abstract data type using hashing. Read Sections 6.1-6.5. You may also find Sections 6.6-6.11 useful for practice.
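Hashing as a storage and search technique can be sketched with a small map type. This is a minimal sketch under stated assumptions, not the implementation from the reading: the class name `HashMap` and its methods are hypothetical, collisions are handled by chaining (each bucket is a list), and the table never resizes.

```python
class HashMap:
    """Hypothetical map abstract data type using hashing with chaining."""

    def __init__(self, buckets=8):
        # Fixed number of buckets; a fuller version would resize as it grows.
        self._buckets = [[] for _ in range(buckets)]

    def _bucket(self, key):
        # The hash value picks the bucket, so only one short list is searched.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)   # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default


directory = HashMap()
directory.put("alice", "555-0100")
directory.put("bob", "555-0101")
print(directory.get("alice"))
```

As long as the keys spread evenly over the buckets, a lookup inspects only a few entries regardless of how many are stored, which is why hashing can beat both sequential and binary search.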
Whereas searching takes a data structure as input and outputs an element of the data structure, sorting is more complex in that it takes a data structure as input and returns a data structure of the same type, but with the elements rearranged. There are a few search algorithms: linear search for a list, depth-first or breadth-first traversal for a tree, and binary search. There are several sorting algorithms; this unit presents a number of them.
This page explains and implements selection sort, bubble sort, merge sort, quick sort, insertion sort, and shell sort.
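Two of the sorts named above can be sketched briefly. These are illustrative versions with assumed function names, not the page's own implementations. Both return a new sorted list and leave the input unchanged.

```python
def insertion_sort(items):
    """Grow a sorted prefix by inserting each element into its place."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements one slot right to open a gap for current.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


def merge_sort(items):
    """Split in half, sort each half recursively, then merge the halves."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:     # <= keeps equal elements in order
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])         # at most one of these is non-empty
    merged.extend(right[j:])
    return merged


sample = [54, 26, 93, 17, 77, 31, 44, 55, 20]
print(insertion_sort(sample))
print(merge_sort(sample))
```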
This lecture explains in detail how quick sort works. Quick sort is often cited as running about three times faster than merge sort on average, although the exact factor depends on the implementation and the data. The coding syntax is very general and fits any language. The video has three parts; the first third, approximately the first 20 minutes, gives the explanation. Watch that part of the lecture now, and watch the rest when you study Big-O analysis.
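For reference while watching, here is one common way to write quick sort. It is a sketch under assumptions, not the lecturer's version: it sorts the list in place and always takes the last element of the range as the pivot (the Lomuto partition scheme), which is simple but performs poorly on input that is already sorted.

```python
def quick_sort(items, lo=0, hi=None):
    """Sort items[lo..hi] in place by partitioning around a pivot."""
    if hi is None:
        hi = len(items) - 1
    if lo >= hi:
        return                      # zero or one element: already sorted
    pivot = items[hi]               # assumption: last element as pivot
    boundary = lo                   # items[lo:boundary] are all < pivot
    for i in range(lo, hi):
        if items[i] < pivot:
            items[i], items[boundary] = items[boundary], items[i]
            boundary += 1
    # Put the pivot between the smaller and the larger elements.
    items[hi], items[boundary] = items[boundary], items[hi]
    quick_sort(items, lo, boundary - 1)
    quick_sort(items, boundary + 1, hi)


values = [54, 26, 93, 17, 77, 31, 44, 55, 20]
quick_sort(values)
print(values)
```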
Radix sort does not compare the values of the elements to be sorted. Instead, it uses the digits (in some radix, e.g. base 2) of integer keys and sorts the keys first on the first digit (either the least significant or the most significant digit), then on the next significant digit, and so on, up to the last digit. This decomposes the problem into n smaller sorting problems, namely, sorting all the values that have the same digit in the same radix position of the key. Read this article on radix sort. Carefully study the discussion on efficiency and note that the complexity depends on the assumptions made regarding the primitive steps and the data structures used in a program.
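The digit-by-digit idea can be sketched as follows. This is one possible variant, not the article's code: it starts from the least significant digit, assumes the keys are non-negative integers, and uses base 10 by default. Notice that no two keys are ever compared with each other; each pass only drops keys into buckets by digit.

```python
def radix_sort(keys, base=10):
    """Least-significant-digit radix sort for non-negative integers."""
    keys = list(keys)
    if not keys:
        return keys
    largest = max(keys)
    place = 1                       # 1, base, base**2, ... (digit position)
    while place <= largest:
        buckets = [[] for _ in range(base)]
        for key in keys:
            digit = (key // place) % base
            buckets[digit].append(key)   # appending keeps each pass stable
        # Concatenate the buckets in digit order to form the next ordering.
        keys = [key for bucket in buckets for key in bucket]
        place *= base
    return keys


print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
```

Each pass is stable, so the order established by earlier, less significant digits is preserved among keys that share the current digit. That property is what makes the final result fully sorted.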
These videos on algorithm analysis are well thought out and presented. One caveat is that the simple code used for illustration is written in Python, a modern industrial language that is not a prerequisite for this course. This page gives code snippets that compare Python with C++. C++ syntax is sufficiently similar to Java that you will readily see the relationship.
Understanding the complexity of an algorithm helps us decide whether or not to use it as the design of a program to solve a problem. Complexity is usually measured in terms of the number of steps in the computation of a program. Counting steps lets us estimate average, lower, and upper bounds on the amount of time and the amount of storage space needed for the computation. The lecture explains the concept and notation of Big O and, using recurrence relations, develops the Big O value for several types of computations. The steps of interest are the primitive steps of an algorithm and the operations that are intrinsic to the data structure used in the program implementation of the algorithm.
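As an illustration of how a recurrence relation leads to a Big O value, consider two standard cases. This is a sketch, not the lecture's derivation; it assumes $n$ is a power of 2 and that $c$ is a constant cost per primitive step, and the symbol $T(n)$ for the step count is introduced here.

```latex
% Binary search: one probe, then search half of the range.
\begin{align*}
T(n) &= T(n/2) + c, \qquad T(1) = c \\
     &= T(n/2^k) + kc \\
     &= c\,(\log_2 n + 1) \quad \text{when } k = \log_2 n \\
     &= O(\log n)
\end{align*}

% Merge sort: sort two halves, then merge in linear time.
\begin{align*}
T(n) &= 2\,T(n/2) + cn, \qquad T(1) = c \\
     &= 2^k\,T(n/2^k) + k\,cn \\
     &= cn + cn\log_2 n \quad \text{when } k = \log_2 n \\
     &= O(n \log n)
\end{align*}
```

In both cases the recurrence is unrolled $k$ times until the subproblem size reaches 1, and the number of unrolling steps, $\log_2 n$, is what produces the logarithm in the result.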
In this unit we continue our exploration of abstraction with respect to algorithm complexity. This lecture discusses the classification of algorithms according to how the performance of an algorithm grows relative to the size of the problem or task the algorithm solves or performs. Algorithms are classified by average run-time complexity, defined as the average number of steps the algorithm takes for a problem of size n. Abstraction ignores the implementation of the algorithm and considers only the growth in the number of (primitive) steps an algorithm takes as the size of the problem grows. The video lecture introduces Big O notation and gives examples of linear, logarithmic, quadratic, and exponential complexity. The lecturer states that exponential complexity should be avoided in general. Although the examples are in C++, the same principles apply to algorithms in general, regardless of language.
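The four growth classes can be made concrete by counting steps directly. The functions below are hypothetical stand-ins written in Python rather than the lecture's C++ examples; each one does no useful work and only counts how many primitive steps a computation of that shape would take for a problem of size n.

```python
def log_steps(n):
    """Logarithmic: halve the problem until one item is left."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps


def linear_steps(n):
    """Linear: touch each item once."""
    steps = 0
    for _ in range(n):
        steps += 1
    return steps


def quadratic_steps(n):
    """Quadratic: touch every pair of items."""
    steps = 0
    for _ in range(n):
        for _ in range(n):
            steps += 1
    return steps


def exponential_steps(n):
    """Exponential: two recursive calls per item, like listing all subsets."""
    if n == 0:
        return 1
    return exponential_steps(n - 1) + exponential_steps(n - 1)


for n in (4, 8, 16):
    print(n, log_steps(n), linear_steps(n), quadratic_steps(n),
          exponential_steps(n))
```

Doubling n from 8 to 16 adds one step to the logarithmic count, doubles the linear count, quadruples the quadratic count, and multiplies the exponential count by 256, which is why exponential algorithms become impractical so quickly.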
Take this assessment to see how well you understood this unit.