CS201 Study Guide

Site: Saylor Academy
Course: CS201: Elementary Data Structures
Book: CS201 Study Guide
Printed by: Guest user
Date: Saturday, May 18, 2024, 7:44 PM

Navigating the Study Guide

Study Guide Structure

In this study guide, the sections in each unit (1a., 1b., etc.) are the learning outcomes of that unit. 

Beneath each learning outcome are:

  • questions for you to answer independently
  • and a brief summary of the learning outcome topic with linked resources.

At the end of each unit, there is also a list of suggested vocabulary words.

 

How to Use the Study Guide

  1. Review the entire course by reading the learning outcome summaries and the linked resources.
  2. Test your understanding of the course information by answering questions related to each unit learning outcome and defining and memorizing the vocabulary words at the end of each unit.

By clicking on the gear button on the top right of the screen, you can print the study guide. Then you can make notes, highlight, and underline as you work.

Through reviewing and completing the study guide, you should gain a deeper understanding of each learning outcome in the course and be better prepared for the final exam!

Unit 1: Abstract Data Types and Arrays in C++

1a. Solve program implementation requirements using multidimensional arrays

  1. Describe the difference between contiguous and non-contiguous memory for any object.
  2. Demonstrate various means of implementing arrays.
  3. Show the difference between variable addresses and pointers.
  4. What is a linked-list, and what are its components?
  5. How does one create a linked-list?

Understanding how to arrange data in a computer's active and persistent memory is an important component of your study of computer science. The choice of arrangement depends very much on the application at hand; even so, it should never be so inflexible that it can't evolve with that application's requirements. Many "standards" in software engineering and software project management assume, if only implicitly, that a project's requirements are static. In the real world, that is never the case.

In this subunit, we focus on the most common data structure, arrays. These are native to most languages. Since we are dealing with C/C++, be sure you understand the rudiments of native array implementation in that language. Additionally, be sure you know the difference between contiguous and non-contiguous implementation. Although we do not cover it in this course, it is worth noting that "non-contiguous" can also refer to components of an array (or any data structure) that do not exist on the same physical computer that the access/manipulation software is running on.

Access to non-contiguous arrays and other data structures is facilitated by memory pointers. Memory pointers use memory addressing to locate the various components of a data structure.

Linked-lists are also part of the discussion on arrays since arrays can be implemented using nodes that point to other nodes. Each node can be interpreted as an array cell. As we have seen in the course, linked-lists are the basis for many other useful data structures. There are many approaches to the implementation of linked-lists; you should be familiar with those.

 

1b. Relate C/C++ structures and classes to ADTs

  1. What is the Standard Template Library (STL), and why is it important?
  2. Define "Abstract Data Type" (ADT).
  3. How are classes and structures and arrays related to ADTs?
  4. What are the important attributes of ADTs?

Abstract data types (ADTs) are the general case of specific types, such as arrays and linked-lists. Be sure you are familiar with the general concept of ADTs (in this video from 44:30). You should also understand the relationship between the general concept and C/C++ classes and structures.

The Standard Template Library (STL) is now part of the C++ language. Review the STL (in this video from 21:08-29:25) since it is applied regularly in professional practice.

 

1c. Apply the general concept of ADTs to lists and arrays accessed via pointers

  1. Review author implementations of linked-lists. How could they be improved?
  2. Review author implementations of arrays. How could they be improved?
  3. Why is it important to release memory allocated at runtime after it is no longer needed?
  4. Explain how memory can be available but unusable.

Use your favorite IDE to practice with arrays and linked-lists. Practice is essential. You need experience to be successful in your professional work. Programs used by various authors are presented as illustrations throughout the course. It is well worth your while to load these into your IDE and get them to work. As you do this, study them to understand their logic and how it translates to C/C++ syntax.

 

Unit 1 Vocabulary

This vocabulary list includes terms that might help you with the review items above and some terms you should be familiar with to be successful in completing the final exam for the course.

Try to think of the reason why each term is included. Understanding linked-lists, arrays, and how they relate to ADTs lays the foundation for much of what is done in the application of data structures. Here are some terms you should be familiar with:

  • Node
  • Cell
  • Array
  • Vector
  • Linked-list
  • Abstract data type
  • Contiguous
  • Non-contiguous
  • Standard Template Library
  • Pointer
  • Address

Unit 2: Introduction to Stacks and Queues

2a. Recognize and visualize stacks and queues

  1. What is a stack?
  2. What is a queue?
  3. What is the essential difference between stacks and queues?
  4. Recognize diagrams that illustrate stacks and queues.
  5. Be able to modify a stack ADT so that it functions like a queue.

We mentioned lists earlier. That discussion continues in this unit; see pages 96-99 of this book. Lists form the foundation of stacks and queues. There is only a subtle difference between stacks and queues. Both add newly-arriving elements to the same end of the list. However, the stack processes the most recently-added element. It removes the element from the end of the list to which elements are added. Queues, on the other hand, remove elements from the other end of the list from which elements are added. In this way, stacks process the element that has been on the list the least amount of time (last-in/first-out, LIFO). Queues process the element that has been on the list the longest amount of time (first-in/first-out, FIFO). Thus, code that implements either abstract data type can be easily modified to implement the other type. Stacks can be easily modified to function as queues and vice versa.

 

2b. Solve software requirements that require stacks or queues

  1. What are the various means of implementing stacks and queues?
  2. How do an application's requirements lead to the selection of stack or queue ADTs?
  3. What are the various means of implementing lists?
  4. How can a vector be implemented as a linked-list?
  5. What should one look for to tell if an ADT's code implements a stack or queue?

Stacks are most commonly implemented using linked-lists (as shown on pages 120-127 of this book), as are queues (pages 127-144). There are means of implementing stacks using arrays that can be readily modified to act as queues. Understanding which applications are suitable for stacks and which are suitable for queues will help you see the practical side of these two abstract data types.

 

Unit 2 Vocabulary

This vocabulary list includes terms that might help you with the review items above and some terms you should be familiar with to be successful in completing the final exam for the course.

Try to think of the reason why each term is included.

  • stack
  • queue
  • list
  • linked-list
  • array
  • push
  • pop
  • element
  • object
  • peek
  • buffer
  • rotate
  • head
  • tail
  • empty
  • insert
  • LIFO
  • FIFO
  • linear data structure
  • sequential access
  • top element
  • bottom element
  • circular list
  • list overflow

Unit 3: Pointers and References in C++

3a. Interpret native C/C++ variable declarations relative to data organization and access

  1. How do bytes compose words in computer hardware?
  2. What are various native data types and the amount of memory each occupies?
  3. How can we extend the accuracy of numeric data types using extended declarations?
  4. What are the means by which C/C++ programs can report the actual memory occupied by a given native variable type and by contiguous abstract data types?

Although this is not a language course, we presented a brief review. Understanding the syntax of C/C++ is required before studying memory pointers, and knowledge of pointers is essential to working with abstract data types and data structures. C/C++ has numerous native data types. These can be combined into more complex abstract data types, along with manipulation and access methods. Each native data type and each abstract data type occupies memory, which, along with runtime, is another resource that is consumed whenever a program is running.

 

3b. Choose approaches to using pointer variables for memory access within scalars, vectors, and arrays

  1. How does one declare a pointer to a native data type or abstract data type?
  2. What is necessary to be done to obtain the address of a variable?
  3. To what does a memory pointer actually refer?
  4. By what arithmetic process can the address of cells within contiguous-memory vectors and arrays and the elements of abstract data types be obtained?
  5. Do vectors, arrays, and other abstract data types necessarily occupy contiguous memory? Explain.

Be certain that you understand memory pointers (watch this video from 35:51 until the end). These are fundamental to data structures. Review how a pointer's value is incremented and decremented and how pointers can be manipulated to obtain the value of data elements. For more detail, look into pointer arithmetic. This is a practical matter that applies to any contiguous-memory object and is also useful within objects occupying non-contiguous memory. Be sure to get plenty of practice since pointers and pointer arithmetic are usually not well understood by those just beginning their study of data structures.

 

Unit 3 Vocabulary

This vocabulary list includes terms that might help you with the review items above and some terms you should be familiar with to be successful in completing the final exam for the course.

Try to think of the reason why each term is included.

  • sizeof
  • pointer
  • address
  • extended data type
  • compound data type
  • word
  • byte
  • width
  • bit
  • initialization
  • type qualifier
  • pointer incrementing
  • pointer arithmetic
  • indirection
  • dereferencing

Unit 4: Dynamic Memory Allocation

4a. Construct and deconstruct memory-based CDTs

  1. What are the various approaches to runtime memory allocations?
  2. For each approach to runtime memory allocation, what is the appropriate approach to deallocation?
  3. Why is it important to deallocate runtime-allocated memory as soon as it is no longer needed?
  4. Is memory allocated at runtime for a given ADT always contiguous?
  5. Is a single deallocation statement always sufficient to deallocate an ADT's runtime memory?
  6. Is setting a pointer's value to NULL sufficient to deallocate the memory referenced by the pointer?

We often do not know how many variables or which variable types will be needed while an application is running. C/C++ provides various native types, and we can create compound data types (CDTs) as needed. The application will give us a global view of the limits on types. For instance, if the application never operates on an "unsigned long int", then we know that type will never be needed. But, if the application operates on "long int", then we know that type is needed – but not always how many. This is how dynamic (runtime) memory allocation and deallocation comes into play. We can allocate memory to variables of different data types as needed and deallocate that memory (that is, return it for other uses) when it is no longer needed. Memory allocation is introduced in slide 13 of these notes. This page extends that discussion to ADTs in general. Keep in mind that allocating memory is not sufficient; you have to also manage it. Pay special attention to the methods, terminology, and exercises on this page.


4b. Choose appropriate means of making the best use of available memory in dynamic applications

  1. What are the two major issues that can arise when a program employs dynamic memory?
  2. How is it possible for a program to run out of memory even though the OS reports plenty of memory?
  3. How can a program fail a dynamic allocation when there is more than enough free memory available?
  4. How can the memory of one data type be converted into the memory of another data type?
  5. Why is it not a good idea to interleave allocations of large and small blocks of memory?
  6. What is a good way of calculating the amount of memory to allocate? Why do we do that?

Memory fragmentation is an interesting problem, and is important to consider when employing dynamic memory. There are ways to deal with this issue that are described in this article. Some even go so far as to write the own allocation routines. All of this is part of managing dynamic memory, which is discussed as part of learning outcome 4a. Interestingly, it is possible to reuse memory without deallocation or reallocation. Variables in C/C++ can be recast as long as both variable types employ exactly the same amount of memory. This is an important approach and is not unsafe as long as you are careful.


Unit 4 Vocabulary

This vocabulary list includes terms that might help you with the review items above and some terms you should be familiar with to be successful in completing the final exam for the course.

Try to think of the reason why each term is included.

  • "unsafe" variable
  • casting
  • allocation
  • deallocation
  • fragmentation
  • orphaned memory
  • NULL pointer
  • ADT destructor
  • ADT constructor
  • scope
  • memory leak
  • dynamic memory
  • static memory
  • memory lifetime
  • memory reuse

Unit 5: Linked Stacks, Queues, and Lists

5a. Show how lists, stacks, and queues are used to implement specific requirements

  1. What is a linked-list?
  2. How does a linked-list differ from a vector list?
  3. In what way does list implementation impact the list's abstract functionality?
  4. What is the basic functionality that has to be added to a list to create a queue?
  5. What is the basic functionality that has to be added to a list to create a stack?
  6. What are requirements specifically suitable for stacks?
  7. What are requirements specifically suitable for queues?

We continue our discussion of linked-lists on pages 103-120 of this book. Linked stacks and queues are built from linked-lists. Nodes (objects, elements) are the linked components that point to each other within linked-lists, as described in this article. Nodes are formed from structures or classes as objects. The objects generally contain pointers to other objects of the same class. However, it is not necessarily the case that all nodes of a linked-list are of the same class. Keep that generalization of the concept behind linked-lists in mind.


5b. Construct software to employ lists, stacks, and queues within larger applications

  1. What are singly-linked-lists?
  2. What are doubly-linked-lists?
  3. How does a singly-linked-list differ from a doubly-linked-list?
  4. If a stack or queue were implemented as a doubly-linked-list, what might its advantages be?
  5. What are applications within which stacks could be employed?
  6. What are applications within which queues could be employed?

Begin your review by exploring singly-linked and doubly-linked-lists. Selecting the particular implementation within applications is important since it is not unusual for the basic stack and queue functionalities to be insufficient. Be sure you understand linked stacks on pages 123-125 of this book, and their implementation, as described in this article. The same goes for linked queues (on page 133) and their implementation, as described here. Examine, build, and run these code examples using your favorite IDE.


Unit 5 Vocabulary

This vocabulary list includes terms that might help you with the review items above and some terms you should be familiar with to be successful in completing the final exam for the course.

Try to think of the reason why each term is included.

  • pointer
  • linked-list
  • singly-linked
  • doubly-linked
  • stack
  • queue
  • dynamic memory allocation
  • pointer to next
  • pointer to previous
  • head
  • current
  • tail
  • push
  • pop
  • insert
  • LIFO
  • FIFO

Unit 6: Algorithm Efficiency

6a. Examine algorithms relative to their resource utilization

  1. What are the various resources consumed when a program is running?
  2. Why is it important to balance resource utilization when examining program efficiency?
  3. What is an effective empirical approach to resource analysis?
  4. Why are considerations of algorithm efficiency are important?
  5. What are the classical equations that describe the efficiency of various resource utilizations?
  6. Why does one better concentrate on the underlying efficiency equations rather than their magnitude?

An understanding of algorithm efficiency is part of creating effective applications. Realizing their importance (as described in this video from 31:30 to the end), you can guide algorithm development to achieve the most satisfying results. Many applications depend on code modules running within a given length of time, even if the application only requires soft-real-time. A major topic within this domain is Big-O analysis. This is a means of discovering an algorithm implementation's efficiency measure. Most discussion in this arena focuses on runtime. However, you must also consider other resources. An example is the tradeoff between memory and time. For distributed systems, you must also consider the volume of data on the network.

 

Unit 6 Vocabulary

This vocabulary list includes terms that might help you with the review items above and some terms you should be familiar with to be successful in completing the final exam for the course.

Try to think of the reason why each term is included

  • Big-O analysis
  • cO(f(n))
  • efficiency vs. effectiveness
  • space/time tradeoff
  • constant, linear, logarithmic, quadratic resource consumption growth rates
  • difference between best, worst, average cases
  • resource utilization growth with increasing data loads
  • hardware/software tradeoffs
  • upper bound
  • lower bound
  • empirical analysis

Unit 7: Searching and Sorting Algorithms

7a. Understand common search methods

  1. Which are the commonly-used search methods?
  2. What is a process for determining which search method is best for a given application?
  3. In what way does data search depend on data sorting?

We began our discussion of search by using vectors for illustration. Do not forget that linked-lists can be used as vectors. First, however, you should consider the algorithm's efficiency and effectiveness. In our study of search, we concentrated on linear, binary, and Fibonacci searches. While linear search is the least efficient in the Big-O sense, the data does not have to be pre-sorted. Plus, it is a very good approach when there is only a small volume of data.


7b. Understand common sort methods

  1. Which are the commonly-used sorting methods?
  2. What is a process for determining which sorting method is best for a given application?
  3. In what way does search impact sorting efficiency?

The ability to sort data allows us to organize it for various uses and search for specific items. We spent a lot of time on divide-and-conquer methods, especially merge sort. Divide-and-conquer allows for the best use of multi-core and multi-node networked systems when large data volumes are involved. Besides merge sort, we also discussed insertion sort; bubble and selection sorts; and quicksort (review this video until 18:44).


7c. Plan efficient and effective search and sort algorithms

  1. Why is it necessary to balance efficiency and effectiveness?
  2. Under what circumstances do we see similar resource utilization for all search or sort methods?
  3. Do the units of efficiency always have to do only with the computer itself? Explain.

Review the difference between recursive and nonrecursive implementations of quicksort. You will notice that the recursive version is much easier to understand. That does have value. However, it is difficult to determine a recursive implementation's memory requirements. For many modern systems, especially embedded systems, you cannot assume the availability of sufficient memory to accommodate whatever is required. From another perspective, cloud-memory is typically far more expensive than runtime. You must be able to calculate a cost-budget rather than assume whatever money is needed will be available. That simply will not work in commercial situations.


7d. Judge searching and sorting algorithms relative to their application efficiency

  1. What are the average runtime O(f(n)) of merge, insertion, binary, bubble, selection, and quicksort?
  2. What are the average runtime O(f(n)) of linear, binary, and Fibonacci search methods?
  3. State means of carrying out Big-O analysis of various approaches to search and sort.

There are general approaches (described on pages 55-85) to analyzing an algorithm's efficiency. To see two good examples, look at Big-O analyses of merge sort and insertion sort.


7e. Rate search and sort algorithms relative to an application's needs

  1. How would one balance ease of system integration with O(f(n)) for a given algorithm's implementation?
  2. Why care about code warnings, memory leaks, and memory fragmentation if the implementation "works"?
  3. For what reason might we not care about a resource's average O(f(n)) when choosing an algorithm?

Having studied search and sort algorithms to determine their average resource O(f(n)), it is time now to look at specific implementations. It is true that an algorithm is not the same as the algorithm's implementation, although many people will hand you a piece of code and refer to it as an algorithm. Always review the code carefully. I like to use static and dynamic review tools since manual approaches can miss subtle problems. Do not be fooled by people telling you that code warnings do not matter. Not only do they obscure problems, but they also have unrecognized interactions that can cause system failures. For this particular subunit, review the code-level expressions of merge, insertion, bubble, and selection sorting algorithms. You should also review the code for binary search.


Unit 7 Vocabulary

This vocabulary list includes terms that might help you with the review items above and some terms you should be familiar with to be successful in completing the final exam for the course.

Try to think of the reason why each term is included.

  • merge sort
  • insertion sort
  • bubble sort
  • selection sort
  • quicksort
  • linear search
  • binary search
  • Fibonacci search
  • O(f(n))
  • Big-O analysis
  • divide-and-conquer
  • recursive
  • nonrecursive
  • worst, average, best O(f(n)) case

Unit 8: Hash Tables, Graphs, and Trees

8a. Demonstrate effective hash tables

  1. How does hashing determine the position of an object/element in the data storage space?
  2. What are the means of handling collisions?
  3. Is it necessary for the hash key to be unique across all key/data pairs?
  4. In what way does the statistical distribution of key values affect hashing effectiveness?
  5. How can data search be a part of a hashing capability?
  6. How can data sorting be part of a hashing capability?

Hashing goes beyond searching and sorting to calculate the position of a key/value pair in a data storage space. No system is ever perfect. In hashing, it is possible that two different key/value pairs will try to occupy the same location in the data storage space. There are numerous ways to handle that situation, as discussed in this lecture and this follow-on lecture. Other approaches are described here on pages 324-345, with a focus on practical implementation.

 

8b. Demonstrate effective graphs

  1. What is the difference between a graph and a tree?
  2. How is the efficiency and effectiveness of a graph traversal algorithm related to a graph's topology?
  3. In what ways may a graph be represented by a computer program?
  4. Which are the two most common ways to visit all nodes in a graph?
  5. Which are the two most commonly-required graph paths?

In computer science, the term "graph" refers to a set of vertices (points or nodes) and of edges (lines) that connect the vertices. The creation and traversal of graphs is an important topic when learning to solve complex problems that involve numerous relationships. For an excellent discussion on graphs and graph traversal, read Chapter 11 of this open-access text. As in all things, we need efficiency and effectiveness. Both are defined relative to the application at hand, although it is true that O(f(n)) can be estimated in general for various paradigms. We studied various graph traversals. Each is useful for different applications: finding non-cyclical paths, depth-first search, breadth-first search, lowest-cost path, minimal spanning tree.

 

8c. Demonstrate effective trees

  1. What is the difference between binary and non-binary trees?
  2. What is the difference between partial and complete trees?
  3. How might one compare trees to graphs?
  4. What are the basic elements of a tree?
  5. What are the two major approaches to visiting every node of a tree?
  6. How are infix and postfix notations related to tree-building and traversal?

Trees are a special case of graphs. A tree is a graph whose nodes do not have multiple references. Plus, tree traversal is only in one direction, downward, from root (top) to leaves (bottom). Review this page for basic definitions and illustrations. Many applications can be satisfied by binary trees (review chapter 5). However, do not be satisfied with being able to respond to common requests. A deeper understanding of trees enables solutions to a wider range of requirements. Continue your study of trees by reviewing non-binary trees (review chapter 6).

 

Unit 8 Vocabulary

This vocabulary list includes terms that might help you with the review items above and some terms you should be familiar with to be successful in completing the final exam for the course.

Try to think of the reason why each term is included.

Relative to Hashing

  • hashing
  • hash function
  • hash table
  • key/value pair
  • collision
  • chaining
  • open addressing
  • bucket
  • overflow
  • linear probing
  • load factor

Relative to Graphs

  • graph
  • digraph
  • labeled graph
  • weighted graph
  • vertex
  • path
  • depth-first
  • breadth-first
  • spanning tree
  • verticy
  • edge
  • undirected graph
  • adjacent
  • simple path
  • cycle
  • simple cycle
  • subgraph
  • acyclic graph
  • traversal
  • shortest path
  • minimal spanning tree
  • lowest-cost path

Relative to Trees

  • root
  • leaf
  • node
  • branch
  • parent
  • child
  • level
  • ancestor
  • descendent
  • depth
  • height
  • size
  • incomplete tree
  • complete tree
  • infix
  • postfix
  • red/black tree
  • binary search tree
  • Huffman coding tree
  • subtree
  • depth-first
  • width-first
  • post-order
  • pre-order