Data Structures in Python

Python has several built-in data structures. Lists allow you to create a collection of items that can be referenced using an index, and you can modify their values (they are "mutable"). Tuples are similar to lists except that, once created, you cannot modify their values (they are "immutable"). Dictionaries allow you to create a collection of items that can be referenced by a key (as opposed to an index). Use this section to learn about and practice programming examples involving lists, tuples, and dictionaries.

We'll describe the three most commonly used built-in data structures:

  1. Lists ( [] )
  2. Tuples ( () )
  3. Dictionaries, or dicts ( {} )

Note that each one uses a different kind of bracket.
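As a quick check of the bracket rule, here is a minimal sketch that builds one object of each kind and asks Python for its type. The variable names are just placeholders.

```python
# One small example of each structure, each written with its own brackets
a_list = [1, 2, 3]               # square brackets -> list
a_tuple = (1, 2, 3)              # parentheses -> tuple
a_dict = {"one": 1, "two": 2}    # curly braces with key: value pairs -> dict

for obj in (a_list, a_tuple, a_dict):
    print(type(obj).__name__, obj)
# list [1, 2, 3]
# tuple (1, 2, 3)
# dict {'one': 1, 'two': 2}

# Empty curly braces also give a dict
print(type({}).__name__)   # dict
```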

Lists are baskets that can hold many items. They are ordered, so there is a first element, a second element, and so on, up to a last element. The items in a single list don't have to be the same type.

Tuples are basically like lists, except that they are immutable, i.e., once they are created, individual values can't be changed. They are also ordered, so there is a first element, a second element, and so on.

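Dictionaries are collections of key-value pairs that are very fast for looking things up; under the hood, they are implemented as hash tables. Dictionaries will be very useful to us as we progress towards the PyData stack. Elements need to be referred to by key, not by position. (Since Python 3.7, dictionaries do remember the order in which keys were added, but that order still can't be used to access elements by position.)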


Lists

    test_list = ["apple", 3, True, "Harvey", 48205]
    test_list
    ['apple', 3, True, 'Harvey', 48205]

There are various operations we can do on lists. First, we can determine the length (or size) of the list:

    len(test_list)
    5

The list is a catch-all, but we're usually interested in extracting elements from the list. This can be done by position, since lists are ordered. We can extract the first element of the list with:

    test_list[0]
    'apple'

Wait!! The index is 0?

Yup. Python uses the same convention as C (the language its standard implementation, CPython, is written in), where counting starts at 0. So the first element has index 0, the second has index 1, and so on. Be careful if you're used to counting from 1, for example if you're coming from R, where indexing does start at 1.

We can also extract a set of consecutive elements from a list, which is often convenient. The typical form is to write the index as a:b. The (somewhat confusing) rule is that a:b means that you start at index a but stop just before index b. So the notation 2:5 includes the elements with indices 2, 3, and 4. In the Python world, this is called slicing.

    test_list[2:5]
    [True, 'Harvey', 48205]

If you want to start at the beginning or go to the end, there is a shortcut notation: leave out the index on that side. The same rule holds, though. The slice :3 does not include the element at index 3, but 2: does include the element at index 2.

    test_list[:3]
    ['apple', 3, True]
    test_list[2:]
    [True, 'Harvey', 48205]

The important thing to remember is that for a slice a:b, index a is included but index b is not.
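A practical payoff of this half-open rule is shown in the minimal sketch below: a slice a:b always holds b - a elements, and splitting a list at one index with :i and i: never loses or repeats an element. The values of a, b, and the split point 3 are arbitrary choices for illustration.

```python
test_list = ["apple", 3, True, "Harvey", 48205]

a, b = 1, 4
piece = test_list[a:b]            # elements at indices 1, 2, 3
print(piece)                      # [3, True, 'Harvey']
print(len(piece) == b - a)        # True: a slice a:b holds b - a elements

# Because the end index is excluded, splitting at the same index loses nothing
print(test_list[:3] + test_list[3:] == test_list)   # True
```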

You can also count backward from the end. The last element in a Python list has index -1.

index                0         1     2      3          4
element              'apple'   3     True   'Harvey'   48205
counting backward    -5        -4    -3     -2         -1

    test_list[-1]
    48205
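The table above boils down to one rule: in a list of length n, the negative index -k refers to the same element as the positive index n - k. A minimal sketch that checks this for every position:

```python
test_list = ["apple", 3, True, "Harvey", 48205]
n = len(test_list)   # 5

# A negative index -k points at the same element as the positive index n - k
for k in range(1, n + 1):
    assert test_list[-k] == test_list[n - k]

print(test_list[-1], test_list[n - 1])   # 48205 48205
print(test_list[-5], test_list[0])       # apple apple
```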

You can also use negative indices in slices, with the same indexing rule applying. Note that you count backward from the last element (-1).

    test_list[:-1]
    ['apple', 3, True, 'Harvey']
    test_list[-3:]
    [True, 'Harvey', 48205]
    test_list[-3:-1]
    [True, 'Harvey']

You can also make a list of lists, also called a nested list:

    test_nested_list = [[1, "a", 2, "b"], [3, "c", 4, "d"]]
    test_nested_list
    [[1, 'a', 2, 'b'], [3, 'c', 4, 'd']]

This will come in useful when we talk about arrays and data frames.
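To reach a single item inside a nested list, you index twice: the first index picks the inner list and the second picks the element within it. A minimal sketch using the same test_nested_list:

```python
test_nested_list = [[1, "a", 2, "b"], [3, "c", 4, "d"]]

inner = test_nested_list[1]      # the second inner list: [3, 'c', 4, 'd']
print(inner[1])                  # prints c

# The two steps can be chained: first pick the inner list, then the element
print(test_nested_list[1][1])    # prints c
print(test_nested_list[0][-1])   # prints b (negative indices work at each level)
```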

You can also check whether something is in the list, i.e., whether it is a member:

    "Harvey" in test_list
    True
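The membership test also has a negated form, not in, and it compares values exactly, so string case matters. A minimal sketch (the string "pear" is just an example of something not in the list):

```python
test_list = ["apple", 3, True, "Harvey", 48205]

print("Harvey" in test_list)      # True
print("pear" in test_list)        # False
print("pear" not in test_list)    # True: `not in` is the negated test

# Membership compares values, and string comparison is case-sensitive
print("harvey" in test_list)      # False
```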

Lists have the following properties:

  • They can be heterogeneous (each element can be a different type)
  • Lists can hold complex objects (lists, dicts, other objects) in addition to atomic objects (single numbers or words)
  • Lists have an ordering, so you can access list elements by position
  • List access can be done counting from the beginning or the end, and consecutive elements can be extracted using slices
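The properties above can all be seen in one small list. This is a minimal sketch with made-up contents: the list mixes an integer, a string, a nested list, and a dict, and each element keeps its own type while position-based access and slicing work as usual.

```python
mixed = [
    42,                  # an integer
    "forty-two",         # a string
    [4, 2],              # a list nested inside the list
    {"answer": 42},      # a dict
]

# Each element keeps its own type
print([type(x).__name__ for x in mixed])   # ['int', 'str', 'list', 'dict']

# Position-based access, from either end, and slicing all work the same way
print(mixed[0], mixed[-1])    # 42 {'answer': 42}
print(mixed[1:3])             # ['forty-two', [4, 2]]
```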


Tuples

Tuples are like lists, except that once you create them, you can't change them. This makes tuples great for storing fixed parameters or entities within your Python code, since they can't be overwritten, even by mistake. You can extract elements of a tuple, but you can't overwrite them. This property is called immutability.

Note that, like lists, tuples can be heterogeneous, which is also useful for coding purposes, as we will see.

    test_tuple = ("apple", 3, True, "Harvey", 48205)
    test_tuple[:3]
    ('apple', 3, True)
    # By contrast, a list element can be replaced in place
    test_list[0] = "pear"
    test_list
    ['pear', 3, True, 'Harvey', 48205]

Now see what happens when we try the same thing with a tuple:

    test_tuple[0] = "pear"
    test_tuple

(This code isn't run here because it raises an error: TypeError: 'tuple' object does not support item assignment.)
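If you want to see that error without stopping your program, you can catch it with try/except. And if you really need a "changed" tuple, the usual approach is to build a new one from pieces of the old. A minimal sketch:

```python
test_tuple = ("apple", 3, True, "Harvey", 48205)

try:
    test_tuple[0] = "pear"
except TypeError as err:
    print("TypeError:", err)   # TypeError: 'tuple' object does not support item assignment

# Build a new tuple instead; ("pear",) needs the trailing comma to be a tuple
new_tuple = ("pear",) + test_tuple[1:]
print(new_tuple)     # ('pear', 3, True, 'Harvey', 48205)
print(test_tuple)    # ('apple', 3, True, 'Harvey', 48205): the original is untouched
```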

Tuples are like lists, but once created, they cannot be changed. They are ordered and can be sliced.


Dictionaries

Dictionaries, or dicts, are collections of key-value pairs. Each element is referred to by key, not by index. In a dictionary, the keys can be strings, numbers, or tuples (more precisely, any hashable object; a mutable list cannot be a key), but the values can be any Python object. So you could have a dictionary where one value is a string, another is a number, and a third is a DataFrame (essentially a data set, using the pandas library). A simple example might be an entry in a list of contacts:

    contact = {
        "first_name": "Abhijit",
        "last_name": "Dasgupta",
        "Age": 48,
        "address": "124 Main St",
        "Employed": True,
    }

Note the special syntax. Each key is separated from its value by a colon (:), and the key-value pairs are separated from each other by commas. If you get a syntax error creating a dict, check these first.
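To illustrate which keys are allowed, here is a minimal sketch with made-up keys: a string, an integer, and a tuple all work, while trying to use a list as a key raises a TypeError because lists are mutable and therefore not hashable.

```python
# Strings, numbers, and tuples all work as keys
d = {"name": "Abhijit", 48: "an int key", ("Main St", 124): "a tuple key"}
print(d[("Main St", 124)])   # a tuple key

# A list is mutable, so it can't be used as a key
try:
    d[["Main St", 124]] = "a list key"
except TypeError as err:
    print("TypeError:", err)   # the message reports that 'list' is unhashable

print(len(d))   # 3: the failed assignment added nothing
```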

If you try to get the first name out using an index, you run into an error:

    contact[0]
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    KeyError: 0

You need to extract it by key:

    contact["first_name"]
    'Abhijit'
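Because asking for a key that isn't there raises a KeyError, it's common to check first with in, or to use the dict's get method, which returns a fallback instead of raising. A minimal sketch on a trimmed-down copy of contact (the key "middle_name" is a hypothetical missing key):

```python
contact = {"first_name": "Abhijit", "last_name": "Dasgupta"}

print("first_name" in contact)   # True: `in` checks the keys
print("Abhijit" in contact)      # False: values are not checked

# .get() returns None (or a default you supply) instead of raising KeyError
print(contact.get("middle_name"))             # None
print(contact.get("middle_name", "unknown"))  # unknown
```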

A dictionary is mutable, so you can change the value stored under any key:

    contact["address"] = "123 Main St"
    contact["Employed"] = False
    contact
    {'first_name': 'Abhijit', 'last_name': 'Dasgupta', 'Age': 48,
     'address': '123 Main St', 'Employed': False}
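Mutability also means you can add and remove keys after the dict is created. In this minimal sketch, assigning to a key that doesn't exist yet adds it, and del removes one; the email address is a hypothetical placeholder.

```python
contact = {"first_name": "Abhijit", "last_name": "Dasgupta", "Age": 48}

# Assigning to a key that doesn't exist yet adds a new key-value pair
contact["email"] = "someone@example.com"   # hypothetical placeholder value
print(contact)

# `del` removes a key and its value (a missing key would raise KeyError)
del contact["Age"]
print(contact)
# {'first_name': 'Abhijit', 'last_name': 'Dasgupta', 'email': 'someone@example.com'}
```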

You can see all the keys and values in a dictionary using the keys() and values() methods:

    contact.keys()
    dict_keys(['first_name', 'last_name', 'Age', 'address', 'Employed'])
    contact.values()
    dict_values(['Abhijit', 'Dasgupta', 48, '123 Main St', False])
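A third method, items(), returns the key-value pairs as tuples, which is the usual way to loop over a dict. Note that these are view objects rather than lists, so you wrap them in list() if you need indexing. A minimal sketch on a shortened contact:

```python
contact = {"first_name": "Abhijit", "last_name": "Dasgupta", "Age": 48}

print(contact.items())
# dict_items([('first_name', 'Abhijit'), ('last_name', 'Dasgupta'), ('Age', 48)])

# Each item is a (key, value) tuple, so a for loop can unpack both at once
for key, value in contact.items():
    print(key, "->", value)

# Views can't be indexed directly; convert to a list first
print(list(contact.keys())[0])   # first_name
```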

It turns out that dictionaries are very fast at retrieving information, because Python finds an element directly from its key instead of counting through positions to locate it. This makes them quite useful.
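To make the speed claim concrete, here is a rough sketch that times a membership test on a list and on a dict holding the same 100,000 keys. The size and repeat count are arbitrary choices, and the timings depend on your machine, so no output is shown; the dict lookup is typically much faster because it locates the key through its hash instead of scanning element by element.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_dict = dict.fromkeys(as_list)   # same keys; every value is None

target = n - 1   # the last element: the worst case for a list scan

# The list checks elements one by one; the dict looks the key up by its hash
list_time = timeit.timeit(lambda: target in as_list, number=100)
dict_time = timeit.timeit(lambda: target in as_dict, number=100)
print(f"list: {list_time:.5f} s   dict: {dict_time:.5f} s")
```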

We'll see that dictionaries are also one way to easily create pandas DataFrame objects on the fly.

There are a couple of other ways to create dict objects. One is from a list of tuples: each key-value pair is represented by a tuple of length 2, where the first element is the key and the second element is the value.

    A = [('first_name', 'Abhijit'), ('last_name', 'Dasgupta'), ('address', '124 Main St')]
    dict(A)
    {'first_name': 'Abhijit', 'last_name': 'Dasgupta', 'address': '124 Main St'}

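This can also be used to create a dict from a pair of lists. There is a really neat function, zip, that takes several lists of the same length and produces tuples, where the i-th element of each tuple comes from the i-th list, in order. (In Python 3, zip returns a zip object, an iterator, rather than an actual list; dict() accepts it directly, and list() turns it into a list of tuples.)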

    A = ['first_name', 'last_name', 'address']
    B = ['Abhijit', 'Dasgupta', '124 Main St']
    dict(zip(A, B))
    {'first_name': 'Abhijit', 'last_name': 'Dasgupta', 'address': '124 Main St'}

The zip function is a powerful way to combine several lists, grouping the corresponding elements of each list into a tuple.
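A minimal sketch of a few zip behaviors worth knowing: the result is a one-shot iterator, it accepts any number of lists, and with lists of unequal length it stops at the shortest one. The short lists of numbers, letters, and flags are made up for illustration.

```python
keys = ["first_name", "last_name", "address"]
values = ["Abhijit", "Dasgupta", "124 Main St"]

pairs = zip(keys, values)   # a lazy zip object, not yet a list
pair_list = list(pairs)     # materialize the tuples
print(pair_list)
# [('first_name', 'Abhijit'), ('last_name', 'Dasgupta'), ('address', '124 Main St')]

# A zip object can be consumed only once: a second pass finds nothing left
print(list(pairs))   # []

# zip takes any number of lists; each tuple holds one element from each list
print(list(zip([1, 2, 3], ["a", "b", "c"], [True, False, True])))
# [(1, 'a', True), (2, 'b', False), (3, 'c', True)]

# If the lengths differ, zip stops at the end of the shortest list
print(list(zip([1, 2, 3], ["a", "b"])))   # [(1, 'a'), (2, 'b')]
```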

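On a side note, the collections module provides defaultdict, a subclass of dict that automatically creates a default value whenever you look up a missing key with square brackets. It is often more convenient than a plain dict when you build up a dictionary piece by piece. We'll come back to it when we talk about modules.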
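As a preview, here is a minimal sketch of where defaultdict helps: grouping items under keys. The records (departments and names) are hypothetical. With a plain dict you must create each list before appending to it; defaultdict(list) creates the empty list for you the first time a key is used.

```python
from collections import defaultdict

# Hypothetical data: (department, name) pairs to be grouped by department
records = [("stats", "Ann"), ("bio", "Raj"), ("stats", "Li")]

# Plain dict: check for the key and create the list yourself
groups = {}
for dept, name in records:
    if dept not in groups:
        groups[dept] = []
    groups[dept].append(name)

# defaultdict(list) calls list() to supply an empty list for any missing key
dgroups = defaultdict(list)
for dept, name in records:
    dgroups[dept].append(name)

print(groups)          # {'stats': ['Ann', 'Li'], 'bio': ['Raj']}
print(dict(dgroups))   # {'stats': ['Ann', 'Li'], 'bio': ['Raj']}
```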


Source: Abhijit Dasgupta, https://www.araastat.com/BIOF085/a-python-primer.html#data-structures-in-python
This work is licensed under a Creative Commons Attribution 4.0 License.

Last modified: Tuesday, September 20, 2022, 6:27 PM