numpy for Numerical and Scientific Computing

As you approach the end of this introduction to constructing arrays in numpy, read this section to familiarize yourself with various numpy methods used throughout the course. Specifically, the need for methods such as shape, size, linspace, reshape, eye, and zeros often arises when manipulating arrays.

Vectors and matrices

Numpy generates arrays, which can be of arbitrary dimension. However, the most useful are vectors (1-d arrays) and matrices (2-d arrays).

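In these examples, we will generate samples from the Normal (Gaussian) distribution with mean 0 and variance 1 (the second argument to normal is the standard deviation, which is also 1 here). The code assumes that numpy has been imported as np and that rng is a random number generator providing the normal, randn, and randint methods used in this section, for example an instance of np.random.RandomState.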

A = rng.normal(0,1,(4,5))

We can compute some characteristics of this matrix's dimensions. The number of rows and columns is given by shape.

A.shape
(4, 5)

The total number of elements is given by size.

A.size
20

If we want to create a matrix of 0's with the same dimensions as A, we don't actually have to compute its dimensions. We can use the zeros_like function to figure it out.

np.zeros_like(A)
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])
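
The introduction also names zeros, eye, and linspace, which are not demonstrated above. The following is a minimal sketch of those constructors next to shape and size. The seed value 0 and the names Z, I, and x are illustrative assumptions, not part of the course material.

```python
import numpy as np

# Assumption: a legacy RandomState generator with an arbitrary seed (0),
# chosen only so that the example is reproducible.
rng = np.random.RandomState(0)

A = rng.normal(0, 1, (4, 5))   # 4 rows, 5 columns of N(0, 1) samples
print(A.shape)                 # (rows, columns)
print(A.size)                  # total number of elements

Z = np.zeros((2, 3))           # 2x3 matrix of zeros; the shape is given explicitly
I = np.eye(3)                  # 3x3 identity matrix
x = np.linspace(0, 1, 5)       # 5 evenly spaced points from 0 to 1, endpoints included
print(Z)
print(I)
print(x)
```

Unlike zeros_like, zeros needs the shape spelled out, which is why zeros_like is convenient when a template array already exists.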

We can also create vectors by providing only a length to the random sampling function. The result is a 1-d array of shape (4,), which has a single axis rather than separate rows and columns.

B = rng.normal(0, 1, (4,))
B
array([-0.45495378,  1.04307172,  0.70451207, -0.6171649 ])
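
One subtlety worth checking is that an array created with shape (4,) is one-dimensional, which numpy treats differently from a 4x1 or 1x4 matrix. The sketch below compares the three; the seed and the names v, col, and row are assumptions made for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)  # assumption: arbitrary seed for reproducibility

v = rng.normal(0, 1, (4,))      # 1-d array: a single axis of length 4
col = v.reshape(4, 1)           # explicit 4x1 column matrix
row = v.reshape(1, 4)           # explicit 1x4 row matrix

print(v.shape, v.ndim)          # (4,) 1
print(col.shape, col.ndim)      # (4, 1) 2
print(row.shape, row.ndim)      # (1, 4) 2
```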


Extracting elements from arrays

The syntax for extracting elements from arrays is almost exactly the same as for lists, with the same rules for slices.

Exercise: State which elements of B are extracted by each of the following statements. Keep in mind that B has only four elements, so check whether every index is valid.

B[:3]
B[:-1]
B[[0,2,4]]
B[[0,2,5]]

For matrices, we have two dimensions, so you can slice by rows, columns, or both.

A
array([[-2.45677354,  0.36686697, -0.20453263, -0.54380446,  0.09524207],
       [ 1.06236144,  1.03937554,  0.01247733, -0.35427727, -1.18997812],
       [ 0.95554288,  0.30781478,  0.7328766 , -1.28670314, -1.03870027],
       [-0.81398211, -1.02506031,  0.12407205,  1.21491023, -1.44645123]])

We can extract the first column by specifying : (meaning everything) for the rows and the index for the column (reminder: Python starts counting at 0).

A[:,0]
array([-2.45677354,  1.06236144,  0.95554288, -0.81398211])

Similarly, the 4th row can be extracted by giving its row index (3) and : for the column index.

A[3,:]
array([-0.81398211, -1.02506031,  0.12407205,  1.21491023, -1.44645123])

All slicing operations work for both rows and columns.

A[:2,:2]
array([[-2.45677354,  0.36686697],
       [ 1.06236144,  1.03937554]])
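
Because the random values above are hard to check by eye, here is a small sketch of the same slicing patterns on a matrix with predictable entries. The matrix M and the variable names are illustrative assumptions.

```python
import numpy as np

# Rows are [0..4], [5..9], [10..14], [15..19].
M = np.arange(20).reshape(4, 5)

first_col = M[:, 0]        # every row, column 0
fourth_row = M[3, :]       # row 3, every column
top_left = M[:2, :2]       # first two rows and first two columns
last_two_cols = M[:, -2:]  # negative indices count from the end, as with lists

print(first_col)           # [ 0  5 10 15]
print(fourth_row)          # [15 16 17 18 19]
print(top_left)
print(last_two_cols)
```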


Array operations

We can do a variety of vector and matrix operations in numpy.

First, all the usual arithmetic operations work on arrays, such as adding a scalar to an array or multiplying an array by a scalar.

A = rng.randn(3,5)
A
array([[-0.15367796,  2.50215522,  0.19420725,  0.54928294, -1.1737166 ],
       [ 1.11456557,  0.07447758,  1.58518354,  1.61986225, -0.24616333],
       [-0.02682273,  0.2196577 ,  0.41680351, -0.86319929,  0.50355595]])
A + 10
array([[ 9.84632204, 12.50215522, 10.19420725, 10.54928294,  8.8262834 ],
       [11.11456557, 10.07447758, 11.58518354, 11.61986225,  9.75383667],
       [ 9.97317727, 10.2196577 , 10.41680351,  9.13680071, 10.50355595]])

We can also add and multiply arrays element-wise as long as they are the same shape.

B = rng.randint(0,10, (3,5))
B
array([[6, 2, 3, 9, 8],
       [5, 9, 3, 9, 7],
       [0, 4, 2, 5, 0]])
A + B
array([[ 5.84632204,  4.50215522,  3.19420725,  9.54928294,  6.8262834 ],
       [ 6.11456557,  9.07447758,  4.58518354, 10.61986225,  6.75383667],
       [-0.02682273,  4.2196577 ,  2.41680351,  4.13680071,  0.50355595]])
A * B
array([[-0.92206775,  5.00431043,  0.58262175,  4.94354649, -9.38973278],
       [ 5.57282784,  0.67029821,  4.75555061, 14.57876027, -1.72314331],
       [-0.        ,  0.8786308 ,  0.83360701, -4.31599644,  0.        ]])
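
The sketch below repeats these operations on small hand-written arrays so the results can be verified by hand, and shows what happens when the shapes do not match. The arrays X, Y, and W are assumptions introduced for illustration.

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
Y = np.array([[10, 20, 30],
              [40, 50, 60]])

print(X + 10)   # the scalar is added to every element
print(X * 2)    # the scalar multiplies every element
print(X + Y)    # element-wise sum; mixing float and int gives a float result
print(X * Y)    # element-wise product, not a matrix product

# Arrays with incompatible shapes, here (2, 3) and (3, 2), raise a ValueError.
W = np.ones((3, 2))
try:
    X + W
except ValueError as err:
    print("shape mismatch:", err)
```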

You can also do matrix multiplication. Recall what this is.

If you have a matrix $A_{m \times n}$ and another matrix $B_{n \times p}$, as long as the number of columns of $A$ and rows of $B$ are the same, you can multiply them ($C_{m \times p} = A_{m \times n} B_{n \times p}$), with the $(i,j)$-th element of $C$ being

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}, \quad i = 1, \dots, m; \; j = 1, \dots, p$$

In numpy the operator for matrix multiplication is @.

In the above examples, A and B cannot be multiplied since they have incompatible dimensions (both are 3x5). However, we can take the transpose of B, i.e., swap its rows and columns to obtain a 5x3 matrix, which is compatible with A for matrix multiplication. Transposing A instead also works and gives a 5x5 product.

A @ np.transpose(B)
array([[ 0.21867814, 19.0611592 , 13.14345008],
       [24.20135281, 23.85429363, 11.56758865],
       [-2.21155643, -1.15068575, -2.60375863]])
np.transpose(A) @ B
array([[  4.65076009,   9.61644327,   2.82901737,   8.51387483,
          6.57253531],
       [ 15.38531919,   6.55323945,   8.16921379,  24.28798365,
         20.53858478],
       [  9.09116118,  16.32228035,   6.17177937,  18.0985346 ,
         12.64994275],
       [ 11.39500892,  12.22452901,   4.78103701,  15.20631032,
         15.73329932],
       [ -8.27311623,  -2.54867934,  -3.25252787, -10.26113957,
        -11.11287609]])
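
To connect the @ operator with the summation formula, the following sketch builds the product entry by entry with explicit loops and compares it with the result of @. The small matrices P and Q are assumptions chosen so the arithmetic is easy to follow.

```python
import numpy as np

P = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3): m = 2, n = 3
Q = np.array([[1, 0],
              [0, 1],
              [2, 2]])      # shape (3, 2): n = 3, p = 2

m, n = P.shape
p = Q.shape[1]

# Build C entry by entry: c_ij is the sum over k of a_ik * b_kj.
C_loop = np.zeros((m, p), dtype=P.dtype)
for i in range(m):
    for j in range(p):
        for k in range(n):
            C_loop[i, j] += P[i, k] * Q[k, j]

C = P @ Q                         # the same product using the @ operator
print(C)
print(np.array_equal(C, C_loop))  # True

# P.T is shorthand for np.transpose(P).
print(np.array_equal(P.T, np.transpose(P)))  # True
```

The loop version is only for illustration; in practice @ is both shorter and much faster.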

More generally, you can reshape a numpy array into a new shape, provided the new shape holds exactly the same number of elements as the original array.

D = rng.randint(0,5, (4,4))
D
array([[0, 2, 0, 0],
       [4, 0, 0, 4],
       [0, 3, 2, 0],
       [3, 0, 0, 3]])
D.reshape(8,2)
array([[0, 2],
       [0, 0],
       [4, 0],
       [0, 4],
       [0, 3],
       [2, 0],
       [3, 0],
       [0, 3]])
D.reshape(1,16)
array([[0, 2, 0, 0, 4, 0, 0, 4, 0, 3, 2, 0, 3, 0, 0, 3]])

This can also be used to turn a vector into a matrix.

e = np.arange(20)
E = e.reshape(5,4)
E
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

One thing to note in all the reshaping operations above is that the new array is filled with the elements of the old array row by row (row-major order). See the examples above to convince yourself of that.
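
As a quick check of the row-by-row behaviour and of the element-count requirement, consider the sketch below. It uses ravel, which is not covered in this section, to flatten the matrix back into a vector; the names e and E mirror the example above but hold different values.

```python
import numpy as np

e = np.arange(12)      # 0, 1, ..., 11
E = e.reshape(3, 4)    # filled row by row (numpy's default order)
print(E)

# ravel() flattens in the same row-by-row order, recovering the original vector.
print(np.array_equal(E.ravel(), e))  # True

# A shape whose product differs from the number of elements is rejected.
try:
    e.reshape(5, 3)    # 5 * 3 = 15, but e has 12 elements
except ValueError as err:
    print("cannot reshape:", err)
```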


Statistical operations on arrays

You can sum all the elements of a matrix using sum. You can also sum along rows or along columns by adding an argument to the sum function.

A = rng.normal(0, 1, (4,2))
A
array([[-1.60798054, -0.05162306],
       [-0.49218049, -0.1262316 ],
       [ 0.56927597,  0.05438786],
       [ 0.33120322, -0.81820729]])
A.sum()
-2.141355938226197

You can sum over the rows (i.e., down each column) with the option axis=0, which gives one total per column.

A.sum(axis=0)
array([-1.19968185, -0.94167409])

You can sum over the columns (i.e., across each row) with axis=1, which gives one total per row.

A.sum(axis=1)
array([-1.6596036 , -0.61841209,  0.62366383, -0.48700407])

Of course, you can also use the equivalent function call np.sum(A, axis=1).
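
A helpful way to remember the axis argument is that it names the axis that disappears from the result. The sketch below shows this on a small integer matrix S, which is an assumption introduced so the totals can be checked by hand.

```python
import numpy as np

S = np.array([[1, 2],
              [3, 4],
              [5, 6],
              [7, 8]])      # shape (4, 2)

print(S.sum())              # 36: every element
print(S.sum(axis=0))        # [16 20]: axis 0 (rows) is collapsed, one total per column
print(S.sum(axis=1))        # [ 3  7 11 15]: axis 1 (columns) is collapsed, one total per row
print(np.sum(S, axis=1))    # function form, same result as S.sum(axis=1)
```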

We can also find the minimum and maximum values.

A.min(axis = 0)
array([-1.60798054, -0.81820729])
A.max(axis = 0)
array([0.56927597, 0.05438786])

We can also find the position where the minimum and maximum values occur.

A.argmin(axis=0)
array([0, 3])
A.argmax(axis=0)
array([2, 2])
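
The positions returned by argmin and argmax can be used to look the values back up. The following sketch does this per column and also for the whole matrix; it relies on np.unravel_index and on indexing with paired index arrays, neither of which is covered in this section, and the matrix T is an illustrative assumption.

```python
import numpy as np

T = np.array([[ 3, -1],
              [ 0,  7],
              [-4,  2]])

col_min = T.min(axis=0)         # smallest value in each column
col_argmin = T.argmin(axis=0)   # row index of that value in each column
print(col_min)                  # [-4 -1]
print(col_argmin)               # [2 0]

# Pair each row index with its column index to recover the minima.
cols = np.arange(T.shape[1])
print(T[col_argmin, cols])      # [-4 -1]

# Without an axis, argmax returns a position in the flattened array.
flat_pos = T.argmax()
row, col = np.unravel_index(flat_pos, T.shape)
print(flat_pos, row, col)       # 3 1 1
```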

We can sort arrays and also find the indices that would put the array in sorted order. I'll demonstrate this for a vector, where it is more relevant.

a = rng.randint(0,10, 8)
a
array([9, 2, 6, 6, 4, 4, 3, 4])
np.sort(a)
array([2, 3, 4, 4, 4, 6, 6, 9])
np.argsort(a)
array([1, 6, 4, 5, 7, 2, 3, 0])
a[np.argsort(a)]
array([2, 3, 4, 4, 4, 6, 6, 9])

np.argsort can also help you find the 2nd smallest or 3rd largest value in an array.

ind_2nd_smallest = np.argsort(a)[1]
a[ind_2nd_smallest]
3
ind_3rd_largest = np.argsort(a)[-3]
a[ind_3rd_largest]
6
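
Note that this approach counts repeated values separately: the three largest entries of a are 9, 6, and 6, so the "3rd largest" is 6. If the 3rd largest distinct value is wanted instead, one possible approach (an assumption about intent, not something the course prescribes) is to remove duplicates first with np.unique, which returns the sorted distinct values.

```python
import numpy as np

a = np.array([9, 2, 6, 6, 4, 4, 3, 4])  # the same vector as above

order = np.argsort(a)
print(a[order][1])      # 3: second smallest, duplicates counted separately
print(a[order][-3])     # 6: third largest, duplicates counted separately

distinct = np.unique(a) # sorted distinct values
print(distinct)         # [2 3 4 6 9]
print(distinct[-3])     # 4: third largest distinct value
```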

You can also sort strings in this way.

m = np.array(['Aram','Raymond','Elizabeth','Donald','Harold'])
np.sort(m)
array(['Aram', 'Donald', 'Elizabeth', 'Harold', 'Raymond'], dtype='<U9')

If you want to sort an array in place, call the sort method on the array itself instead of the np.sort function. The method modifies the array and returns nothing.

m.sort()
m
array(['Aram', 'Donald', 'Elizabeth', 'Harold', 'Raymond'], dtype='<U9')
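
The difference between the two forms is easiest to see by checking the original array after each call. In this sketch the shortened list of names is an assumption made to keep the output compact.

```python
import numpy as np

m = np.array(['Raymond', 'Aram', 'Harold'])

sorted_copy = np.sort(m)  # returns a new sorted array; m is unchanged
print(m)                  # ['Raymond' 'Aram' 'Harold']
print(sorted_copy)        # ['Aram' 'Harold' 'Raymond']

result = m.sort()         # sorts m itself and returns None
print(result)             # None
print(m)                  # ['Aram' 'Harold' 'Raymond']
```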


Putting arrays together

We can put arrays together by row or column, provided the corresponding axes have compatible lengths.

A = rng.randint(0,5, (3,5))
B = rng.randint(0,5, (3,5))
print('A = ', A)
A =  [[3 4 2 1 3]
 [0 3 1 1 1]
 [4 0 2 0 4]]
print('B = ', B)
B =  [[1 4 2 1 3]
 [2 0 3 2 0]
 [4 0 2 3 3]]
np.hstack((A,B))
array([[3, 4, 2, 1, 3, 1, 4, 2, 1, 3],
       [0, 3, 1, 1, 1, 2, 0, 3, 2, 0],
       [4, 0, 2, 0, 4, 4, 0, 2, 3, 3]])
np.vstack((A,B))
array([[3, 4, 2, 1, 3],
       [0, 3, 1, 1, 1],
       [4, 0, 2, 0, 4],
       [1, 4, 2, 1, 3],
       [2, 0, 3, 2, 0],
       [4, 0, 2, 3, 3]])

Note that both hstack and vstack take a single sequence of arrays as input, here a tuple.
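
What "compatible lengths" means can be checked through the shapes of the results. The sketch below uses constant-valued arrays, which are assumptions chosen so that only the shapes matter: hstack needs matching row counts and vstack needs matching column counts.

```python
import numpy as np

A = np.zeros((3, 5), dtype=int)
B = np.ones((3, 5), dtype=int)
C = np.ones((2, 5), dtype=int)

print(np.hstack((A, B)).shape)  # (3, 10): columns are placed side by side
print(np.vstack((A, B)).shape)  # (6, 5): rows are placed on top of each other
print(np.vstack((A, C)).shape)  # (5, 5): only the column counts need to match

try:
    np.hstack((A, C))           # row counts differ (3 vs 2)
except ValueError as err:
    print("cannot stack:", err)
```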


Logical/Boolean operations

You can query a matrix to see which elements meet some criterion. In this example, we'll see which elements are negative. Since A currently holds integers between 0 and 4, none of them are.

A < 0
array([[False, False, False, False, False],
       [False, False, False, False, False],
       [False, False, False, False, False]])

This is called masking and is useful in many contexts.

We can extract all the negative elements of A using

A[A<0]
array([], dtype=int64)

This forms a 1-d array. You can also count the number of elements that meet the criterion.

np.sum(A<0)
0

Since A<0 is itself a matrix (of Boolean values), we can apply row-wise and column-wise operations to it as well.
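
Because the current A has no negative entries, the sketch below uses a small matrix G that does, so that masking and the row-wise and column-wise counts give non-trivial results. G and the variable names are assumptions introduced for illustration.

```python
import numpy as np

G = np.array([[ 1, -2,  3],
              [-4, -5,  6]])

mask = G < 0               # Boolean matrix with the same shape as G
print(mask)

negatives = G[mask]        # 1-d array of the selected elements, taken row by row
print(negatives)           # [-2 -4 -5]

print(mask.sum())          # 3: True counts as 1, so this counts the negatives
print(mask.sum(axis=0))    # [1 2 0]: negatives in each column
print(mask.sum(axis=1))    # [1 2]: negatives in each row
print(mask.any(axis=0))    # whether each column contains at least one negative
```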