numpy for Numerical and Scientific Computing

As you approach the end of this introduction to constructing arrays in numpy, read this section to familiarize yourself with the numpy tools used throughout the course. In particular, attributes such as shape and size, and functions such as linspace, reshape, eye, and zeros, come up often when manipulating arrays.

Generating data in numpy

We saw earlier how to generate a sequence of numbers in a list using range. In numpy, you can generate a sequence of numbers in an array using arange (which actually creates the array in memory rather than returning a lazy range object).

np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
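Like range, arange also accepts a start, a stop, and a step. The following is a minimal sketch of that correspondence; the values 2, 20, and 3 and the variable names are chosen only for illustration.

```python
import numpy as np

# range describes the sequence lazily; arange builds the full array immediately
lazy = range(2, 20, 3)
arr = np.arange(2, 20, 3)

print(list(lazy))    # [2, 5, 8, 11, 14, 17]
print(arr.tolist())  # [2, 5, 8, 11, 14, 17]
```

In both cases the stop value itself is excluded.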

You can also generate regularly spaced sequences of numbers between particular values. By default, linspace includes both the start and the stop value.

np.linspace(start=0, stop=1, num=11) # or np.linspace(0, 1, 11)
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])

The start and stop values can be any real numbers, not just round ones.

np.linspace(start=0, stop=2*np.pi, num=10)
array([0.        , 0.6981317 , 1.3962634 , 2.0943951 , 2.7925268 ,
       3.4906585 , 4.1887902 , 4.88692191, 5.58505361, 6.28318531])
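Because num counts points rather than intervals, 11 points between 0 and 1 give a spacing of 0.1, not 1/11. The sketch below checks this with the retstep and endpoint parameters of linspace; the variable names are illustrative only.

```python
import numpy as np

# num counts the points, so 11 points span 10 equal intervals
pts, step = np.linspace(0, 1, num=11, retstep=True)
print(len(pts))  # 11
print(step)      # 0.1

# endpoint=False leaves out the stop value, much like arange does
half_open = np.linspace(0, 1, num=10, endpoint=False)
print(half_open[-1] < 1)  # True
```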

More generally, you can transform lists into numpy arrays. We saw this above for vectors. For matrices, you can provide a list of lists, one inner list per row. Note the double brackets, [[ at the start and ]] at the end.

np.array([[1,3,5,6],[4,3,9,7]])
array([[1, 3, 5, 6],
       [4, 3, 9, 7]])
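The shape and size attributes and the reshape method mentioned at the start of this section can be tried on this same matrix. This is a small sketch; the names m and reshaped are illustrative.

```python
import numpy as np

m = np.array([[1, 3, 5, 6], [4, 3, 9, 7]])

print(m.shape)  # (2, 4): 2 rows, 4 columns
print(m.size)   # 8: total number of elements
print(m.ndim)   # 2: number of dimensions

# reshape rearranges the same 8 elements into a new layout
reshaped = m.reshape(4, 2)
print(reshaped.shape)  # (4, 2)
```

Note that shape and size are attributes, so they are written without parentheses, while reshape is a method and is called.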

You can generate an array of zeros.

np.zeros(10)
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

This can easily be extended to a two-dimensional array (a matrix) by specifying the dimensions of the matrix as a tuple of (rows, columns).

np.zeros((10,10))
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

You can also generate a matrix of 1s in a similar manner.

np.ones((3,4))
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
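Both zeros and ones return floating-point arrays by default, which is why the outputs above show 0. and 1. with a trailing dot. As a brief sketch, the dtype parameter can be used to request integers instead; the variable names here are illustrative.

```python
import numpy as np

z = np.zeros((2, 3))
print(z.dtype)  # float64

# Ask for integers explicitly with the dtype parameter
counts = np.ones((3, 4), dtype=int)
print(counts.dtype.kind)  # i (a signed integer type)
```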

In matrix algebra, the identity matrix is important. It is a square matrix with 1's on the diagonal and 0's everywhere else.

np.eye(4)
array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])
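The identity matrix matters because multiplying any compatible matrix by it leaves that matrix unchanged. A minimal sketch, using a small made-up matrix a and the @ operator for matrix multiplication:

```python
import numpy as np

a = np.array([[1, 3], [4, 2]])  # arbitrary example values
i2 = np.eye(2)

# Matrix multiplication by the identity, on either side, returns the same values
print(np.array_equal(a @ i2, a))  # True
print(np.array_equal(i2 @ a, a))  # True
```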

You can also create numpy vectors directly from lists, as long as the lists are made up of atomic elements of the same type. This means a list of numbers or a list of strings. Generally, the elements can't be composite structures. One exception is a list of lists, where all the inner lists contain the same type of atomic data; as we saw above, this creates a matrix or 2-dimensional array.

a = [1,2,3,4,5,6,7,8]
b = ['a','b','c','d','3']

np.array(a)
array([1, 2, 3, 4, 5, 6, 7, 8])
np.array(b)
array(['a', 'b', 'c', 'd', '3'], dtype='<U1')
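When a list mixes types, numpy does not raise an error; it converts every element to a single common type. The sketch below illustrates this with two made-up lists, which is a reason to check dtype after building an array.

```python
import numpy as np

# Integers mixed with a float are all converted to floats
mixed_numbers = np.array([1, 2.5, 3])
print(mixed_numbers.dtype)  # float64

# Numbers mixed with a string are all converted to strings
mixed = np.array([1, 'a', 2.5])
print(mixed.dtype.kind)  # U (unicode string)
```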


Random numbers

Generating random numbers is useful in many areas of data science. Computers don't produce truly random numbers; instead, they generate pseudo-random sequences. These are completely deterministic sequences, defined algorithmically, that emulate the properties of random numbers. Because they are deterministic, we can set a seed, or starting value, for the sequence so that we can reproduce it exactly, which helps when debugging code. To see how things actually behave in simulations, we will often run several sequences of random numbers starting from different seed values.

The seed is set by passing it to RandomState, a class in the random submodule of numpy, which returns a random number generator object. Note that all Python names are case-sensitive.

rng = np.random.RandomState(35) # set seed
rng.randint(0, 10, (3,4))
array([[9, 7, 1, 0],
       [9, 8, 8, 8],
       [9, 7, 7, 8]])

We have created a 3x4 matrix of random integers from 0 up to, but not including, 10 (in line with slicing rules, the lower bound is included and the upper bound is not).
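To see what reproducibility means in practice, the following sketch builds two separate generators from the same seed and checks that they produce identical matrices. The names first and second are illustrative.

```python
import numpy as np

# Two independent generators started from the same seed
first = np.random.RandomState(35).randint(0, 10, (3, 4))
second = np.random.RandomState(35).randint(0, 10, (3, 4))

print(np.array_equal(first, second))          # True
print(first.min() >= 0 and first.max() < 10)  # True
```

Changing the seed gives a different, but equally reproducible, sequence.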

We can also create a random sample of numbers between 0 and 1 (0 can occur, but 1 cannot).

rng.random_sample((5,2))
array([[0.04580216, 0.91259827],
       [0.21381599, 0.3036373 ],
       [0.98906362, 0.1858815 ],
       [0.98872484, 0.75008423],
       [0.22238605, 0.14790391]])
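NumPy 1.17 and later also offer a newer Generator interface through np.random.default_rng, which the NumPy documentation recommends for new code. This course uses RandomState, so the sketch below is only a side-by-side illustration of the corresponding calls. The seed 35 is reused for convenience, and the numbers drawn will differ from those shown above.

```python
import numpy as np

gen = np.random.default_rng(35)  # seeded Generator

ints = gen.integers(0, 10, size=(3, 4))  # 0 included, 10 excluded
floats = gen.random((5, 2))              # values at least 0 and below 1

print(ints.shape, floats.shape)  # (3, 4) (5, 2)
```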

We'll see later how to generate random numbers from particular probability distributions.