numpy for Numerical and Scientific Computing

As you approach the end of this introduction to constructing arrays in numpy, read this section to familiarize yourself with various numpy tools used throughout the course. Specifically, functions, methods, and attributes such as shape, size, linspace, reshape, eye, and zeros come up often when manipulating arrays.
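As a quick orientation, here is a minimal sketch that touches each of those helpers once. The particular values (6 points between 0 and 1, a 2x3 shape, a 3x3 identity) are arbitrary choices for illustration. Note that shape and size are attributes, so they are read without parentheses.

```python
import numpy as np

# linspace: 6 evenly spaced values from 0 to 1, both endpoints included
grid = np.linspace(0, 1, 6)

# reshape: arrange the same 6 values as a 2x3 array
m = grid.reshape(2, 3)

# shape and size are attributes, not methods (no parentheses)
print(m.shape)   # (2, 3)
print(m.size)    # 6

# eye: 3x3 identity matrix; zeros: a 2x3 array filled with 0.0
ident = np.eye(3)
z = np.zeros((2, 3))
print(ident.trace())  # 3.0
print(z.sum())        # 0.0
```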

Numpy data types

Numpy arrays are homogeneous in type: every element of an array shares a single dtype.

>>> import numpy as np
>>> np.array(['a', 'b', 'c'])
array(['a', 'b', 'c'], dtype='<U1')
>>> np.array([1, 2, 3, 6, 8, 29])
array([ 1,  2,  3,  6,  8, 29])
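The sketch below shows what "homogeneous" means in practice: each array carries exactly one dtype, and a value that does not match it is converted rather than stored as-is. The float 9.7 is an arbitrary value chosen for illustration. The check uses dtype.kind instead of an exact type such as int64 because the default integer width can vary by platform.

```python
import numpy as np

letters = np.array(['a', 'b', 'c'])
numbers = np.array([1, 2, 3, 6, 8, 29])

# Every array carries exactly one dtype, shared by all of its elements
print(letters.dtype)        # <U1  (Unicode strings of length 1)
print(numbers.dtype.kind)   # i    (signed integer)

# Assigning a float into an integer array does not change the dtype;
# the value is cast (truncated) to fit the existing type instead
numbers[0] = 9.7
print(numbers[0])           # 9
print(numbers.dtype.kind)   # i    (still a signed integer)
```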

But what if we provide a heterogeneous list?

>>> y = [1, 3, 'a']
>>> np.array(y)
array(['1', '3', 'a'], dtype='<U21')

So what's going on here? When converting the heterogeneous list, numpy turned the numbers into strings. This is necessary because, by definition, a numpy array holds data of only one type. When one of the elements is a string, numpy casts all the other elements to strings as well. Consider what would happen if the opposite rule were used: the numbers 1 and 3 have natural string representations, but the string 'a' has no corresponding number, so converting everything to a numeric type would fail.
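To see why the conversion runs in this direction, the sketch below casts both ways with astype: strings that look like numbers convert back cleanly, while 'a' raises a ValueError. The variable names (mixed, cast_failed) are illustrative only.

```python
import numpy as np

mixed = np.array([1, 3, 'a'])
print(mixed.dtype.kind)   # U     (everything became a Unicode string)
print(mixed[0] == '1')    # True  (the integer 1 is now the string '1')

# String -> number works only when every string looks like a number
print(np.array(['1', '3']).astype(int))   # [1 3]

# 'a' has no numeric meaning, so this cast is rejected
try:
    mixed.astype(int)
    cast_failed = False
except ValueError as err:
    cast_failed = True
    print("ValueError:", err)
```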

The advantage of numpy arrays is that, because every element has the same fixed-size type, the data can be stored in a contiguous block of memory, which makes homogeneous arrays very efficient to manipulate, apply functions to, and so on. However, numpy does provide a "catch-all" dtype called object, whose elements can be any Python object. An object array is essentially an array of pointers to the actual data, which lives elsewhere in memory; indexing an element returns the underlying Python object. For example:

>>> np.array([1, 3, 'a'], dtype='object')
array([1, 3, 'a'], dtype=object)

This is a valid numpy array, but operations on it fall back to the underlying Python objects, much as they would for a list, so it gives up most of the efficiency of a typed array. We will see this dtype again later when converting a heterogeneous pandas DataFrame into a numpy array. It is not particularly useful on its own, but it lets such mixed-type data be transformed from pandas to numpy without errors.
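A minimal sketch of how an object array behaves, using numpy alone (no pandas): indexing returns plain Python int and str objects, and arithmetic is delegated to each element's own Python operator, so 'a' * 2 produces 'aa' rather than an error. The names obj and doubled are illustrative only.

```python
import numpy as np

obj = np.array([1, 3, 'a'], dtype=object)
print(obj.dtype)          # object

# Indexing hands back the original Python objects, not numpy scalars
print(type(obj[0]))       # <class 'int'>
print(type(obj[2]))       # <class 'str'>

# Multiplication is applied element by element with Python's own *,
# much as a loop over a list would do it: 1*2, 3*2, 'a'*2
doubled = obj * 2
print(doubled.tolist())   # [2, 6, 'aa']
```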