As you approach the end of this introduction to constructing arrays in numpy, read this section to familiarize yourself with various numpy attributes and functions used throughout the course. Specifically, the need for tools such as shape, size, linspace, reshape, eye, and zeros often arises when manipulating arrays.
Numpy data types
Numpy arrays are homogeneous in type.
np.array(['a','b','c'])
array(['a', 'b', 'c'], dtype='<U1')
np.array([1,2,3,6,8,29])
array([ 1,  2,  3,  6,  8, 29])
But what if we provide a heterogeneous list?
y = [1,3,'a']
np.array(y)
array(['1', '3', 'a'], dtype='<U21')
So what's going on here? Upon conversion from a heterogeneous list, numpy converted the numbers into strings. This is necessary since, by definition, a numpy array can hold data of only a single type. When one of the elements is a string, numpy casts all the other elements into strings as well. Think about what would happen if the opposite rule were used. The string 'a' doesn't have a corresponding number, while the numbers 1 and 3 both have string representations, so casting from string to numeric would fail for some elements.
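The casting rule described above can be checked directly. The following is a minimal sketch, not part of the original lesson; the variable names (mixed, numeric, parsed, cast_failed) are chosen here purely for illustration.

```python
import numpy as np

# A list mixing integers and a string is promoted to a Unicode string dtype.
mixed = np.array([1, 3, 'a'])
print(mixed.dtype.kind)   # 'U' marks a Unicode string dtype

# The same idea applies among numeric types: int and float promote to float.
numeric = np.array([1, 2.5])
print(numeric.dtype)

# Casting strings back to integers works only if every element parses.
parsed = np.array(['1', '3']).astype(int)
print(parsed)

# The opposite rule would break on 'a', which has no numeric counterpart.
try:
    mixed.astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True
print(cast_failed)
```

The last step shows why numpy promotes toward strings rather than toward numbers: every number has a string form, but not every string has a numeric one.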
The advantage of numpy arrays is that the data is stored in a contiguous section of memory, which makes manipulating homogeneous arrays and applying functions to them very efficient. However, numpy does provide a "catch-all" dtype called object, which can hold any Python object. An array with this dtype is essentially an array of pointers to the actual data, which is stored in different parts of memory. You can get to the actual objects by extracting them from the array. So one could do
np.array([1,3,'a'], dtype='object')
array([1, 3, 'a'], dtype=object)
which would be a valid numpy array, but would go back to the actual objects when used, much like a list. We will see this later when we transform a heterogeneous pandas DataFrame into a numpy array. It's not particularly useful as is, but it prevents errors from popping up during transformations from pandas to numpy.
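To make the "array of pointers" idea concrete, here is a small sketch of how an object array behaves. It is an illustration under the assumptions stated in the comments, and the variable names (obj, doubled) are made up for this example.

```python
import numpy as np

# With dtype=object, each slot holds a reference to an ordinary Python object.
obj = np.array([1, 3, 'a'], dtype=object)

# Extracting an element gives back the original Python object, type intact.
print(type(obj[0]))   # the Python int class
print(type(obj[2]))   # the Python str class

# Element-wise operations fall back to each object's own behaviour,
# so integers are multiplied while the string is repeated.
doubled = obj * 2
print(doubled)
```

Because every operation is dispatched to the individual Python objects, an object array gives up most of the speed benefit of a homogeneous array, which is why it behaves much like a list.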