Data Structures and Types

pandas.Series

A Series holds the values of a single variable and, much like a numpy array, must be homogeneous in type. You can create a Series from a list or a numpy array quite easily:

import numpy as np
import pandas as pd
s = pd.Series([1, 3, 5, np.nan, 9, 13])
s
0     1.0
1     3.0
2     5.0
3     NaN
4     9.0
5    13.0
dtype: float64
s2 = pd.Series(np.arange(1,20))
s2
0      1
1      2
2      3
3      4
4      5
5      6
6      7
7      8
8      9
9     10
10    11
11    12
12    13
13    14
14    15
15    16
16    17
17    18
18    19
dtype: int64
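
The homogeneity requirement is why the integers in `s` are displayed as floats: `np.nan` is a floating-point value, so the whole Series is stored as `float64`. The sketch below illustrates this; the variable names `ints`, `with_nan`, and `mixed` are only illustrative and do not appear elsewhere in these notes.

```python
import numpy as np
import pandas as pd

# All integers: pandas picks an integer dtype.
ints = pd.Series([1, 3, 5])

# A single NaN forces the whole Series to a floating-point dtype.
with_nan = pd.Series([1, 3, 5, np.nan])

# Values with no common numeric type fall back to the generic object dtype.
mixed = pd.Series([1, "a", 2.5])

print(ints.dtype)
print(with_nan.dtype)
print(mixed.dtype)
```

A Series with `object` dtype is still homogeneous in the sense that every element is stored as a generic Python object, but it loses most of the speed advantages of numeric dtypes.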

You can access elements of a Series by index label, much like looking up a key in a dict:

 s2[4]
 5
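
The dict analogy extends a little further. As a rough sketch (the label `100` and the default `"missing"` are arbitrary choices for illustration), membership tests and `get` work on the index labels, not on the values:

```python
import numpy as np
import pandas as pd

s2 = pd.Series(np.arange(1, 20))   # labels 0..18, values 1..19

value = s2[4]                      # look up the element with label 4
has_label_4 = 4 in s2              # `in` tests the index labels
has_label_19 = 19 in s2            # 19 is a value, not a label
fallback = s2.get(100, "missing")  # like dict.get, returns the default

print(value, has_label_4, has_label_19, fallback)

# .loc is explicitly label-based and .iloc explicitly position-based;
# with the default 0, 1, 2, ... index the two happen to agree.
print(s2.loc[4], s2.iloc[4])
```

Using `.loc` and `.iloc` makes it explicit whether a label or a position is meant, which matters once the index is no longer the default range.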

The index of a Series does not have to be numeric. Labels can be any hashable objects, such as strings:

s3 = pd.Series(np.random.normal(0,1, (5,)), index = ['a','b','c','d','e'])
s3
a   -0.283473
b    0.157530
c    1.051739
d    0.859905
e    1.178951
dtype: float64
s3['d']
0.859904696094078
s3['a':'d']
a   -0.283473
b    0.157530
c    1.051739
d    0.859905
dtype: float64

Slicing worked, but the result differs from what we have encountered so far: it includes both the start and the end of the slice.

In pandas, slicing by index label is inclusive of both endpoints. This is a departure from numpy and from Python in general, where the end of a slice is excluded, so we have to be careful about it.
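
To see the two behaviours side by side, here is a minimal sketch that uses fixed values instead of random ones so the result is reproducible. The names `by_label` and `by_position` are illustrative only.

```python
import pandas as pd

s3 = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0],
               index=["a", "b", "c", "d", "e"])

# Label-based slice: both endpoints 'a' and 'd' are included.
by_label = s3.loc["a":"d"]

# Position-based slice: follows the usual Python rule, end excluded.
by_position = s3.iloc[0:3]

print(by_label)
print(by_position)
```

The label-based slice returns four elements (`a` through `d`), while the position-based slice returns three (positions 0, 1, and 2).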

You can extract the actual values into a numpy array:

 s3.to_numpy()
 array([-0.28347282,  0.1575304 ,  1.05173885,  0.8599047 ,  1.17895111])

In fact, you'll see that many of pandas' structures are built on top of numpy arrays. This is a good thing, since you can take advantage of the powerful numpy functions that are built for fast, efficient scientific computing.
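
As a small illustration of this interplay, numpy functions can be applied directly to a Series. The values below are chosen only to make the arithmetic easy to follow, and `roots` is an illustrative name.

```python
import numpy as np
import pandas as pd

s3 = pd.Series([1.0, 4.0, 9.0], index=["a", "b", "c"])

# to_numpy() gives a plain numpy array, without the index.
values = s3.to_numpy()

# A numpy ufunc applied to a Series returns a Series and keeps the index.
roots = np.sqrt(s3)

print(type(values))
print(roots)
```

Applying the function to the Series rather than to the extracted array keeps the labels attached to the results.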

Making the point about slicing again, position-based slicing of the underlying numpy array excludes the end of the slice:

 s3.to_numpy()[0:3] 
 array([-0.28347282,  0.1575304 ,  1.05173885])

This is different from the label-based slicing done earlier, which included both endpoints.