CS250 Study Guide

Site: Saylor Academy
Course: CS250: Python for Data Science
Book: CS250 Study Guide

Navigating this Study Guide

Study Guide Structure

In this study guide, the sections in each unit (1a., 1b., etc.) are the learning outcomes of that unit. 

Beneath each learning outcome are:

  • questions for you to answer independently;
  • a brief summary of the learning outcome topic; and
  • resources related to the learning outcome.

At the end of each unit, there is also a list of suggested vocabulary words.

 

How to Use this Study Guide

  1. Review the entire course by reading the learning outcome summaries and suggested resources.
  2. Test your understanding of the course information by answering questions related to each unit learning outcome and defining and memorizing the vocabulary words at the end of each unit.

By clicking on the gear button on the top right of the screen, you can print the study guide. Then you can make notes, highlight, and underline as you work.

Through reviewing and completing the study guide, you should gain a deeper understanding of each learning outcome in the course and be better prepared for the final exam!

Unit 1: What is Data Science?

1a. Explain what data science is

  • What is data science?
  • What disciplines are associated with data science?
  • What types of data does data science deal with?
  • What is data engineering?

Data science is the field of collecting, handling, and analyzing data in order to extract knowledge from it. Although its roots reach further back, the term "data science" was coined in 2001. As the ability to store and operate on large databases increased, it became clear that a convergence of many different disciplines was required to draw conclusions from large, possibly distributed datasets. Hence, data science requires overlapping expertise in methods drawn from computer science, mathematics, and statistics. At its core, it is directly related to the scientific method, as it describes the process of formulating a hypothesis, acquiring the necessary data, performing data analysis, and, finally, drawing conclusions or making predictions. Applying the scientific method requires logical reasoning, which can be divided into two broad categories: deductive and inductive. For example, conclusions may be drawn from a hypothesis test that attempts to quantify the deviation from a null hypothesis, a proposition that reflects current understanding.

It is important to be aware of the many types of data, such as video, audio, images, text, numbers, and so on. When embarking upon a data science project, data may be converted or transformed from its raw form into a different form that lends itself to a specific analysis technique. Data engineering is the aspect of data science that deals specifically with collecting, curating, storing, and retrieving data. In some sense, data engineering is the initial point from which all other analyses will follow.

To review, see A History of Data Science.

 

1b. Explain data analysis and data modeling methodologies

  • What are the essential aspects of the data science life cycle?
  • What is a data science model?
  • What basic modeling and analysis methodologies should every data scientist know about?

The data science life cycle emphasizes the reality that, during data analysis and modeling, there is rarely a perfectly straight line between input data and output results. Since conclusions are not known a priori, it is often necessary to take initial results, refine them, and then reassess the analysis or modeling methodology. Along the data life cycle journey, it may become necessary to build a model; hence, there are basic approaches that every data scientist needs to be aware of. A model is a representation of a system that takes input data and generates outputs consistent with what is expected from the dataset(s) under consideration. Data models that are statistical in their approach are usually constructed from samples taken from a population. In this class of models, it is important to understand concepts such as target population, access frame, and sampling. A major reason for constructing this class of models is to reduce statistical bias, the difference between what your model predicts and what is actually measured in reality.

Linear models are another important class of models that can either be statistical (such as regression) or deterministic (such as the method of least squares), but the main goal is to identify the best straight line that explains the data. Probabilistic models can be useful for generating and analyzing random data sets. A good example is the urn model, which analyzes the process of drawing indistinguishable marbles from an urn.

To review, see The Data Science Lifecycle.

 

1c. Explain techniques for approaching data science

  • How do various disciplines view data science problems?
  • Why is visualization important to data science?
  • How do techniques from optimization theory play a role in data science?

As data science involves the intersection of many disciplines, it is important to understand how individuals from these disciplines view data science problems. For example, mathematicians might view a problem in terms of theorems and equations; on the other hand, computer scientists might think of numerical methods and algorithms. Visualization and data rendering in data science are essential because a picture truly can be worth a thousand words. Rendering data in two or three dimensions (using, for example, a heatmap) can often reveal correlations within a dataset via immediate visual inspection.

Statisticians often think in terms of sampling, where statistical tests are a function of the data available. For example, probability sampling is a general term for selecting a sample from a population. Simple random sampling is a type of probability sampling where a subset of participants is chosen randomly from a population. For larger populations, cluster sampling is a type of probability sampling where a population is divided into smaller groups or "clusters". A sample is then formed by randomly selecting from among the clusters. Data scientists must know how to sample a population for proper experimental design.

Optimization theory is a field necessary for implementing data science techniques. Although this fact is not always pointed out at the introductory level, data science results often come as the output of some objective optimization criteria. For example, when finding critical points in a function such as maxima or minima, points where a slope equals zero are identified. Additionally, objective measures such as mean squared error and loss functions are regularly applied in machine learning techniques to measure the performance of a model.

To review, see A History of Data Science, The Data Science Lifecycle, and Thinking about the World.

 

Unit 1 Vocabulary

This vocabulary list includes terms you will need to know to successfully complete the final exam. 

  • cluster sampling
  • data engineering
  • data science
  • hypothesis testing
  • indistinguishable
  • logical reasoning
  • loss function
  • model
  • random sampling
  • sample
  • scientific method
  • slope
  • statistical bias
  • visualization

Unit 2: Python for Data Science

2a. Create a Python notebook using Google Colab 

  • What is Google Colab?
  • Why are Python notebooks useful?
  • How can Python modules be installed in Google Colab?
  • Where are Google Colab notebooks stored?

Python notebooks are extremely useful for testing blocks of Python code and annotating the notebook with text cells. You can easily share them with other users, who can then experiment with the Python code contained within the notebook. Google Colaboratory (Colab) is an online Python notebook environment that, in most cases, bypasses the need to install the modules commonly used by the Python community because many come preinstalled. If it becomes necessary to install a module, the command

!pip install module_name

can be used to invoke the installation.

Using Google as the online service requires an awareness of where and how notebooks are stored. You must be familiar with your Google Drive. Colab will create a folder for your notebooks as the default destination directory on your Google Drive. Notebooks can also be downloaded from Colab so that they can be stored or used locally on a platform such as Jupyter Notebook. Within the Google Colab environment, you must understand how to create and delete code cells. In addition, there are several options for running code cells depending on how you have arranged your code. Navigating these various options must be part of your Google Colab skill set.

To review, see Introduction to Google Colab.

 

2b. Execute instructions using built-in Python data and control structures 

  • What are some important built-in Python data types?
  • What are some important built-in Python data structures?
  • What kind of loop structures does the Python language support?
  • Why is operator precedence important for if-else-elif statements?

This course depends upon having basic operating knowledge of the Python programming language. Python supports basic data types such as integers (int), floating-point numbers (float), strings (str), and booleans (bool). Lists and tuples are ordered containers, meaning their elements can be referred to using an index. Lists are mutable objects, meaning you can modify their elements. Tuples are immutable objects, meaning their elements, once initialized, cannot be modified. Sets and dictionaries are unordered containers, but dictionaries contain "key:value" pairs where values can be referenced by their keys.

It is important to understand relational, boolean, and arithmetic operators. Often, when applying if-elif-else control structures, complex boolean expressions are necessary. Constructing these expressions requires understanding the precedence of relational operators (==, !=, >, <, >=, <=) over boolean logical operators (and, or, not) and the precedence of and over or.
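
For example, the following minimal sketch (with arbitrary variable names and values) shows how these precedence rules determine which branch runs:

# Relational operators bind tighter than boolean operators, and "and" binds tighter than "or".
x, y = 7, 3
if x > 5 and y < 2 or y == 3:
    # Evaluated as ((x > 5) and (y < 2)) or (y == 3), which is False or True, so True
    print("branch 1")
elif not x == 7:
    print("branch 2")
else:
    print("branch 3")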

Although this course does not emphasize object-oriented programming and class design, you should be well versed in the syntax for defining functions (and methods) using the def keyword, calling them, and accessing class data attributes. Finally, Python supports "for" loops and "while" loops; therefore, familiarity with iteration using these loop structures is a necessary part of your Python programming capacity.
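
The following minimal sketch (the function name and values are arbitrary) reviews defining a function with def and iterating with for and while loops:

# Define a function that sums the squares of a list of numbers.
def sum_of_squares(values):
    total = 0
    for v in values:          # a "for" loop iterates over the list elements
        total += v ** 2
    return total

print(sum_of_squares([1, 2, 3]))   # prints 14

# The same computation using a "while" loop.
values, total, i = [1, 2, 3], 0, 0
while i < len(values):
    total += values[i] ** 2
    i += 1
print(total)                       # prints 14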

To review, see Data Types in Python, Functions, Loops, and Logic, Data Structures in Python, and Sets, Tuples, and Dictionaries.

 

2c. Apply methods for random numbers within the random module 

  • What is the random module?
  • What are some useful methods contained in the random module?
  • How can the random module be applied?

Random number generation is a significant facet of computer science. Simulations involving random events (such as data communications, sunspot activity, and the weather), as well as many data science applications, often require some form of random number generation (RNG). Several ways of approaching the issue of generating random numbers are introduced within this course, but the starting point is the random module.

While there are many methods contained within the random module, there are some that you must be familiar with. The seed method allows you to set the random seed so that the RNG can be set to the same starting point. Data scientists must understand two of the most basic probability distributions: the uniform distribution and the normal distribution. The random method generates numbers from a uniform distribution within the interval [0.0, 1.0), and the uniform method generates numbers from a uniform distribution within the interval [a, b]. The randint method uniformly generates integers in the interval [a, b]. The gauss method generates numbers from a normal distribution with the mean and standard deviation values as input parameters. The setstate and getstate methods allow you to either set the state of the RNG or read the state so that it can be saved for later use. These methods are the minimal set needed to get up and running with basic simulations in applied statistics.
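
A minimal sketch exercising these methods (the seed and parameter values are arbitrary):

import random

random.seed(42)                  # set the seed for reproducible results
state = random.getstate()        # save the current RNG state

print(random.random())           # uniform float in [0.0, 1.0)
print(random.uniform(2, 5))      # uniform float in [2, 5]
print(random.randint(1, 6))      # uniform integer in [1, 6], like a die roll
print(random.gauss(0, 1))        # normal draw with mean 0 and standard deviation 1

random.setstate(state)           # restore the saved state
print(random.random())           # repeats the first draw above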

To review, see Python's random Module.

 

2d. Implement basic plotting and data rendering instructions using the matplotlib module 

  • What is matplotlib?
  • How can matplotlib be applied?
  • What are some important methods in the matplotlib module?

A critical component of data science is visualization. This course introduces several modules for this purpose, but matplotlib is the starting point. Specifically, the pyplot portion of matplotlib is emphasized within this module to exercise a set of introductory commands for getting a plot up and running. Furthermore, since numpy has not yet been introduced, lists are used as the data structure for creating two-dimensional plots; therefore, you must be clear on the syntax for plotting list data.

While there are many methods contained within the matplotlib.pyplot module, there are some that you must be familiar with. Line plots using the plot method and scatter plots using the scatter method are fundamental. You should also know how to choose colors and plot markers such as dashed lines. Annotating plots requires methods like title, xlabel, ylabel, grid, and legend. These methods are the minimal set to get up and running to begin the journey of data visualization.
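
A minimal sketch of these plotting and annotation methods using list data (the data values are arbitrary):

import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

plt.plot(x, y, 'r--', label='line plot')              # red dashed line
plt.scatter(x, y, color='blue', label='scatter plot')
plt.title('A simple plot')
plt.xlabel('x values')
plt.ylabel('y values')
plt.grid(True)
plt.legend()
plt.show()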

To review, see Precision Data Plotting with matplotlib.

 

Unit 2 Vocabulary

This vocabulary list includes terms you will need to know to successfully complete the final exam.

  • boolean
  • def
  • dictionary
  • floating point
  • for loop
  • function
  • gauss
  • getstate
  • if-elif-else
  • integer
  • list
  • logical operator
  • matplotlib
  • normal distribution
  • random
  • relational operator
  • set
  • setstate
  • string
  • tuple
  • uniform
  • while loop

Unit 3: The numpy Module

3a. Implement instructions to create numpy arrays 

  • What is numpy?
  • How is numpy used?
  • What is the syntax for indexing an element within a numpy array?
  • What are some typical instructions applied when using numpy?

The numpy module is fundamental for applying numerical and linear algebra techniques involving arrays of data. Some of the most used modules for data science and machine learning, such as scipy, scikit-learn, pandas, and statsmodels, use numpy as the core for their class constructions. Mathematical entities from linear algebra, such as vectors and matrices, can be represented using numpy arrays. In computer science, an array is simply a data structure for housing elements of the same data type. In the case of numpy arrays, since the intent is usually a numerical one, the data type of array elements is most often int, float, or bool (although string arrays are possible).
 
In numpy, a vector is represented using a one-dimensional array (where only one index is used to refer to array elements). A matrix is represented using a two-dimensional array (with two indices), and so on for higher dimensions. The syntax for applying indices to multi-index arrays is the same as that of nested lists; however, a more convenient form is usually used to minimize the use of square brackets. For example, assume a four-dimensional array is initialized as a = np.random.normal(size=(3,4,2,5)). It is then equally valid to index an array element using either a[2][3][0][4] or a[2,3,0,4].

Basic operations for applying numpy include the array method for creating an array, the shape attribute for determining the shape of an array, the sum method for summing the values contained within an array, and the max and min methods for computing the maximum and minimum elements. The ones method can be used to create an array of ones; the zeros method will create an array of zeros. The eye method can be used to create an identity matrix. A sequence of numbers can be generated in an array using the arange method, and regularly spaced sequences of numbers between particular values can be generated using the linspace method. It is important to understand the difference between these two methods. Finally, you should have a basic understanding of generating an array of random integers using the randint method from the random class within numpy.
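
A minimal sketch of these array-creation operations (the shapes and values are arbitrary):

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])          # 2x3 array built from nested lists
print(a.shape, a.sum(), a.max(), a.min())

print(np.ones((2, 3)))                        # 2x3 array of ones
print(np.zeros(4))                            # length-4 vector of zeros
print(np.eye(3))                              # 3x3 identity matrix
print(np.arange(0, 10, 2))                    # [0 2 4 6 8]: start, stop (excluded), step
print(np.linspace(0, 1, 5))                   # 5 evenly spaced values from 0 to 1 inclusive
print(np.random.randint(1, 7, size=(2, 4)))   # 2x4 array of integers in [1, 6]

b = np.random.normal(size=(3, 4, 2, 5))       # the four-dimensional example from above
print(b[2, 3, 0, 4] == b[2][3][0][4])         # the two indexing forms agree: True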

To review, see numpy for Numerical and Scientific Computing.

 

3b. Execute instructions to index arrays using slicing 

  • What is slicing?
  • What is the syntax for slicing in numpy?
  • What are some typical shortcuts when applying slicing?

Slicing is an indexing technique for extracting several contiguous elements at once from an ordered container, such as a list or a tuple. The syntax for slicing extends to numpy arrays with multiple indices. Slicing a numpy vector (that is, a one-dimensional array) works the same way as slicing a Python list. Three values must be specified or implied: the start index, the stop index, and the step, b[start:stop:step], where b is a numpy vector. Slicing a multidimensional array means that this syntax can be applied to any index position.

When the start is omitted, an index of 0 is assumed. When the stop is omitted, the slice runs through the last element. When the step is omitted, a step of one is assumed. You should also be comfortable with using negative indices or a negative step. For example, d = c[::-1] forms a new vector containing all the elements of c in reverse order: because the step is negative, the omitted start defaults to the last index, and the omitted stop means the slice runs back through index 0. Finally, always remember that the stop index itself is excluded; the last element included in a slice has an index one less than the stop. For example, consider a 2-d array a with at least 2 rows and 6 columns. A command such as b = a[0:1,4:5] will slice out a 1×1 array that is inherently two-dimensional (that is, two indices are required to refer to the element). You should test the similarities and differences between b, b[0], and b[0][0] to internalize this example.
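
A minimal sketch of these slicing rules:

import numpy as np

c = np.arange(10)                 # [0 1 2 3 4 5 6 7 8 9]
print(c[2:8:2])                   # [2 4 6]: start 2, stop 8 (excluded), step 2
print(c[:5])                      # [0 1 2 3 4]: omitted start defaults to 0
print(c[::-1])                    # all elements in reverse order (negative step)

a = np.arange(12).reshape(2, 6)   # 2 rows, 6 columns
b = a[0:1, 4:5]                   # a 1x1 array that is still two-dimensional
print(b, b.shape)                 # [[4]] (1, 1)
print(b[0], b[0][0])              # [4] and 4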

To review, see Advanced Indexing with numpy.

 

3c. Demonstrate computation and visualization using array operations 

  • What is vectorized programming?
  • What is broadcasting?
  • How does matplotlib work with numpy arrays?

The numpy module has been designed to work with vectors and matrices. Most programmers start out thinking in scalar terms. For example, the random module is designed to generate scalar random numbers, and, using only Python built-in data structures, we could use a loop to fill a list to mimic a vector of random numbers. The numpy module, in contrast, is designed for vectorized programming, where single commands generate and operate on an entire vector or matrix of data.

To correctly apply the vectorized methodology in numpy, it is important to understand how broadcasting is used to accomplish "element-wise" computation for arithmetic operators such as +, -, *, and /. For example, to perform * between a 3×1 matrix and a 3×5 matrix, the 3×1 matrix will be broadcast 5 times along the column dimension and will be multiplied elementwise by each column in the 3×5 matrix. Notice that element-wise multiplication using * is different from matrix multiplication in the linear algebraic sense. To accomplish matrix multiplication, either the @ operator or the dot method must be applied to the numpy arrays.
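
A minimal sketch of the 3×1 versus 3×5 example above, contrasting element-wise broadcasting with matrix multiplication:

import numpy as np

A = np.ones((3, 5))                    # 3x5 matrix of ones
v = np.array([[1.0], [2.0], [3.0]])    # 3x1 matrix (column vector)

print(v * A)                           # broadcasting: v is stretched across the 5 columns -> 3x5
print(v.T @ A)                         # matrix multiplication: 1x3 times 3x5 -> 1x5
print(np.dot(v.T, A))                  # equivalent to the @ operator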

When it comes to plotting data, the matplotlib module has been designed to work with numpy data. The matplotlib module capacity to plot Python list data is a special case made available for convenience. The typical use case and syntax for matplotlib plotting methods are meant for numpy arrays.

To review, see numpy Arrays and Vectorized Programming, Mathematical Operations with numpy, and numpy with matplotlib.

 

3d. Explain instructions to load and save data using numpy file formats 

  • How is numpy file handling similar to basic Python file handling?
  • How is numpy file handling different from basic Python file handling?
  • What is a .npy file?
  • What is a .npz file?

Understanding how to save and load numpy array data in machine learning applications is important. In such cases, arduous and time-consuming computations can lead to large parameter matrices that must be stored for later use. The simplest solution is to apply the loadtxt and savetxt methods, which load and save numpy data in text format. It is also possible to read and write numpy data in text format using basic Python file handling methods such as write, read, or readline. In fact, it is possible to read data using the numpy method loadtxt from a text file generated using the write method.

Text files can be large compared to their numerical or "binary" counterparts. Hence, the binary format is the preferred method for storing and loading numpy data. Furthermore, it is possible to compress the data to save storage space. The ability to store compressed array data and multiple arrays differs from typical Python file handling. The .npy extension is used for the standard binary file format for saving a single numpy array. The .npz extension is used for the standard binary file format for saving multiple arrays to a single file. Therefore, the numpy save method can be used to save a single array. The numpy savez method can be used to save multiple arrays, and the numpy savez_compressed method can be used to save multiple arrays in compressed format. The numpy load method then can be used to load either .npy or .npz formatted files.
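
A minimal sketch of these save and load methods (the filenames are arbitrary):

import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.linspace(0, 1, 4)

np.savetxt('a.txt', a)                                     # text format
print(np.loadtxt('a.txt'))

np.save('a.npy', a)                                        # single array, binary .npy format
print(np.load('a.npy'))

np.savez('both.npz', first=a, second=b)                    # multiple arrays in one .npz file
data = np.load('both.npz')
print(data['first'], data['second'])

np.savez_compressed('both_small.npz', first=a, second=b)   # compressed .npz variant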

To review, see Storing Data in Files and ".npy" versus ".npz" Files.

 

Unit 3 Vocabulary

This vocabulary list includes terms you will need to know to successfully complete the final exam.

  • arange
  • array
  • broadcasting
  • index
  • linspace
  • .npy
  • .npz
  • ones
  • randint
  • slicing
  • vectorized programming
  • zeros

Unit 4: Applied Statistics in Python

4a. Apply methods for random numbers within the numpy module

  • How does vectorized programming work with numpy random number generation?
  • What are some similarities between the Python random module and the numpy random class?
  • What are some important numpy random class methods?

The key to mastering programming using numpy is to migrate from scalar thinking to vectorized thinking. Whereas methods from the random module generate a single random number, the numpy.random class is designed to generate vectors and arrays of random numbers. Both contain methods to generate random numbers from basic distributions such as uniform, normal, lognormal, exponential, beta, and gamma. However, the numpy.random class is equipped with a much larger set of distributions and is designed for vectorized programming. Therefore, you should be familiar with computing important quantities such as the sum, max, min, and mean using the axis parameter.

The set of random number generators for numpy.random is quite large. As a data scientist, you may not have an immediate need for all of them, but there are some distributions that you should be aware of (such as uniform, normal, lognormal, logistic, Poisson, and binomial). This means not only understanding what the methods compute but also being highly familiar with their input parameters, their syntax, and the order in which the parameters appear in a method call. For example, randint will uniformly generate random integers with the input parameters low and high. Additionally, the array dimensions can be specified with the size parameter as a tuple. The normal method will generate array data from a normal distribution where the mean and standard deviation can be specified. The main goal is to connect your understanding of basic probability distributions with the necessary input parameters. For instance, the poisson method allows for the input parameter lam. Other methods for random sampling, such as shuffle and choice, should also be reviewed.
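
A minimal sketch of vectorized random number generation and axis-wise summaries (the sizes and parameter values are arbitrary):

import numpy as np

u = np.random.uniform(low=0, high=1, size=(3, 4))     # 3x4 array of uniform draws
n = np.random.normal(loc=5, scale=2, size=(3, 4))     # mean 5, standard deviation 2
k = np.random.randint(low=1, high=7, size=10)         # ten integers in [1, 6]
p = np.random.poisson(lam=3, size=5)                  # Poisson counts with rate lam=3

print(n.mean(axis=0))       # column means (length 4)
print(n.sum(axis=1))        # row sums (length 3)
print(u.max(), u.min())

deck = np.arange(52)
np.random.shuffle(deck)                                 # shuffle in place
print(np.random.choice(deck, size=5, replace=False))    # sample 5 values without replacement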

To review, see Random Number Generation and Using np.random.normal.

 

4b. Apply statistical methods within the scipy.stats module 

  • How is scipy.stats similar to numpy?
  • What are some useful methods for applying the scipy.stats module?
  • What are some important statistical tests that can be implemented using the scipy.stats module?

The scipy.stats module is built upon the numpy module. Both modules can generate random numbers for random simulations from various probability distributions. The scipy.stats module goes a bit further as it can perform a wide range of statistical tests and build statistical models.

Concerning scipy.stats usage, you should be comfortable with methods for generating summary statistics such as mode, tmin, tmax, tmean, tvar, skew, kurtosis, moment, and entropy. You should also recognize the consistency of the syntax amongst various random variables for method calls, such as rvs, mean, std, ppf, pmf (for discrete distributions), and pdf (for continuous distributions). When using a method such as rvs for a given distribution, you must be familiar with the input parameter syntax relevant to the specific distribution. In other words, the rvs method will require different parameters for generating random data using distributions such as chi2, norm, f, binom, and lognorm.

Lastly, scipy.stats can perform a breadth of statistical tests. You should be aware of tests such as the t-test for the mean of one group of scores (ttest_1samp), the t-test for the means of two independent samples of scores (ttest_ind), the Shapiro-Wilk test for normality (shapiro), the one-way chi-square test (chisquare) and skewtest, which tests whether the skewness is different from the normal distribution.
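
A minimal sketch exercising random variates, summary statistics, and two of these tests (the sample size and parameters are arbitrary):

import numpy as np
from scipy import stats

x = stats.norm.rvs(loc=0, scale=1, size=200)     # 200 draws from a standard normal

print(stats.tmean(x), stats.tvar(x))             # trimmed mean and variance (no limits given)
print(stats.skew(x), stats.kurtosis(x))

print(stats.norm.ppf(0.975))                     # about 1.96: the 97.5th percentile of N(0, 1)
print(stats.binom.pmf(k=3, n=10, p=0.5))         # probability of 3 successes in 10 trials

print(stats.ttest_1samp(x, popmean=0))           # is the sample mean consistent with 0?
print(stats.shapiro(x))                          # Shapiro-Wilk test for normality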

To review, see Descriptive Statistics in Python and Statistical Modeling with scipy.

 

4c. Apply the scipy.stats module for solving data science problems 

  • What statistics quantities are important for solving data science problems?
  • How are such quantities computed using the scipy.stats module?
  • What types of problems can be modeled and simulated using the scipy.stats module?

In addition to understanding the syntax for invoking scipy.stats methods, as a data scientist, your goal is to understand how to apply them to data science problems. This means you should be clear about computing quantities such as the Z-score using the zscore method and confidence intervals based upon the empirical mean and standard deviation.

Part of your skill set as a data scientist is knowing how to apply your knowledge of probability distributions to model a given set of data. According to the Central Limit Theorem, problems involving sums of independent random variables can be modeled using the normal distribution. The binomial distribution can be useful for modeling sums of Bernoulli variables (such as a series of coin flips). Stochastic processes involving bursts of an event (such as phone call arrival times) can be modeled using the Poisson distribution. Dice rolls can be modeled using a uniform distribution. Mastering this aspect of data science will enable you to create simulations that can model and explain real data.
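
A minimal sketch connecting these ideas: z-scores, a normal-based confidence interval, and a small Central Limit Theorem style simulation (sample sizes are arbitrary):

from scipy import stats

data = stats.norm.rvs(loc=10, scale=2, size=500)

print(stats.zscore(data)[:5])                    # standardized values for the first five points

# 95% confidence interval for the mean, built from the empirical mean and standard error
mean, sem = data.mean(), stats.sem(data)
print(stats.norm.interval(0.95, loc=mean, scale=sem))

# Sums of many Bernoulli variables (coin flips) are approximately normal
sums = stats.binom.rvs(n=100, p=0.5, size=10000)
print(sums.mean(), sums.std())                   # close to 50 and 5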

To review, see Statistics in Python and Probabilistic and Statistical Risk Modeling.

 

Unit 4 Vocabulary

This vocabulary list includes terms you will need to know to successfully complete the final exam.

  • Bernoulli distribution
  • binomial distribution
  • Central Limit Theorem
  • chisquare
  • choice
  • confidence interval
  • poisson
  • rvs
  • shapiro
  • shuffle
  • skewtest
  • summary statistics
  • ttest_1samp
  • Z-score

Unit 5: The pandas Module

5a. Explain similarities and differences between dataframes and arrays

  • What is a series?
  • What is a dataframe?
  • How does indexing a pandas dataframe compare with indexing a numpy array?

The pandas module has been built upon the numpy module to handle data like a spreadsheet or a table within a relational database. You can think of a pandas series as holding one-dimensional data. A pandas dataframe can be thought of as a container for two-dimensional data where the column names can be used to refer to data within a row. Generally speaking, the goal of using a numpy one-dimensional array is often similar to that of a series; hence, both data structures are designed to contain homogeneous data.

While the bulk of the course focuses on dataframes, you should feel comfortable creating series using the Series command and creating dataframes using the DataFrame command. You should be aware of the flexibility of using various data structures such as lists, dictionaries, and numpy arrays for initializing a series or a dataframe.

Once you have created a dataframe, the pandas module offers several different ways of referencing and operating on the data. You should be familiar with referencing data using the column names and the index. Additionally, you should understand how slicing works with dataframes. Label-based slicing in pandas works differently from numpy index slicing: the value at the stop label is included. This is critical to understand when applying loc (which extracts elements using the dataframe index and column names with pandas label-based slicing) versus iloc (which extracts elements as if the dataframe were an array, using row and column indices with numpy-style slicing).
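
A minimal sketch contrasting loc and iloc (the dataframe contents are arbitrary):

import pandas as pd

df = pd.DataFrame({'a': [10, 20, 30, 40],
                   'b': [1.5, 2.5, 3.5, 4.5],
                   'c': ['w', 'x', 'y', 'z']},
                  index=['r1', 'r2', 'r3', 'r4'])

print(df.loc['r1':'r3', 'a':'b'])   # label-based: rows r1 through r3, columns a and b, stops included
print(df.iloc[0:3, 0:2])            # index-based: rows 0-2 and columns 0-1, stops excluded
print(df['a'])                      # a single column is returned as a pandas Series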

To review, see Dataframes.

 

5b. Apply instructions for cleaning data sets 

  • What is data cleaning?
  • What are some methods helpful in identifying and counting missing values?
  • What are some pandas approaches available for cleaning a data set?

On the practical side of sampling a population and dataset creation, it is possible to end up with a dataframe containing missing values. If such values are not dealt with at the start of a data science endeavor, they could result in spurious calculations with unintended consequences (such as an arithmetic exception). Data cleaning simply means that an approach has been taken to identify and either remove or fill in missing values in a principled way. As a first step towards identifying missing data and non-missing data, methods such as isna and notna should be reviewed. Recall that you can use these methods in tandem with other methods, such as sum, to count the number of missing values along a given axis.

Once missing values have been identified, a couple of options are at your disposal. If the percentage of missing values is small, then the dropna method can help remove them while not injuring the overall statistical content of the data. It is important to know how to apply the axis, how, and inplace input parameters to be very specific about how the missing values are to be removed. If the percentage of missing values is too high, then another option is to fill in those values in a principled manner using the fillna method. With this approach, there are many options to choose from. For example, you can use the method input parameter to perform forward or backward fills. Additionally, you could choose a value based upon, for example, the column mean or median. The solution for the data fill is quite data-dependent and, in reality, may not always result in a positive scientific outcome. On the other hand, as a data scientist, you must be aware of the tools at your disposal to deal with missing data.
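
A minimal sketch of counting and handling missing values (the dataframe is a small artificial example; newer pandas versions prefer df.ffill() over the method parameter):

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0],
                   'b': [np.nan, np.nan, 30.0, 40.0]})

print(df.isna().sum())                # number of missing values in each column
print(df.notna().sum(axis=1))         # number of non-missing values in each row

print(df.dropna(axis=0, how='any'))   # drop rows containing any missing value
print(df.fillna(method='ffill'))      # forward fill (df.ffill() in newer pandas)
print(df.fillna(df.mean()))           # fill each column's missing values with its mean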

To review, see Data Cleaning.

 

5c. Implement operations on dataframes

  • What types of models have dataframes been designed to work with?
  • What types of operations are important for working with dataframes?
  • What are the syntax details for implementing dataframe operations?

The concept of the dataframe exists to encompass spreadsheet and relational database models. Therefore, it is essential to understand how to operate on dataframes within this context. On the spreadsheet side of dataframes, arithmetic operations on (sets of) columns or (sets of) rows need to be mastered. On the database side, one thinks of performing queries and operations such as join and concatenate for merging tables.

You can use the query method for dataframe queries. Although a pandas method implementation exists for any query, phrasing a query using the query method can often simplify the syntax. For example, an isin method call can be reduced to the in operation inside a query string.

When performing arithmetic operations on dataframes, your indexing skills should be able to weather the task of referring to the appropriate columns. You must be aware of the subtleties and default mode for dataframe operations. For example, consider the addition of a series to a dataframe. In this case, you can think of the series as a row vector that will be broadcast and added elementwise to each row in the dataframe. Additionally, if the dataframe has more columns than the length of the series, then the extra columns in the dataframe will end up with NaN (missing) values.

Finally, you should be clear about the subtleties of merging dataframes. By default, the concat method concatenates dataframes along the row dimension. As with database operations, you can implement the join method in various contexts, such as inner, outer, left, and right, using the how input parameter. After performing these join operations, you should know if and how you will fill in missing values. If you understand the inner, outer, left, and right join operations, you can predict how missing values will be dealt with.
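
A minimal sketch of a query, a concatenation, and two join variants (all names and values are artificial):

import pandas as pd

left = pd.DataFrame({'key': ['a', 'b', 'c'], 'x': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'y': [20, 30, 40]})

print(left.query('x > 1 and key in ["b", "c"]'))     # query strings can use the "in" operation

print(pd.concat([left, right]))                      # concatenates along the row dimension by default

print(pd.merge(left, right, on='key', how='inner'))  # only keys present in both: b and c
print(pd.merge(left, right, on='key', how='outer'))  # all keys; unmatched entries become NaN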

To review, see pandas Operations: Merge, Join, and Concatenate.

 

5d. Write Python instructions for interacting with spreadsheet files 

  • What are some common file formats for handling spreadsheet data?
  • What are some useful pandas methods for reading from and writing to spreadsheet files?
  • What is the syntax for invoking relevant input parameters?

Spreadsheet programs are instrumental for organizing data. Two popular formats for interacting with spreadsheets are Excel format (.xls, .xlsx) and comma-separated values (.csv) format. The pandas module is designed to handle these formats (and others, such as SQL and JSON formats). The pandas read_csv and read_excel methods can read spreadsheet files, and the to_csv and to_excel methods will write to them in the appropriate format. These methods are indispensable for interacting with the outside world. You can use them both for local file storage and accessing files via a URL.

In addition to the syntax for calling spreadsheet methods and specifying filenames, it is also important to understand the syntax for various input parameters. Practical experience with spreadsheets is useful for understanding the utility of multiple sheets. The sheet_name parameter allows you to name the sheet written by the to_excel method (and, together with an ExcelWriter object, to write multiple sheets to one file). Likewise, when applying the read_excel method, the sheet_name parameter is used to select which sheet to read. Finally, when writing to a spreadsheet using the to_excel method, it is also helpful to know how to suppress the dataframe index from being included within the file using the index parameter.
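
A minimal sketch of reading and writing spreadsheet files (the filenames and sheet name are arbitrary, and writing Excel files assumes an engine such as openpyxl is installed):

import pandas as pd

df = pd.DataFrame({'city': ['Rome', 'Lima'], 'population': [2.8, 9.7]})

df.to_csv('cities.csv', index=False)                        # index=False suppresses the dataframe index
print(pd.read_csv('cities.csv'))

df.to_excel('cities.xlsx', sheet_name='demo', index=False)  # sheet_name names the sheet being written
print(pd.read_excel('cities.xlsx', sheet_name='demo'))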

To review, see Data Input and Output.

 

5e. Apply the built-in pandas visualization methods to visualize pandas dataframe data 

  • Why does pandas include visualization methods?
  • What is the syntax for invoking visualization methods in pandas?
  • How is this syntax similar to other visualization modules such as matplotlib?

There is a fair amount of cross-breeding between pandas and other data science modules such as numpy, seaborn, and matplotlib. This is done for the sake of convenience as it is sometimes simpler, from a programming standpoint, to connect commonly used method calls directly to a dataframe.

Like matplotlib, pandas allows for a spectrum of plotting methods such as line plots (line), box plots (box), bar plots (bar), histograms (hist), scatter plots (scatter), and so on. As with other visualization modules such as seaborn, these techniques can also be invoked using the plot method and the kind input parameter. One important distinction is that dataframe plotting methods allow the option to handle missing values by either dropping them or filling them in. Hence, data cleaning can be applied in tandem with plot method calls.
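
A minimal sketch of the built-in dataframe plotting methods (the data values are arbitrary):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'x': np.arange(50),
                   'y': np.random.normal(size=50).cumsum()})

df.plot(x='x', y='y', kind='line', title='line plot')   # the kind parameter selects the plot type
df.plot(x='x', y='y', kind='scatter')
df['y'].plot(kind='hist', bins=10)
df.plot(kind='box')
plt.show()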

To review, see Visualization Using the pandas Module.

 

Unit 5 Vocabulary

This vocabulary list includes terms you will need to know to successfully complete the final exam.

  • concat
  • concatenate
  • data cleaning
  • DataFrame
  • dropna
  • fillna
  • iloc
  • isin
  • isna
  • join
  • loc
  • notna
  • query
  • read_csv
  • read_excel
  • Series
  • to_csv
  • to_excel

Unit 6: Visualization

6a. Apply seaborn commands to visualize pandas dataframe data

  • What is the seaborn module, and how is it used?
  • What are some useful plotting categories to be aware of?
  • What are important input parameter choices to be aware of?

The seaborn module is an enhanced visualization package designed for data science applications. It goes beyond the capabilities of matplotlib visualization where, for example, the set_theme method can augment the matplotlib plotting environment. You can use the seaborn module to render data in many ways by creating relational plots, categorical plots, distribution plots, multi-plots, matrix plots, and so on.

Seaborn has conveniently wrapped its plotting routines into specific categories. For example, a relational plot can be expressed using a line plot or a scatter plot. You can call the associated relational plotting methods individually, or they can be configured using the relplot method with input parameters such as kind set to their desired values. Distribution plots are used to plot histograms, marginal distributions, empirical cumulative distributions, and kernel density estimates, either by calling them individually or configuring the displot method. When applying the histplot method, you should be familiar with input parameters such as multiple for plotting multiple histograms and kde for superimposing a kernel density estimate. Category plots are useful for creating bar plots, violin plots, box plots, swarm plots, and count plots, and you should be familiar with the syntax for creating these plots using the catplot method. You should also be familiar with the default mode of these methods.
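
A minimal sketch of the relplot, displot, and catplot wrappers using seaborn's built-in 'tips' dataset (fetching the dataset requires an internet connection the first time it is loaded):

import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme()
tips = sns.load_dataset('tips')

sns.relplot(data=tips, x='total_bill', y='tip', kind='scatter', hue='time')
sns.displot(data=tips, x='total_bill', kind='hist', kde=True, multiple='stack', hue='sex')
sns.catplot(data=tips, x='day', y='total_bill', kind='box')
plt.show()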

Finally, you should be aware of the subtle differences and applications when creating a pairplot versus using FacetGrid or displot. An input parameter such as margin_titles can be applied along with FacetGrid, which can be useful for visualizing conditional histograms of three-dimensional data in two dimensions. The pairplot method only compares variables two at a time (pairwise across the columns of a dataframe) and does not have a margin_titles option.

To review, see The seaborn Module.

 

6b. Apply advanced data visualization techniques 

  • How is the estimator chosen for statistical visualization techniques?
  • What parameter can be used to specify and visualize the confidence interval?
  • What options are available to stratify and order categorical data?

As you immerse yourself more deeply into the seaborn module, you will find that many parameters are available to fine-tune your visualizations and use more advanced techniques for controlling statistical measures. For example, methods such as barplot allow you to choose the estimator, where the default function is the mean; however, an equally valid choice might be the median. Many seaborn methods allow you to include the confidence interval by setting the ci parameter. Adding this aspect of statistical visualization to line plots or modifying the ci parameter in bar plots can reveal much about a given data set. Additionally, by this point, you should have a strong command of the input parameters for the displot method. You should also be aware that the PairGrid class allows you to control the diagonal scaling using the diag_sharey input parameter to balance the scale of the histogram heights.

The seaborn module offers a great deal of flexibility when it comes to rendering categorical data. Categorical plots will stratify the categories of a specific variable by setting the hue parameter. It is also important that you understand the goal of a box plot versus, for example, a violin plot to visualize quartile ranges, outliers, and kernel density estimates. The use of countplot should also be a part of your inventory for rendering categorical data. You should have a strong command of this method's input parameters. For example, you can control the ordering of the categories using the order parameter. The syntax of plot category parameters is quite consistent among various methods for rendering data.
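
A minimal sketch of these statistical parameters on the 'tips' dataset (note that newer seaborn releases deprecate ci in favor of errorbar, but ci matches the course materials):

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset('tips')

sns.barplot(data=tips, x='day', y='total_bill',
            estimator=np.median, ci=95,              # median estimator instead of the default mean
            order=['Thur', 'Fri', 'Sat', 'Sun'])     # explicit category ordering
sns.countplot(data=tips, x='day', hue='sex')         # counts per category, stratified by hue
sns.violinplot(data=tips, x='day', y='total_bill')   # quartiles plus a kernel density estimate
plt.show()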

To review, see Advanced Data Visualization Techniques.

 

6c. Apply the seaborn module to solve data science problems 

  • What is a matrix plot, and how can it be used to infer correlations with a data set?
  • How are joint plots useful for inferring data classification strategies?
  • How is seaborn used to visualize regressions?

In data science, knowing the syntax is half the battle; the other half is knowing how to apply problem-solving methods. In certain applications, a picture can truly be worth a thousand words. Plots of matrix data using a matrix plot such as heatmap can be extremely useful for visualizing empirical distributions of data. Furthermore, a correlation matrix plotted as a heatmap can be used to immediately infer positive, null, or negative correlations between variable pairs. Using a tool in this manner implies the ability to combine knowledge of statistics with advanced visualization techniques.

The jointplot method is useful for visualizing joint empirical data and will show you the marginal distributions of each variable. If the data is categorical, variable clusters can be further stratified using the hue input parameter. Applying the combination of joint, marginal, and stratified data can give an immediate picture and roadmap for designing a data classification scheme. Finally, you should be aware of the input parameter choices and subtle differences in allowed input data for plotting regressions with either lmplot or regplot. For example, both allow pandas dataframes as input, but only regplot accepts numpy arrays as input.
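
A minimal sketch tying these ideas together with seaborn's built-in 'iris' dataset:

import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset('iris')

# Correlation matrix rendered as a heatmap: strong positive and negative pairs stand out immediately
corr = iris.drop(columns='species').corr()
sns.heatmap(corr, annot=True, cmap='coolwarm')

# Joint plot with marginal distributions, stratified by species via hue
sns.jointplot(data=iris, x='petal_length', y='petal_width', hue='species')

# Regression plots: lmplot requires a dataframe, while regplot also accepts numpy arrays
sns.lmplot(data=iris, x='petal_length', y='petal_width')
sns.regplot(x=iris['petal_length'].to_numpy(), y=iris['petal_width'].to_numpy())
plt.show()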

To review, see Advanced Data Visualization Techniques.

 

Unit 6 Vocabulary

This vocabulary list includes terms you will need to know to successfully complete the final exam.

  • barplot
  • category plot
  • ci
  • countplot
  • displot
  • distribution plot
  • estimator
  • FacetGrid
  • heatmap
  • hue
  • jointplot
  • lmplot
  • margin_titles
  • matrix plot
  • regplot
  • relational plot

Unit 7: Data Mining I – Supervised Learning

7a. Explain supervised learning techniques

  • What is supervised learning?
  • How are distance functions used in learning systems?
  • What is overfitting versus underfitting?
  • What is the main goal of supervised learning?

Supervised learning attempts, using training data, to create a mapping between a set of input data and a set of desired output targets arranged in the form of training pairs. A spectrum of techniques exists, ranging from statistical techniques such as regression to classical techniques such as the Bayes' decision and k-nearest neighbors to deep learning neural networks. While the field is vast, some general concepts are common to learning systems. For instance, it is often important to measure the distance between points using a distance function such as the Euclidean distance or the Manhattan distance. Such functions can be useful in the classification problem, where a data point can be classified by determining the minimum distance to a given class.
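
As a small illustration, the two distance functions mentioned above can be computed directly (the points are arbitrary):

import numpy as np

p = np.array([1.0, 2.0])
q = np.array([4.0, 6.0])

euclidean = np.sqrt(np.sum((p - q) ** 2))   # sqrt((1-4)^2 + (2-6)^2) = 5.0
manhattan = np.sum(np.abs(p - q))           # |1-4| + |2-6| = 7.0
print(euclidean, manhattan)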

In general, the main goal of supervised learning is to fit a specific model using the training data. Different supervised learning algorithms attempt to achieve this goal in different ways. The k-nearest-neighbors algorithm achieves this goal by applying a distance function. Logistic regression achieves this goal by applying a linear regression and then a logistic function. Decision trees achieve this goal by creating a tree of rules based on the distribution of the training data. During the training process, one must avoid underfitting and overfitting a model. While this topic is expounded upon in a later module, it is important at the outset of any machine learning lesson to have a qualitative grasp of these concepts. Underfitting occurs when a mathematical model cannot adequately capture the underlying structure of the data. Overfitting occurs when the learning model fits so perfectly to the training data that it cannot generalize to points outside the training set. Finally, it is important to be aware of the bias-variance tradeoff. Bias is the difference between the average prediction of a model and the correct value the model is trying to predict. Variance is the variability of a model prediction for a given data point when applied over a sample of training sets. A reliable supervised training algorithm should find an optimal combination between these quantities.

To review, see Data Mining Overview and Supervised Learning.

 

7b. Apply methods in the scikit-learn module to supervised learning 

  • What are the syntax details for implementing k-nearest neighbors?
  • What are the syntax details for implementing decision trees?
  • What are the syntax details for implementing logistic regression?

This course has chosen to focus on Python implementations of decision trees, logistic regression, and k-nearest neighbors to help you begin your journey in machine learning using the scikit-learn module. Common to many Python machine learning modules are the fit, predict and score methods. The fit method will fit a model based upon which technique has been instantiated. When applying supervised learning, the score method can quantitatively compare a supervised test set against the model's prediction using, for example, the mean squared error. The predict method will yield the model output given a specific set of inputs. The train_test_split method is extremely useful for creating training and test sets from a larger dataset.

When it comes to instantiations such as linear_model.LogisticRegression, neighbors.KNeighborsClassifier, and tree.DecisionTreeClassifier, you must be very clear about how to fit these models. While the fit syntax is consistent across such models, their input parameters will obviously vary. For example, the input parameter criterion for choosing the decision strategy is specific to the decision tree, and the input parameter p for choosing the metric in k-nearest neighbors is specific to the k-nearest-neighbors algorithm. Take some time to review the respective input parameters and output attributes.
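
A minimal sketch fitting all three classifiers on a built-in dataset (the parameter values are arbitrary choices):

from sklearn import datasets, linear_model, neighbors, tree
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    'logistic regression': linear_model.LogisticRegression(max_iter=500),
    'k-nearest neighbors': neighbors.KNeighborsClassifier(n_neighbors=5, p=2),  # p=2 is the Euclidean metric
    'decision tree': tree.DecisionTreeClassifier(criterion='gini'),
}

for name, model in models.items():
    model.fit(X_train, y_train)                  # fit the model to the training pairs
    print(name, model.score(X_test, y_test))     # mean accuracy on the test set
    print(model.predict(X_test[:3]))             # predicted classes for three test points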

To review, see k-Nearest Neighbors, Decision Trees, and Logistic Regression.

 

7c. Implement Python scripts that extract features and reduce feature dimension 

  • What is feature extraction?
  • Why is dimensionality reduction useful?
  • What preprocessing steps can be helpful to the feature extraction process?

After data is collected for a given data science application, the dataset is usually structured as a set of observations, and, after a set of processing steps, each observation is organized as a set of variables or 'features'. Once a set of features is arrived at, the dataset is often subjected to preprocessing steps such as feature scaling or feature normalization to attribute equivalent weight to each feature. For example, instantiating preprocessing.MinMaxScaler enables the scaling of all feature magnitudes to within a minimum and maximum range (where the default range is between zero and one), and preprocessing.StandardScaler can be used to normalize features. The process of taking raw data and converting it into a set of scaled or normalized features is known as feature extraction.

When a large number of features (such as more than 10) is derived from a dataset, it is important to consider techniques that can reduce the dimensionality of the data. This is partly due to what some term the curse of dimensionality, where, as the dimension of the data (that is, the number of features) increases, all data points appear as if they are equidistant. In other words, the concept of a distance function for the classification problem is rendered ineffective in higher dimensions. Such a conclusion necessitates using methods for dimensionality reduction such as principal component analysis (PCA), which can be implemented in Python by invoking decomposition.PCA. PCA is an eigenvector decomposition (of the data's covariance structure) that determines which components account for the most variation in the dataset. Therefore, you must understand output attributes such as explained_variance_ratio_ and singular_values_, which describe each eigenvector component's 'strength', or contribution, relative to the original feature set.
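
A minimal sketch of feature scaling followed by PCA:

from sklearn import datasets, decomposition, preprocessing

X, _ = datasets.load_iris(return_X_y=True)

X_minmax = preprocessing.MinMaxScaler().fit_transform(X)    # scale each feature to [0, 1]
X_std = preprocessing.StandardScaler().fit_transform(X)     # zero mean, unit variance

pca = decomposition.PCA(n_components=2)
X_reduced = pca.fit_transform(X_std)                        # project onto two principal components

print(X_reduced.shape)                  # (150, 2)
print(pca.explained_variance_ratio_)    # fraction of the variance captured by each component
print(pca.singular_values_)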

To review, see Principal Component Analysis.

 

7d. Train and evaluate models using data mining techniques 

  • What methods and techniques are applied when training and evaluating learning models?
  • What method can compute the mean accuracy of a model response to a test set?
  • How is the argmax calculation applied in pattern classification problems?

The training and evaluation of models using the scikit-learn module are generally accomplished using the train_test_split, fit, predict and score methods. Visualization techniques such as scatter, distribution, matrix, and regression plots (using, for example, matplotlib and seaborn) can also be of great use for evaluating, for example, classification performance. The score method computes the mean accuracy of a model response to a test set. If only the model output is desired, then the predict method is more appropriate.

When it comes to training and evaluating models, it is helpful to have some intuition regarding the output of a given technique. For example, for a method such as k-nearest neighbors, you should be able to predict a classifier output given a two-class problem on the real line with a small set of observations. It is also helpful to be familiar with the numpy argmax method for determining an index where a maximum value can be found within a numerical list or a vector. In classification problems, one can be equally concerned with the argmax location and the actual maximum value, which could be used as a confidence measure. Finally, it is often the case that one may want to optimally tune a parameter or choice of parameters for a given training set. An optimal search of this kind can be helped using the GridSearchCV method.
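
A minimal sketch of an argmax-style decision followed by a grid search over a k-nearest-neighbors parameter (the candidate values are arbitrary):

import numpy as np
from sklearn import datasets, neighbors
from sklearn.model_selection import GridSearchCV, train_test_split

# argmax: choose the class with the highest score and keep the score as a confidence measure
scores = np.array([0.1, 0.7, 0.2])
print(np.argmax(scores), scores.max())      # class index 1 with confidence 0.7

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

search = GridSearchCV(neighbors.KNeighborsClassifier(),
                      param_grid={'n_neighbors': [1, 3, 5, 7, 9]}, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))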

To review, see Training and Testing.

 

Unit 7 Vocabulary

This vocabulary list includes terms you will need to know to successfully complete the final exam.

  • argmax
  • Bayes' decision
  • bias-variance tradeoff
  • classification problem
  • decision tree
  • decomposition.PCA
  • dimensionality reduction
  • distance function
  • Euclidean distance
  • explained_variance_ratio_
  • feature extraction
  • feature normalization
  • feature scaling
  • fit
  • GridSearchCV
  • k-nearest neighbors
  • linear_model.LogisticRegression
  • logistic regression
  • Manhattan distance
  • minimum distance
  • neighbors.KNeighborsClassifier
  • overfitting
  • predict
  • preprocessing.MinMaxScaler
  • preprocessing.StandardScaler
  • principal component analysis (PCA)
  • score
  • singular_values_
  • supervised learning
  • train_test_split
  • training pairs
  • tree.DecisionTreeClassifier
  • underfitting

Unit 8: Data Mining II – Clustering Techniques

8a. Explain unsupervised learning concepts 

  • What distinguishes unsupervised learning from supervised learning?
  • What kind of applications are well suited for unsupervised learning?
  • What is the main goal of a clustering algorithm?

There exist classification problems where a given training set consists only of input data without any classification labels (or "output targets") associated with the input data. Under these circumstances, algorithms are required to partition the input data into subsets that will hopefully reveal important aspects of the data. Whereas supervised learning algorithms require preassigned targets, unsupervised learning algorithms do not. For example, recognizing images of license plate numbers or distinguishing between images of apples versus oranges would be considered supervised learning problems because the classes are known. On the other hand, the classification of elements within images of natural scenery would be an example of an unsupervised learning problem. This is because new elements not previously categorized could arise within an image.

Clustering algorithms offer a highly successful approach to solving unsupervised classification problems. They enable the data to tell you what the categories are in a principled way (and not the reverse). Hence, a major theme throughout all clustering techniques is their attempt to group similar data points into the same subset. Under ideal circumstances, the derived clusters will have small intra-set (within-cluster) distances and large inter-set (between-cluster) distances. Some clustering techniques require the computation of a centroid (or "mean vector"), a vector whose components are the empirical mean of each component (or "variable") computed across all observations. Finally, since clustering algorithms are objective and unbiased, one potential drawback is that the clusters arrived at may not correspond to definable targets whose meaning can be interpreted.

To review, see Unsupervised Learning.

 

8b. Apply methods in the scikit-learn module to perform unsupervised learning

  • What are the main clustering algorithms covered in this course?
  • What is the syntax for invoking these clustering algorithms in scikit-learn?
  • What input parameters and output attributes are applicable to these algorithms?

While many clustering approaches exist, two of the most well-known algorithms are K-means clustering and agglomerative clustering. Objects for implementing them can be instantiated using scikit-learn by invoking cluster.KMeans and cluster.AgglomerativeClustering. The n_clusters input parameter specifies the number of clusters to be computed. After applying the fit method, cluster memberships of the training data can be determined using the labels_ output attribute.

The K-means algorithm requires the number of clusters as an input. If the number of clusters is unknown, then it is common to test a series of cluster counts to numerically determine the optimal cluster composition. The inertia_ attribute is useful for this purpose as it measures the sum of squared distances of observations to their closest cluster center. The agglomerative clustering algorithm also allows for the number of clusters as an input parameter. The between-cluster (set) distance used for merging can be chosen using the linkage parameter, which allows for typical choices such as 'ward', 'complete', 'average', or 'single', where the 'ward' linkage is the default value. The affinity parameter can be used to choose the distance metric, which can be set to typical values such as 'euclidean', 'manhattan', 'l1', or 'l2'. It also allows for precomputed metrics. However, if Ward linkage has been chosen, the affinity parameter must be set to 'euclidean'.
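
A minimal sketch fitting both algorithms on a built-in dataset (note that recent scikit-learn versions rename affinity to metric, so it is omitted here and the Euclidean default is used):

from sklearn import cluster, datasets

X, _ = datasets.load_iris(return_X_y=True)

km = cluster.KMeans(n_clusters=3, n_init=10, random_state=0)
km.fit(X)
print(km.labels_[:10])        # cluster membership of the first ten observations
print(km.inertia_)            # sum of squared distances to the closest cluster center
print(km.cluster_centers_)    # the three centroids

agg = cluster.AgglomerativeClustering(n_clusters=3, linkage='ward')
agg.fit(X)
print(agg.labels_[:10])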

To review, see K-means Clustering and Hierarchical Clustering.

 

8c. Explain similarities and differences between hierarchical clustering and K-means clustering 

  • How is K-means clustering similar to agglomerative clustering?
  • How does K-means clustering differ from agglomerative clustering?
  • How do these differences translate into application and syntax?

Both K-means and agglomerative clustering algorithms are distance-based techniques. K-means clustering is an iterative algorithm that uses point distances to define cluster membership, where clusters are defined by their centroids. Agglomerative clustering uses set distances (linkages) as the criteria for merging clusters. Furthermore, the agglomerative clustering algorithm constructs a tree (or dendrogram) as more clusters are joined using a distance matrix.

The most generic forms of the K-means algorithm require specifying the number of clusters. You should be familiar with the inertia_ output attribute for helping to determine a sensible cluster number. The agglomerative clustering algorithm allows for the number of clusters as an input; however, it can also be used to determine the number of clusters. This can be done, for example, by computing the full tree (by setting the compute_full_tree parameter to True). The distances_ output attribute can then be used to determine a distance_threshold so that a cutoff can be applied to determine the number of clusters. The associated dendrogram can help visualize the cutoff point.
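
A minimal sketch of choosing a cluster count with inertia_ and of cutting an agglomerative tree with a distance threshold (the threshold value is an arbitrary illustration):

from sklearn import cluster, datasets

X, _ = datasets.load_iris(return_X_y=True)

# "Elbow" search: watch where the decrease in inertia_ starts to level off
for k in range(1, 7):
    km = cluster.KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, km.inertia_)

# Let a distance cutoff, rather than a preset count, determine the number of clusters
agg = cluster.AgglomerativeClustering(n_clusters=None, distance_threshold=10,
                                      compute_full_tree=True, linkage='ward')
agg.fit(X)
print(agg.n_clusters_)        # the number of clusters implied by the threshold
print(agg.distances_[-5:])    # the largest merge distances in the tree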

To review, see Hierarchical Clustering.

 

8d. Validate models using clustering techniques 

  • How can clustering methods be applied to make meaningful predictions?
  • Which clustering technique allows for the use of the predict method?
  • How can supervised learning play a role after a set of clusters have been determined?

After applying a clustering technique, the silhouette_score function from sklearn.metrics can be used to compare the mean intra-cluster distance with the mean nearest-cluster distance. This provides a principled way of evaluating the clusters. In general, once the set of clusters has been decided upon and labels have been assigned (that is, once the model has been fit), the unsupervised problem has been transformed into a supervised classification problem. Hence, you should be comfortable applying any supervised learning method introduced in this course as a classification scheme for the derived clusters.

The K-means class allows for the use of the predict method where cluster membership for a given test vector is decided by choosing the closest centroid. This effectively reduces the class membership problem to a 1-nearest neighbor problem. Agglomerative clustering does not allow for the use of the predict method. However, you should feel comfortable taking the clusters and, for example, computing a centroid vector. Additionally, you should be prepared to apply the k-nearest neighbor algorithm using a set of clusters generated by either K-means or agglomerative clustering.
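
The following sketch, under the same synthetic-data assumption, illustrates scoring clusters with silhouette_score, predicting with K-means, and reusing agglomerative labels in a k-nearest neighbor classifier.

  # Sketch: evaluating clusters and reusing them for supervised prediction.
  # The make_blobs data and parameter choices are illustrative assumptions.
  from sklearn.datasets import make_blobs
  from sklearn.cluster import KMeans, AgglomerativeClustering
  from sklearn.metrics import silhouette_score
  from sklearn.neighbors import KNeighborsClassifier

  X, _ = make_blobs(n_samples=200, centers=3, random_state=1)
  X_train, X_test = X[:150], X[150:]

  # K-means supports predict: new points go to the nearest centroid.
  km = KMeans(n_clusters=3, random_state=1).fit(X_train)
  print(silhouette_score(X_train, km.labels_))   # cluster quality score
  print(km.predict(X_test[:5]))

  # Agglomerative clustering has no predict; treat its labels as targets
  # and train a k-nearest neighbor classifier on them instead.
  agg = AgglomerativeClustering(n_clusters=3).fit(X_train)
  knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, agg.labels_)
  print(knn.predict(X_test[:5]))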

To review, see Training and Testing.

 

Unit 8 Vocabulary

This vocabulary list includes terms you will need to know to successfully complete the final exam.

  • affinity
  • centroid
  • cluster.AgglomerativeClustering
  • cluster.KMeans
  • clustering
  • compute_full_tree
  • dendrogram
  • distance matrix
  • distances_
  • distance_threshold
  • inertia_
  • labels_
  • linkage
  • n_clusters
  • silhouette_score
  • unsupervised classification

Unit 9: Data Mining III – Statistical Modeling

9a. Explain linear regression concepts 

  • What is linear regression?
  • What quantities are of importance when performing linear regression?
  • What are some characteristics of linear regression computations?

Linear regression is a supervised modeling technique for finding an optimal linear fit to a dataset. Simple linear regression relates one dependent variable to one independent variable by determining a slope and an intercept. Multiple linear regression involves more than one independent variable. An explicit intercept is optional because the constant term can be absorbed into the solution of the linear regression equations. The scikit-learn module produces the intercept by default, while the statsmodels module allows either explicit intercept computation or absorbing the constant into the regression equations. The optimal fit is obtained by solving a least squares problem in which the expected value of the squared error is minimized.

Important quantities derived from linear regressions are the residuals (the differences between the observed values and the estimates) and the correlation coefficient (which lies between -1 and 1). A value near zero implies little correlation and suggests that a linear model is unlikely to explain the data. A positive correlation coefficient implies the dependent variable increases as the independent variable increases. A negative correlation coefficient implies the dependent and independent variables are inversely related. For the purposes of this course, the coefficient of determination (also known as the R2 score) is the square of the correlation coefficient. You should be aware of the equations for these quantities. Additionally, you should have some intuition regarding the trend of the data as it relates to the sign of the slope for simple linear regression. Lastly, for simple linear regression, you should know why the regression line passes through the mean values of the dependent and independent variables.
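
As a worked illustration, here is a minimal numpy sketch of these quantities for a small dataset; the x and y values are invented purely for illustration.

  # Sketch of simple linear regression quantities computed by hand
  # (the small x, y arrays are made-up illustrative values).
  import numpy as np

  x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
  y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

  # Least squares slope and intercept; the fitted line passes through
  # the point (mean of x, mean of y).
  slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
  intercept = y.mean() - slope * x.mean()

  y_hat = slope * x + intercept
  residuals = y - y_hat                  # observed minus estimated values

  r = np.corrcoef(x, y)[0, 1]            # correlation coefficient, between -1 and 1
  r_squared = r**2                       # coefficient of determination
  print(slope, intercept, residuals, r_squared)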

To review, see Linear Regression and Residuals.

 

9b. Apply the scikit-learn module to build linear regression models

  • What is the scikit-learn instantiation for linear regression?
  • What are important computational details for fitting a linear regression model?
  • What are key output attributes for implementing a scikit-learn linear regression?

The scikit-learn module offers the capacity to create linear regression models using the instantiation linear_model.LinearRegression. When fitting a model, it is important to be aware of the subtleties of how the data must be arranged. Since the data matrix for the independent variables is assumed to have dimensions (number of observations)-by-(number of variables), it must have two dimensions. In other words, if there is only one independent variable, the data cannot be passed as a one-dimensional numpy vector (that is, a single-subscript array). If the data is in the form of a vector, it must first be reshaped using the reshape method before fitting the model.

With regard to the output of the linear regression model, it is important to know the syntax for referring to key quantities. The regression coefficients and the intercept can be referenced using the coef_ and intercept_ attributes. For multiple linear regression, the coef_ attribute will be in the form of a vector. For small datasets using simple linear regression, you should be able to demonstrate your intuition by estimating the slope and the intercept to verify the output of a linear regression. Lastly, the coefficient of determination can be computed using the r2_score function contained within the sklearn.metrics module.
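
A minimal sketch of the corresponding scikit-learn workflow, using the same kind of made-up data, might look like this.

  # Sketch of a simple linear regression in scikit-learn
  # (the small x, y arrays are made-up illustrative values).
  import numpy as np
  from sklearn.linear_model import LinearRegression
  from sklearn.metrics import r2_score

  x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
  y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

  # One independent variable: reshape the vector into an (observations, 1) matrix.
  X = x.reshape(-1, 1)

  model = LinearRegression().fit(X, y)
  print(model.coef_, model.intercept_)   # slope(s) and intercept

  y_hat = model.predict(X)
  print(r2_score(y, y_hat))              # coefficient of determination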

To review, see Linear Regression.

 

9c. Apply the scikit-learn module to validate linear regression models 

  • What methods are important for model validation?
  • What is cross-validation?
  • How is cross-validation implemented in practice?

As is customary for many machine learning modules, the predict and score methods can be applied to test data after fitting the model. It is important to note that the linear model can both interpolate within the bounds of the training set and extrapolate outside them. It is, therefore, up to the data scientist to interpret the predictions. For example, if a simple linear regression models the cost of some item, it would be unrealistic to consider predicted values less than zero (even though the regression would happily process such inputs).

Now that various topics in supervised and unsupervised learning have been covered, it is sensible to consider constructing more sophisticated model validation approaches such as cross-validation. In this approach, a test (or "validation") set is extracted from the overall dataset and kept separate from the training set. The test and training data are selected randomly; therefore, several random splits can be performed to arrive at a statistical estimate of the model performance. In k-fold cross-validation, for example, the dataset is partitioned into k folds, and each fold serves once as the validation set while the remaining folds are used for training. You should feel comfortable using your Python programming skills to construct cross-validation tests.
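
A minimal sketch of repeated random splits, followed by scikit-learn's built-in k-fold helper, on an assumed synthetic linear dataset; the data, number of repetitions, and split sizes are all illustrative assumptions.

  # Sketch of simple repeated random-split validation for a linear regression
  # (the synthetic data and number of repetitions are illustrative assumptions).
  import numpy as np
  from sklearn.linear_model import LinearRegression
  from sklearn.model_selection import train_test_split, cross_val_score

  rng = np.random.default_rng(0)
  X = rng.uniform(0, 10, size=(100, 1))
  y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1, size=100)

  # Several random train/test splits give a statistical estimate of performance.
  scores = []
  for i in range(5):
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=i)
      model = LinearRegression().fit(X_tr, y_tr)
      scores.append(model.score(X_te, y_te))
  print(np.mean(scores), np.std(scores))

  # scikit-learn's built-in k-fold cross-validation gives a similar summary.
  print(cross_val_score(LinearRegression(), X, y, cv=5))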

To review, see Linear Regression and Cross-Validation.

 

9d. Explain data overfitting 

  • What is overfitting?
  • What is underfitting?
  • What steps can be taken to avoid overfitting a model?

After a model has been trained, objective measures must be in place to ensure acceptable performance, and the data scientist must be able to spot when something is awry. Underfitting occurs when a mathematical model cannot adequately capture the underlying structure of the data; this can happen when the data is too complex for the model. Overfitting occurs when a model has more parameters than the data can reliably constrain. In this case, the model fits the training data too closely and loses its ability to generalize. This means a quantity such as the mean squared training error can be relatively low while the model still fails to handle data outside the training set accurately.

To avoid overfitting and underfitting, you can apply cross-validation techniques. For example, it is common to partition computational training steps into epochs where a validation set is tested at the end of each epoch to ensure a given model retains its ability to generalize during the training session. In this way, the model is constantly being tested to avoid underfitting or overfitting, and both the training error and the validation error are kept small. For simple examples such as the two-class classification problem, you should be able to visualize a well-fitted versus an overfitted decision boundary between the class datasets.
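
As a simple illustration of underfitting versus overfitting, the following sketch fits polynomials of increasing degree to an assumed noisy quadratic dataset and compares training and test errors; the data and degree choices are illustrative assumptions.

  # Sketch contrasting underfit, reasonable, and overfit polynomial models
  # (the noisy quadratic dataset and chosen degrees are illustrative assumptions).
  import numpy as np
  from sklearn.linear_model import LinearRegression
  from sklearn.preprocessing import PolynomialFeatures
  from sklearn.model_selection import train_test_split
  from sklearn.metrics import mean_squared_error

  rng = np.random.default_rng(0)
  x = rng.uniform(-3, 3, size=60)
  y = 0.5 * x**2 + rng.normal(0, 0.5, size=60)
  X_tr, X_te, y_tr, y_te = train_test_split(x.reshape(-1, 1), y, random_state=0)

  for degree in (1, 2, 15):
      Xp_tr = PolynomialFeatures(degree).fit_transform(X_tr)
      Xp_te = PolynomialFeatures(degree).fit_transform(X_te)
      model = LinearRegression().fit(Xp_tr, y_tr)
      train_err = mean_squared_error(y_tr, model.predict(Xp_tr))
      test_err = mean_squared_error(y_te, model.predict(Xp_te))
      # A high-degree fit tends to show low training error but larger test error.
      print(degree, train_err, test_err)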

To review, see Overfitting.

 

Unit 9 Vocabulary 

This vocabulary list includes terms you will need to know to successfully complete the final exam.

  • coefficient of determination
  • coef_
  • correlation coefficient
  • cross-validation
  • intercept_
  • linear_model.LinearRegression
  • overfitting
  • r2_score
  • residual
  • underfitting

Unit 10: Time Series Analysis

10a. Apply methods in the statsmodels module 

  • What is the statsmodels module?
  • How does it differ from scikit-learn?
  • What types of models can be implemented?

The statsmodels module is useful for creating statistical models, conducting statistical tests, and performing statistical data exploration. Its intent is to provide, within Python, much of the statistical functionality of the R programming language for data science applications. It goes beyond the statistical capabilities of the scikit-learn module, offering more sophisticated models and more comprehensive descriptions of model output. While there is some overlap between scikit-learn and statsmodels, statsmodels arranges its output with the mindset of a data scientist. In this unit, the focus is on the module's time series analysis models.

As a step toward introducing the module syntax and contrasting it with scikit-learn, it is important to consider the implementation of linear regression models. Objects for implementing a linear regression can be instantiated by invoking OLS (ordinary least squares). You should be aware that the endogenous variable (the dependent variable) is placed first when the OLS object is constructed (observe that this convention is the opposite of scikit-learn's). It is also important to realize that a linear equation such as y = Ax + b, where A is a matrix and b is a constant vector, can mathematically be rephrased as y = Mz: the constant is appended to the vector x to form z, and the matrix A is appended with a column of ones to form M. As opposed to scikit-learn, which produces the coef_ and intercept_ attributes, the default mode for OLS is y = Mz. If it is desired to generate an explicit intercept term, the add_constant method must first be invoked to append the column of ones to the data.

Output parameters are compacted into the params attribute, from which the intercept and regression coefficients can be extracted. Because of how statsmodels constructs its data description, the summary command is useful for visualizing all parameters and scores for a given model. Specific values contained within the summary require referencing a given attribute of the model; for example, the coefficient of determination can be referenced using rsquared. Note that this is a more compact approach, as scikit-learn requires invoking a separate method to generate this computation. Finally, an input parameter such as missing can be useful for OLS because it allows you to drop missing values if desired.
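
A minimal sketch of this workflow with made-up data; the x and y values are invented for illustration, and the conventional alias sm is used for statsmodels.api.

  # Sketch of an OLS regression in statsmodels (made-up illustrative data).
  import numpy as np
  import statsmodels.api as sm

  x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
  y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

  # Append a column of ones so an explicit intercept term is estimated.
  X = sm.add_constant(x)

  # Note the order: the endogenous (dependent) variable comes first.
  model = sm.OLS(y, X, missing='drop')
  results = model.fit()

  print(results.params)      # intercept and regression coefficient together
  print(results.rsquared)    # coefficient of determination
  print(results.summary())   # full tabular description of the fit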

To review, see The statsmodels Module.

 

10b. Explain the autoregressive and moving average models 

  • What is an autoregressive model?
  • What is a moving average model?
  • What are the important parameters for defining ARMA and ARIMA models?

An autoregressive (AR) model is a stochastic process modeled by a recursion relation whose output depends upon the previous 'p' values in the time series. It is a random process because a Gaussian white noise term is added to the series at each time step. A moving average (MA) model is a stochastic process generated by a weighted average of the 'q' previous Gaussian white noise terms in a time series. ARMA models are linear, and the important parameters for an ARMA(p,q) model are the order parameters 'p' and 'q' together with the mean and standard deviation of the white noise process.

A stationary random process, for the purposes of this course, is one whose mean and standard deviation remain constant as time varies. If these parameters vary with time, the process is called nonstationary. ARMA models apply to stationary processes. An important aspect of these models is the long-term behavior of the time series they generate. For example, for an AR(1) process, you should know the conditions on the model parameters under which the model output remains stable, and you should be able to calculate the mean of an AR(1) process.

When numerically producing an ARIMA(p,d,q) model from empirical nonstationary time series data, 'd' successive finite differences are applied before fitting the ARMA model. This is done to convert a nonstationary time series into a stationary one. After this step, ARMA techniques can be used to complete the model.
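
As an illustration of the recursion underlying an AR model, here is a minimal sketch that simulates an AR(1) process; the constant, coefficient, and noise level are assumed values chosen so that the process remains stable.

  # Sketch: simulating an AR(1) process x_t = c + phi * x_{t-1} + eps_t
  # (the constant, coefficient, and noise standard deviation are assumed values).
  import numpy as np

  rng = np.random.default_rng(0)
  c, phi, sigma = 1.0, 0.6, 0.5          # |phi| < 1 keeps the process stable
  n = 500

  x = np.zeros(n)
  for t in range(1, n):
      x[t] = c + phi * x[t - 1] + rng.normal(0, sigma)

  # For a stable AR(1) process the long-run mean is c / (1 - phi).
  print(x[100:].mean(), c / (1 - phi))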

To review, see Autoregressive (AR) Models and Moving Average (MA) Models.

 

10c. Implement and analyze AR, MA, and ARIMA models 

  • How do you test a time series for stationarity?
  • How can ARMA model parameters be estimated?
  • How is an ARIMA model implemented?

The statsmodels module is equipped with several analysis tools to help determine the order parameters p, d, and q in an ARIMA(p,d,q) model. For example, the augmented Dickey-Fuller (ADF) test adfuller can be used to test the null hypothesis that the data are not stationary and thereby help determine 'd'. The test can be applied to successive finite differences of a nonstationary time series to determine whether, and after how many differences, a stationary series is obtained. The autocorrelation function (ACF) and partial autocorrelation function (PACF) help determine the order parameters p and q in an ARMA(p,q) model and can be computed using the acf and pacf methods. The sgt plotting utilities from the statsmodels graphics subpackage can be used to visualize the PACF.

Once the values of p, d, and q are decided upon, the ARIMA model can be implemented by invoking ARIMA from statsmodels.tsa.arima_model. You should feel comfortable writing code that generates a simple ARMA model from its recursion equation and that fits an ARIMA model to input time series data given values of p, d, and q. Once a model has been fitted, you should be aware that the maparams attribute can be used to reference the fitted moving average coefficients. The summary method is also useful for visualizing the model results. You should additionally realize that p-values for the computed model coefficients can be referenced using pvalues once the model has been created.
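
A minimal sketch of this workflow on an assumed simulated series; note that newer statsmodels releases provide ARIMA in statsmodels.tsa.arima.model, while the statsmodels.tsa.arima_model path referenced in the course readings may not be available in current versions.

  # Sketch: testing stationarity and fitting an ARIMA model
  # (the simulated AR(1) series and the chosen order (1, 0, 0) are assumptions;
  #  older statsmodels releases expose ARIMA via statsmodels.tsa.arima_model).
  import numpy as np
  from statsmodels.tsa.stattools import adfuller, acf, pacf
  from statsmodels.tsa.arima.model import ARIMA

  rng = np.random.default_rng(0)
  x = np.zeros(500)
  for t in range(1, 500):
      x[t] = 0.6 * x[t - 1] + rng.normal(0, 1.0)

  # ADF test: a small p-value rejects the null hypothesis of nonstationarity.
  print(adfuller(x)[1])

  # ACF and PACF values help suggest the orders p and q.
  print(acf(x, nlags=5))
  print(pacf(x, nlags=5))

  results = ARIMA(x, order=(1, 0, 0)).fit()
  print(results.params)      # fitted model coefficients
  print(results.pvalues)     # p-values for the coefficients
  print(results.summary())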

To review, see Autoregressive Integrated Moving Average (ARIMA) Models.

 

Unit 10 Vocabulary

This vocabulary list includes terms you will need to know to successfully complete the final exam.

  • add_constant
  • ADF test
  • ARIMA
  • ARMA model
  • autocorrelation function
  • Endogenous variable
  • maparams
  • missing
  • OLS
  • params
  • partial autocorrelation function
  • rsquared
  • sgt
  • stationary random process
  • summary