• Unit 2: Python for Data Science

    This unit introduces the Python IDE we will use in this course and shows how to install the Python modules needed in upcoming units. The primary goals of this unit are to ensure that all required software is ready to run and to review the Python programming language.

    Developing expertise in data science requires an understanding of a broad range of subjects, such as numerical methods, matrix computations, statistics, data processing, visualization, data mining, and statistical modeling. Python is an excellent vehicle for developing this expertise because modules are available that address each of these topics. In this course, we will use the numpy module to introduce numerical methods and matrix computations. The pandas module (built upon the numpy module) will be used for data processing and visualization. Basic statistical calculations will be accomplished using scipy and pandas. Data rendering and visualization applications will be addressed using the matplotlib and seaborn modules. Finally, data mining and statistical modeling will be introduced using scikit-learn and statsmodels. Mastering data science in the context of these Python modules will position you to delve into deeper subjects such as machine learning and deep learning.
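
    Since one goal of this unit is to confirm that the required software is ready to run, a quick availability check can be useful. The following is a minimal sketch, not part of the course materials: the helper name check_modules and the list COURSE_MODULES are hypothetical, and the list simply collects the import names of the modules mentioned above (scikit-learn is imported under the name sklearn). It uses only the standard library, so it runs even when none of the modules are installed.

```python
from importlib.util import find_spec

# Import names of the modules named in this unit (assumed list;
# note that scikit-learn is imported as "sklearn").
COURSE_MODULES = [
    "numpy", "pandas", "scipy", "matplotlib",
    "seaborn", "sklearn", "statsmodels",
]


def check_modules(names):
    """Return a dict mapping each import name to True if it can be located."""
    return {name: find_spec(name) is not None for name in names}


# Print one status line per module without actually importing it.
for name, available in check_modules(COURSE_MODULES).items():
    status = "installed" if available else "MISSING"
    print(f"{name:12s} {status}")
```

    Any module reported as missing would need to be installed before the units that rely on it.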

    You should leave this unit able to write Python programs that perform basic computational and data processing tasks. We will discuss core concepts such as data types, operators, functions, conditional statements, loops, and file handling. We will also give examples of Python data structures, such as lists and dictionaries, as well as basic object-oriented syntax. Understanding these data structures will enable you to implement basic plotting and data rendering instructions using the matplotlib module. This abbreviated (yet thorough) set of topics will serve as the programming vocabulary you will need to complete the course.
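
    To give a flavor of the vocabulary listed above, here is a small illustrative sketch that combines a function, a loop, a conditional, a list, and a dictionary. The names mean and classify, the threshold of 70, and the scores are all hypothetical and chosen only for illustration.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    total = 0.0
    for v in values:          # loop over the items
        total += v            # arithmetic operator with assignment
    return total / len(values)


def classify(score, threshold=70):
    """Label a score using a conditional expression (threshold is assumed)."""
    return "pass" if score >= threshold else "fail"


# A dictionary mapping names (str) to scores (int); made-up data.
scores = {"ana": 82, "ben": 64, "cy": 91}

# A list of (name, label) tuples built with a list comprehension.
results = [(name, classify(s)) for name, s in scores.items()]

average = mean(list(scores.values()))
print(results)
print(average)
```

    Each of these pieces is covered in more detail in the sections of this unit.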

    Completing this unit should take you approximately 6 hours.

    • 2.1: Google Colaboratory

    • 2.2: Datatypes, Operators, and the math Module

    • 2.3: Control Statements, Loops, and Functions

    • 2.4: Lists, Tuples, Sets, and Dictionaries

    • 2.5: The random Module

    • 2.6: The matplotlib Module

    • Unit 2 Assessment
