Thinking about the World

There are two major approaches to data science: analytical mathematics (including statistics) and visualization. These two categories are not mutually exclusive. However, mathematical analysis is often described as a more "left-brain" approach, while visualization reflects a more "right-brain" approach. Both are powerful approaches for analyzing data, and we should not choose one to the exclusion of the other. Visualization is a sensible vehicle for introducing the field because data relationships become immediately apparent to the naked eye. Use the materials in this section to compare and contrast analytic approaches with visualization approaches. In this course, we will try to strike a healthy balance between the two.

Thinking Like a Mathematician

Chapter Summary

When a data scientist thinks like a mathematician, they think in terms of measurement and models. The tasks are to decompose the problem into its basic components, represent those components numerically, and combine the components into an accurate expression of the problem and its solution.


Discussion

According to Wikipedia, mathematics is the study of quantity, structure, space, and change. When these are used to solve practical problems, it is called applied mathematics. In addition to these main concerns, there are also topics dedicated to exploring links from the heart of mathematics to other fields: to logic, to set theory, and more recently to the study of uncertainty. For the purposes of this book, we will not explore these last three aspects of mathematics.


Quantity

The study of quantity starts with numbers, first the familiar natural numbers and integers ("whole numbers") and the basic arithmetical operations on them, which are characterized in arithmetic. As the number system is further developed, the integers are recognized as a subset of the rational numbers ("fractions"). These, in turn, are contained within the real numbers, which are used to represent continuous quantities. Real numbers are generalized to complex numbers.

Natural numbers: 1, 2, 3, ...
Integers: ..., −2, −1, 0, 1, 2, ...
Rational numbers: −2, \dfrac{2}{3}, 1.21
Real numbers: −e, \sqrt{2}, 3, π
Complex numbers: 2, i, −2 + 3i, 2e^{i \dfrac{4π}{3}}

When thinking like a mathematician, a data scientist needs to ask the questions, "how will the thing I am interested in be represented by numbers?" and "what kind of numbers will best represent the thing I am interested in?"
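As a minimal sketch of these two questions, the Python snippet below represents one value from each kind of number with a built-in or standard-library type. The variable names and the measurements are hypothetical, chosen only for illustration; the point is that the choice of number type affects what can be computed exactly.

```python
from fractions import Fraction
import cmath
import math

# Hypothetical measurements, chosen only for illustration.
count = 3                          # natural number / integer: a count of items
share = Fraction(2, 3)             # rational number: an exact ratio
length = math.sqrt(2)              # real number, approximated by a float
signal = 2 * cmath.exp(1j * 4 * math.pi / 3)  # complex number in polar form

# Exact rational arithmetic has no rounding error; floats do.
exact_sum = Fraction(1, 10) + Fraction(2, 10)  # exactly 3/10
float_sum = 0.1 + 0.2                          # close to, but not exactly, 0.3

for name, value in [("count", count), ("share", share),
                    ("length", length), ("signal", signal)]:
    print(f"{name}: {value!r} ({type(value).__name__})")

print(exact_sum == Fraction(3, 10), float_sum == 0.3)
```

A quantity such as a proportion can be stored either as an exact fraction or as an approximate decimal; which is better depends on whether exactness or speed and convenience matters more for the problem at hand.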

Structure

Many sets of mathematical objects exhibit internal structure. Mathematics exposes these structures by applying rules (axioms and operations) to the objects. Algebra is a powerful tool for understanding mathematical structures. It combines the concept of variables with arithmetic to solve equations. Algebra is applied to many different and seemingly unrelated problems. Examples of the structures studied include rings, groups, graphs, and fields.







Figures: sets, rings, groups, graphs, fields.

When thinking like a mathematician, a data scientist needs to ask the questions, "what sort of internal structure does the thing I am interested in have?" and "what set of equations will expose the structure?"
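One way to see how rules expose structure is to test a small set of objects against the axioms of a group. The sketch below is a brute-force check, practical only for small finite sets; the function name `is_group` and the clock-face example are hypothetical and introduced here for illustration.

```python
from itertools import product

def is_group(elements, op):
    """Check the group axioms by brute force on a small finite set."""
    elements = list(elements)
    # Closure: combining any two elements stays inside the set.
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # Associativity: (a op b) op c equals a op (b op c) for every triple.
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elements, repeat=3)):
        return False
    # Identity: some element e leaves every element unchanged.
    identities = [e for e in elements
                  if all(op(e, a) == a and op(a, e) == a for a in elements)]
    if not identities:
        return False
    e = identities[0]
    # Inverses: every element combines with some element to give e.
    return all(any(op(a, b) == e and op(b, a) == e for b in elements)
               for a in elements)

def add_mod_12(a, b):
    return (a + b) % 12

def mul_mod_12(a, b):
    return (a * b) % 12

hours = range(12)  # hypothetical example: the hours on a clock face
print(is_group(hours, add_mod_12))  # True: clock addition forms a group
print(is_group(hours, mul_mod_12))  # False: 0 has no multiplicative inverse
```

The same twelve objects have group structure under one operation and not under another, which is why the question is about the equations as much as about the objects.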

Space

The study of space originates with geometry, in particular Euclidean geometry. Trigonometry is the branch of mathematics that deals with relationships between the sides and the angles of triangles; it combines space and numbers, and encompasses the well-known Pythagorean theorem. The advanced study of space includes higher-dimensional geometry, non-Euclidean geometries, differential geometry, topology, fractal geometry, and measure theory. For the purposes of this book, we will not cover these more advanced geometries.








Figures: geometry, trigonometry, differential geometry, topology, fractal geometry, measure theory.

When thinking like a mathematician, a data scientist needs to ask the questions, "does the thing I am interested in have a spatial component, either actual or theoretical?" and "how do I capture and represent that spatial component?"
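The Pythagorean theorem gives one simple way to capture a spatial component: the straight-line distance between two points. The sketch below assumes a flat (Euclidean) space, and the points are hypothetical. The same formula applies to an actual space, such as map coordinates, and to a theoretical one, such as records described by several numeric attributes.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points with equally many coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Pythagorean theorem, generalized to any number of dimensions.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Actual space: two hypothetical map locations, in kilometres on a flat grid.
print(euclidean_distance((0, 0), (3, 4)))            # 5.0

# Theoretical space: two hypothetical customers described by
# (age in years, visits per month, basket size in items).
print(euclidean_distance((30, 4, 12), (34, 1, 12)))  # 5.0
```

In the theoretical case the coordinates have different units, so in practice the attributes would usually be rescaled before distances are compared.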

Change

Understanding and describing change is a common theme in science, and calculus was developed as a powerful tool to investigate it. The function is the central concept for describing a changing quantity. Many problems lead naturally to relationships between a quantity and its rate of change. For a curve that is not a straight line, the slope is different at every point on the curve. These changing slopes are studied in differential calculus. Finding the area under a curve is the subject of integral calculus. Calculus is beyond the scope of this book.



Tangent line at (x, f(x)). The derivative f′(x) of a curve at a point is the slope (rise over run) of the line tangent to that curve at that point.


Integration can be thought of as measuring the area S under a curve, defined by f(x), between two points (here a and b).

When thinking like a mathematician, a data scientist needs to ask the questions, "does the relationship between the things I am interested in change (over time or over distance)?" and "how will I describe the changing relationship?"
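Although calculus itself is beyond the scope of this book, the two ideas of slope and area can be approximated numerically with nothing more than arithmetic. The sketch below is an illustration under stated assumptions, not a substitute for calculus: the function names `slope_at` and `area_under`, the step sizes, and the example curve f(x) = x² are all hypothetical choices.

```python
def slope_at(f, x, h=1e-5):
    """Approximate the slope of f at x as rise over run on a tiny interval."""
    return (f(x + h) - f(x - h)) / (2 * h)

def area_under(f, a, b, n=10_000):
    """Approximate the area under f between a and b using n thin trapezoids."""
    width = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * width)
    return total * width

def f(x):
    return x ** 2  # hypothetical curve chosen for illustration

# For this curve the exact slope at x = 3 is 6 and the exact area on [0, 3] is 9;
# the printed values are close approximations to these.
print(slope_at(f, 3.0))
print(area_under(f, 0.0, 3.0))
```

Making the interval `h` smaller, or the number of trapezoids `n` larger, generally brings the approximations closer to the exact values, up to the limits of floating-point arithmetic.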


Applied Math

Applied mathematics concerns itself with mathematical methods that are typically used in science, engineering, business, and industry. Thus, "applied mathematics" is mathematics combined with specialized knowledge of the field in which it is applied. Generally speaking, this is the kind of math that data scientists practice.



Efficient solutions to the vehicle routing problem require tools from mathematics.
    


Modelling financial markets is done with mathematics.

Assignment/Exercise

This is Project #2, which spans four chapters. Assemble into groups of 3 or 4 students. A group of three may not have the same members as the group for Project #1. A group of four may have no more than two students repeating from their Project #1 group. This group will do the entire project together.

  1. Replicate Galileo's "inclined plane" experiment. Start by designing the research and write down your plan. List materials needed, specify methods to be used, identify variables to be measured, create data recording sheets, etc.
  2. Conduct the experiment according to the design. Take pictures. Record your data results.
  3. Enter the data into R. Use R to produce tables and draw plots of your data. See if you can draw the theoretical curve Galileo was trying to discover on your data plots.
  4. Prepare a slide presentation that includes a description of your methods, pictures of your apparatus, a table of your raw data, a table of your analyzed results, plots of your results, and a list of several things the group learned on its own about data science during the course of this project.
Note: Your group can specialize on tasks, but everyone needs to participate in all phases of the assignment. Also, the chapters covered to this point do not teach you everything you need to know to do this assignment. Please do the best you can with what you know. This assignment is not just a way to show the instructor how much of the previous chapters you have learned; it is a learning experience in and of itself. The assignment is designed for students to discover knowledge not contained in the chapters.