Interview Query | Introduction - Data Science Interview

All areas of study in math can roughly be divided into two camps: discrete mathematics and continuous mathematics. Perhaps the best way to describe the difference between the two is to talk about what each of the branches means by “number.”

Discrete mathematics studies structures that are analogous to the integers, that is, the set of all whole numbers $\{\dots,-2,-1,0,1,2,\dots\}$ . The symbol $\mathbb{Z}$ is the most common symbol used to represent the integers (the notation comes from the German word “Zahlen”, meaning “number”). Continuous mathematics, on the other hand, studies structures that are analogous to the real numbers, denoted as $\mathbb{R}$ .

You might notice we did not give a list of real numbers like we did for integers. This was not a mistake. In fact, the inability of listing out the numbers in $\mathbb{R}$ is one of the defining features of $\mathbb{R}$ ! Indeed, if we even tried to begin listing out the real numbers in order, for example:

$\mathbb{R}\approx_?\left\{\dots,-9000,-\sqrt{2},\,0,\frac{1}{2},\pi,\dots\right\}$

we can always find a number that is not included in our “list” of real numbers. For example, the real number $(\pi-1)$ is not included in our “list” above.

In data science, numerical data can always be categorized in a similar way as either discrete or continuous. Rainfall, for example, would be a continuous variable since there is no well-defined next “step” after $1$ inch of rainfall. If we said the next “step” was $2$ inches, we preclude the possibility of $1.1$ inches. Likewise, if we say the next set is $1.1$ inches, we forget the possibility of a $1.01$ inches, and so on.

In contrast, the number of car crashes on a given day is a discrete value, since it doesn’t make sense to talk about $1.5$ car crashes. The number of car crashes must be a (positive) integer.

For this reason, traditional probability theory treats discrete and continuous variables in different, but somewhat analogous, ways. Technically, there is a more modern formulation of probability theory, which relies on an area of advanced mathematics called measure theory, that avoids the need to make this distinction. However, the formality and rigor gained from using measure-theoretic probability theory is not worth the cost in simplicity for most applications.

In this section, we will explore the foundations of how we model discrete data.

Poker Pair

Discrete Random Variables

Good job, keep it up!

0%

Completed

You have 255 sections remaining on this learning path.

Loading pricing options