Non-parametric Tests

So far, all of our tests have been parametric, meaning they make assumptions about the sample distribution. We assumed that the sample followed a normal distribution in the $Z$, $t$, and $F$ tests. In the proportions and $\chi^2$ tests, we assumed that the samples followed a binomial or multinomial distribution, respectively.

But sometimes, particularly with small samples, it is not fair to make these assumptions. For this reason, we have non-parametric tests, which do not assume that the sample follows any particular distribution.

There are many, many non-parametric tests, so we will go over two of the most popular ones: the $U$ test and the paired signed-rank test.

Please note that there are ways to calculate $p$-values for the test statistics of non-parametric tests, but we don’t describe how to do it here because the methods are esoteric.

$U$ Test

Cheat Sheet

  • Description: Tests whether the median of one sample ($\vec{x}_1$) is different from/more than/less than the median of another, independent sample ($\vec{x}_2$)
  • Statistic: $m_1-m_2$ (difference of medians)
  • Sidedness: Either
  • Null Hypothesis: $H_0: m_1=m_2$ (two-sided); $m_1\leq m_2$ or $m_1\geq m_2$ (one-sided)
  • Alternative Hypothesis: $H_a: m_1\neq m_2$ (two-sided); $m_1\gt m_2$ or $m_1\lt m_2$ (one-sided)
  • Test Statistic: $U=\min(U_1,U_2)$, where

$$U_1 = \sum_{i=1}^{n_1}\sum_{j=1}^{n_2}S(x_{1,i},x_{2,j}),\quad S(x,y)=\begin{cases}1, & \text{if } x\gt y\\ 0.5, & \text{if } x=y\\ 0, & \text{if } x\lt y\end{cases}$$

$$U_2 = \sum_{i=1}^{n_1}\sum_{j=1}^{n_2}S_{-1}(x_{1,i},x_{2,j}),\quad S_{-1}(x,y)=\begin{cases}0, & \text{if } x\gt y\\ 0.5, & \text{if } x=y\\ 1, & \text{if } x\lt y\end{cases}$$
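The double sums above translate almost word for word into code. The following is a minimal sketch, not a production implementation: the function names `pair_score` and `u_statistics` and the sample values are made up for illustration.

```python
def pair_score(x, y):
    """S(x, y): 1 if x > y, 0.5 on a tie, 0 if x < y."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def u_statistics(sample_1, sample_2):
    """Return (U1, U2, U) computed directly from the double sums."""
    u1 = sum(pair_score(a, b) for a in sample_1 for b in sample_2)
    # S_{-1}(x, y) is S with its arguments swapped.
    u2 = sum(pair_score(b, a) for a in sample_1 for b in sample_2)
    return u1, u2, min(u1, u2)


# Illustrative (made-up) data.
x1 = [1.1, 2.5, 3.7, 4.0]
x2 = [2.0, 3.0, 5.2]
u1, u2, u = u_statistics(x1, x2)
print(u1, u2, u)  # 5.0 7.0 5.0
```

The loop visits all $n_1 n_2$ pairs, which is fine for small samples; statistical libraries typically use a faster rank-based formula that gives the same value.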

Description

The idea behind this test is that $U_1$ and $U_2$, divided by the number of pairs $n_1 n_2$, are proxies for $\mathbb{P}(X_1\gt X_2)$ and $\mathbb{P}(X_1\lt X_2)$, respectively. In fact, the hypotheses of the $U$-test can be restated as

$$H_0:\mathbb{P}(X_1\gt X_2)=\mathbb{P}(X_1\lt X_2)$$

$$H_a:\mathbb{P}(X_1\gt X_2)\neq\mathbb{P}(X_1\lt X_2)$$

This matches the median formulation when the two distributions have the same shape and differ only by a shift, since the median is defined as $m=x$ such that $\mathbb{P}(X\lt x)=0.5$.
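The link between the $U$ statistics and these probabilities follows from the definitions of $S$ and $S_{-1}$ alone. A short derivation, assuming only the cheat-sheet definitions: each pair contributes exactly 1 in total to the two sums, so

$$S(x,y)+S_{-1}(x,y)=1 \quad\Rightarrow\quad U_1+U_2=n_1 n_2,$$

and the fraction of pairs won by the first sample is a natural estimate of the corresponding probability, with ties split evenly:

$$\frac{U_1}{n_1 n_2}\approx\mathbb{P}(X_1\gt X_2)+\tfrac{1}{2}\mathbb{P}(X_1=X_2).$$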

As stated before, there is a way to calculate a cdf for $U$ and test it against a significance level $\alpha$, but it is beyond this course’s scope and better left to software.
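To give a feel for what such software can do on small samples, here is a hedged sketch of one approach, an exact permutation calculation. It assumes that under $H_0$ every way of splitting the pooled data into groups of sizes $n_1$ and $n_2$ is equally likely, and it counts the splits whose $U$ is at least as small as the observed one. The function names are hypothetical, and real libraries may use different or more efficient methods.

```python
from itertools import combinations


def u_stat(sample_1, sample_2):
    """U = min(U1, U2), with ties scored as 0.5."""
    u1 = sum(1.0 if a > b else 0.5 if a == b else 0.0
             for a in sample_1 for b in sample_2)
    # U1 + U2 = n1 * n2, so U2 does not need its own double sum.
    return min(u1, len(sample_1) * len(sample_2) - u1)


def exact_two_sided_p_value(sample_1, sample_2):
    """Share of all relabelings whose U is <= the observed U."""
    pooled = list(sample_1) + list(sample_2)
    n1 = len(sample_1)
    observed = u_stat(sample_1, sample_2)
    extreme = total = 0
    # Enumerate every choice of n1 pooled positions for "group 1".
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        group_1 = [pooled[i] for i in idx]
        group_2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if u_stat(group_1, group_2) <= observed:
            extreme += 1
    return extreme / total


# Completely separated samples: only 2 of the 20 relabelings give U = 0.
print(exact_two_sided_p_value([1, 2, 3], [4, 5, 6]))  # 0.1
```

The number of relabelings grows combinatorially, so this enumeration is only practical for small samples; for larger ones, software usually falls back on an approximation.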

Paired Signed-rank Test

Cheat Sheet

  • Description: Tests whether the median of a sample at one point in time ($\vec{x}$) is different from/more than/less than the median of the same sample at a different point in time ($\vec{x}'$)

  • Statistic: $m-m'$ (difference of medians)

  • Sidedness: Either

  • Null Hypothesis: $H_0: m=m'$ (two-sided); $m\leq m'$ or $m\geq m'$ (one-sided)

  • Alternative Hypothesis: $H_a: m\neq m'$ (two-sided); $m\gt m'$ or $m\lt m'$ (one-sided)

  • Test Statistic: $W=\sum_{i=1}^n\text{sgn}(\Delta x_i)\,R_{|\vec{x}-\vec{x}'|}(|\Delta x_i|)$, where $\Delta x_i=x_i-x_i'$

Description

The function $R$ in the $W$-statistic is called the rank function. $R_{\vec{x}}(x)$ returns the index (counting from 1) of $x\in\vec{x}$ when $\vec{x}$ is sorted in ascending order. For example, if $\vec{x}=[5,3,8]$, then sorted that would be $[3,5,8]$, so $R_{\vec{x}}(5)=2$. In $W$, the ranks are taken over the absolute differences $|\Delta x_i|$, so the sign and the size of each difference enter separately.
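The rank function is easy to express in code. A minimal sketch, assuming all values are distinct (the helper name `rank` is made up for illustration; tied values would need a convention of their own, such as the average ranks that libraries commonly assign):

```python
def rank(values, x):
    """1-based position of x when `values` is sorted in ascending order.

    Assumes x occurs exactly once in `values`.
    """
    return sorted(values).index(x) + 1


print(rank([5, 3, 8], 5))  # 2
```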

The $\text{sgn}$ function (read “sign”) in $W$ takes the “sign” of its input. It is defined as:

$$\text{sgn}(x)=\begin{cases}1, & \text{if } x\gt 0\\ -1, & \text{if } x\lt 0\\ 0, & \text{if } x=0\end{cases}$$

So $W$ contains information about the relative ranks of the differences between the observations at the time of $\vec{x}$ and $\vec{x}'$.
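Putting the sign and rank functions together gives the $W$ statistic. The sketch below follows the usual convention of the signed-rank test (commonly known as the Wilcoxon signed-rank test) of ranking the absolute paired differences. It assumes no zero differences and no tied absolute differences; the function names and data are made up for illustration.

```python
def sgn(x):
    """Sign of x: 1, -1, or 0."""
    return (x > 0) - (x < 0)


def signed_rank_statistic(first, second):
    """W = sum of sgn(d_i) * rank(|d_i|) over paired differences d_i.

    Assumes no zero differences and no ties among the |d_i|.
    """
    diffs = [a - b for a, b in zip(first, second)]
    abs_sorted = sorted(abs(d) for d in diffs)
    # Rank = 1-based position of |d| among the sorted absolute differences.
    return sum(sgn(d) * (abs_sorted.index(abs(d)) + 1) for d in diffs)


# Illustrative (made-up) paired measurements at two points in time.
x = [10, 12, 9, 15]
x_prime = [8, 13, 5, 12]
# Differences are [2, -1, 4, 3], so W = 2 - 1 + 4 + 3.
print(signed_rank_statistic(x, x_prime))  # 8
```

A large positive $W$ means the larger differences are mostly positive, a large negative $W$ means they are mostly negative, and a value near zero is what one would expect if neither time point tends to be higher.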
