
Non-parametric Tests

So far, all of our tests have been parametric, meaning that they make assumptions about the distribution of the sample. In the $Z$, $t$, and $F$ tests, we assumed that the sample followed a normal distribution. In the proportions and $\chi^2$ tests, we assumed that the samples followed a binomial or multinomial distribution, respectively.

But sometimes, particularly with small samples, it is not fair to make these assumptions. For this reason, we have non-parametric tests, which do not assume that the sample follows any particular distribution.

There are many non-parametric tests, so we will go over two of the most popular ones: the $U$ test and the paired signed-rank test.

Please note that there are ways to calculate $p$-values for the test statistics of non-parametric tests, but we don’t describe them here due to their esoteric nature.

$U$ Test

Cheat Sheet

  • Description: Tests whether the median of one sample ($\vec{x}_1$) is different from/more than/less than the median of another, independent sample ($\vec{x}_2$)
  • Statistic: $m_1-m_2$ (difference of medians)
  • Sidedness: Either
  • Null Hypothesis: $H_0: m_1=m_2$ (two-sided), $m_1\leq m_2$ or $m_1\geq m_2$ (one-sided)
  • Alternative Hypothesis: $H_a: m_1\neq m_2$ (two-sided), $m_1\gt m_2$ or $m_1\lt m_2$ (one-sided)
  • Test Statistic: $U=\min(U_1,U_2)$, where

$$U_1 = \sum_{i=1}^{n_1}\sum_{j=1}^{n_2}S(x_{1,i},x_{2,j}),\quad S(x,y)=\begin{cases}1, & \text{if}\ x\gt y\\ 0.5, & \text{if}\ x=y\\ 0, & \text{if}\ x\lt y\end{cases}$$

$$U_2 = \sum_{i=1}^{n_1}\sum_{j=1}^{n_2}S_{-1}(x_{1,i},x_{2,j}),\quad S_{-1}(x,y)=\begin{cases}0, & \text{if}\ x\gt y\\ 0.5, & \text{if}\ x=y\\ 1, & \text{if}\ x\lt y\end{cases}$$

Description

The idea behind this test is that $U_1$ and $U_2$ are proxies for $\mathbb{P}(X_1\gt X_2)$ and $\mathbb{P}(X_1\lt X_2)$, respectively. In fact, the hypotheses of the $U$-test can be restated as

$$H_0:\mathbb{P}(X_1\gt X_2)=\mathbb{P}(X_1\lt X_2)$$

$$H_a:\mathbb{P}(X_1\gt X_2)\neq\mathbb{P}(X_1\lt X_2)$$

This restatement relates to the medians because the median is just defined as $m=x$ such that $\mathbb{P}(X\lt x)=0.5$.
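To make the "proxy" idea concrete, here is a minimal Python sketch that computes $U_1$, $U_2$, and $U$ directly from the pairwise definition in the cheat sheet. The sample values and the function names (`s`, `u_statistics`) are made up for illustration and are not part of the course material. Dividing $U_1$ by the number of pairs $n_1 n_2$ gives the fraction of pairs in which the first sample wins (with ties counted as half), which is the sample estimate of $\mathbb{P}(X_1\gt X_2)$.

```python
def s(x, y):
    """Pairwise score S(x, y): 1 if x > y, 0.5 if tied, 0 if x < y."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def u_statistics(sample_1, sample_2):
    """Return (U1, U2) by comparing every pair across the two samples."""
    u1 = sum(s(a, b) for a in sample_1 for b in sample_2)
    # S_{-1}(x, y) is the same score with the roles of x and y swapped.
    u2 = sum(s(b, a) for a in sample_1 for b in sample_2)
    return u1, u2


# Hypothetical data, chosen only to illustrate the calculation.
x1 = [12, 15, 9, 20]
x2 = [10, 15, 7]

u1, u2 = u_statistics(x1, x2)
u = min(u1, u2)
n_pairs = len(x1) * len(x2)

print(u1, u2, u)     # 8.5 3.5 3.5
print(u1 / n_pairs)  # share of pairs "won" by x1, ties counted as half
```

Note that $U_1+U_2=n_1 n_2$ always holds, since every pair contributes exactly 1 in total across the two sums.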

As stated before, there is a way to calculate a cdf for $U$ and test it against a significance level $\alpha$, but it is beyond this course’s scope and better left to software.
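For the curious, the sketch below shows one way software can obtain such a distribution for very small samples: under $H_0$, every way of splitting the pooled observations into groups of size $n_1$ and $n_2$ is equally likely, so we can enumerate all splits and count how often $U$ is at least as extreme as the observed value. This is an illustrative sketch, not the course's method. The function names and data are hypothetical, and the enumeration grows too quickly to be practical beyond tiny samples. In practice a statistics library routine such as `scipy.stats.mannwhitneyu` would normally be used instead.

```python
from itertools import combinations


def u_statistic(sample_1, sample_2):
    """U = min(U1, U2) from the pairwise-comparison definition."""
    u1 = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in sample_1
        for b in sample_2
    )
    u2 = len(sample_1) * len(sample_2) - u1  # because U1 + U2 = n1 * n2
    return min(u1, u2)


def exact_two_sided_p_value(sample_1, sample_2):
    """Fraction of all regroupings whose U is <= the observed U."""
    pooled = list(sample_1) + list(sample_2)
    positions = range(len(pooled))
    observed = u_statistic(sample_1, sample_2)

    as_extreme = 0
    total = 0
    for chosen in combinations(positions, len(sample_1)):
        chosen_set = set(chosen)
        group_1 = [pooled[i] for i in chosen]
        group_2 = [pooled[i] for i in positions if i not in chosen_set]
        total += 1
        if u_statistic(group_1, group_2) <= observed:
            as_extreme += 1
    return as_extreme / total


# Hypothetical, fully separated samples: every value in x2 exceeds x1.
x1 = [1, 2, 3]
x2 = [4, 5, 6]

p_value = exact_two_sided_p_value(x1, x2)
print(p_value)  # 2 of the 20 possible splits give U = 0, so 0.1
```

With only three observations per group, even perfectly separated samples cannot reach a two-sided $p$-value below 0.1, which shows how limited very small samples are.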

Paired Signed-rank Test

Cheat Sheet

  • Description: Tests whether the median of a sample at one point in time ($\vec{x}$) is different from/more than/less than the median of the same sample at a different point in time ($\vec{x}'$)

  • Statistic: $m-m'$ (difference of medians)

  • Sidedness: Either

  • Null Hypothesis: $H_0: m=m'$ (two-sided), $m\leq m'$ or $m\geq m'$ (one-sided)

  • Alternative Hypothesis: $H_a: m\neq m'$ (two-sided), $m\gt m'$ or $m\lt m'$ (one-sided)

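  • Test Statistic: $W=\sum_{i=1}^n\text{sgn}(\Delta x_i)\,R_{|\vec{x}-\vec{x}'|}(|\Delta x_i|)$, where $\Delta x_i=x_i-x'_i$ is the $i$-th paired difference and the ranks are taken over the absolute differences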

Description

The function $R$ in the $W$-statistic is called the rank function. It returns the (1-based) index of $x\in\vec{x}$ when $\vec{x}$ is sorted in ascending order. For example, if $\vec{x}=[5,3,8]$, then sorted that would be $[3,5,8]$, so $R_{\vec{x}}(5)=2$.

The $\text{sgn}$ function (read “sign”) in $W$ takes the “sign” of its input. It is defined as:

$$\text{sgn}(x)=\begin{cases}1, & \text{if}\ x\gt 0\\ -1, & \text{if}\ x\lt 0\\ 0, & \text{if}\ x=0\end{cases}$$

So $W$ contains information about the relative ranks of the differences between the observations at the time of $\vec{x}$ and at the time of $\vec{x}'$.
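As a rough illustration of how the rank and sign functions combine, the following Python sketch computes $W$ for a small set of paired observations. It follows the usual convention for the signed-rank test of ranking the absolute values of the differences, and it assumes there are no tied absolute differences and no zero differences. The data and the helper names (`sgn`, `rank`, `signed_rank_statistic`) are hypothetical.

```python
def sgn(x):
    """Sign of x: 1 if positive, -1 if negative, 0 if zero."""
    return (x > 0) - (x < 0)


def rank(values, value):
    """1-based position of `value` once `values` is sorted ascending.

    Assumes no ties; with ties this would return the lowest tied position.
    """
    return sorted(values).index(value) + 1


def signed_rank_statistic(before, after):
    """W = sum of sgn(diff) * rank(|diff|) over the paired differences."""
    diffs = [b - a for b, a in zip(before, after)]
    abs_diffs = [abs(d) for d in diffs]
    return sum(sgn(d) * rank(abs_diffs, abs(d)) for d in diffs)


# The rank example from the text: 5 is second in [3, 5, 8].
print(rank([5, 3, 8], 5))  # 2

# Hypothetical paired measurements at two points in time.
x = [10, 14, 9, 20]
x_prime = [12, 11, 10, 14]

# Differences: [-2, 3, -1, 6]; ranks of |differences|: [2, 3, 1, 4].
w = signed_rank_statistic(x, x_prime)
print(w)  # -2 + 3 - 1 + 4 = 4
```

A value of $W$ near zero means positive and negative differences are balanced in rank, while a large positive or negative value suggests a consistent shift between the two points in time.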
