Compilation of methodology for the application of the Kolmogorov-Smirnov test and calculation of the p-value without the use of tables

Code

I-EBHE0173

Authors

Maria Luíza Teófilo Gandini, PAULO IVO BRAGA DE QUEIROZ

Theme

WG 1.02: Decomposing Complexity

Abstract

The $D_{max}$ statistic of the Kolmogorov-Smirnov (KS) test (KOLMOGOROV, 1933; SMIRNOV, 1933) is the maximum absolute difference between two cumulative distribution functions. Equations 1 to 8 are from Press et al. (2007). To compare a dataset with empirical cumulative distribution function $S_N(x)$ to a cumulative distribution function $P(x)$, $D_{max}$ is calculated by Equation 1:

$$D_{max} = \max_{-\infty < x < \infty} \left| S_N(x) - P(x) \right| \tag{1}$$

The interval in Equation 1 must be the domain of the probability distribution function. For the log-normal or gamma distribution, the interval is $0 < x < \infty$; for the log-gamma distribution, it is $1 < x < \infty$. To compare two experimental cumulative distribution functions, $S_{N_1}(x)$ and $S_{N_2}(x)$, use Equation 2:

$$D_{max} = \max_{-\infty < x < \infty} \left| S_{N_1}(x) - S_{N_2}(x) \right| \tag{2}$$

$N_1$ and $N_2$ must be such that the x-values in the two experimental distributions are neighbors.

The KS distribution, defined for positive values of $z$, is used in the goodness-of-fit test. In practice, the test does not use the probability density function; instead, the Cumulative Probability Function (CPF), $P_{KS}(z)$, is calculated from the series in Equation 3 or Equation 4. The two series are equivalent, but each converges faster over a different interval. Three terms of the series are sufficient to give an error in the probability of less than $10^{-13}$ in the ranges recommended below:

$$P_{KS}(z) = \frac{\sqrt{2\pi}}{z} \sum_{j=1}^{\infty} \exp\left( -\frac{(2j-1)^2 \pi^2}{8 z^2} \right) \tag{3}$$

if $z \le 1$, or

$$P_{KS}(z) = 1 - 2 \sum_{j=1}^{\infty} (-1)^{j-1} \exp\left( -2 j^2 z^2 \right) \tag{4}$$

if $z > 1$, where $z$ is defined by Equation 5:

$$z = \left( \sqrt{N_e} + 0.12 + \frac{0.11}{\sqrt{N_e}} \right) D_{max} \tag{5}$$

When the comparison involves a continuous distribution, the effective number of points, $N_e$, equals the number of experimental points,

$$N_e = N \tag{6}$$

For a comparison between two experimental data sequences, the effective number of points is given by

$$N_e = \frac{N_1 N_2}{N_1 + N_2} \tag{7}$$

where $N_1$ is the number of data points in the first distribution and $N_2$ is the number in the second.

According to Press et al. (2007), in terms of the function $Q_{KS}(z) = 1 - P_{KS}(z)$, the complement of $P_{KS}$, the p-value of an observed value of $D$ (the evidence to refute the null hypothesis, $H_0$, that the distributions are the same) is given approximately by Equation 8, with $D_{observed}$ replaced by $D_{max}$:

$$\text{Probability}(D > D_{observed}) \approx Q_{KS}\left( \left[ \sqrt{N_e} + 0.12 + \frac{0.11}{\sqrt{N_e}} \right] D_{observed} \right) \tag{8}$$

To calculate $D_{critical}$, one can use the asymptotic formula derived by Smirnov, Equation 9, and substitute it into Equation 10 (MILLER, 1956):

$$d_{\alpha}(N) = \sqrt{\frac{1}{2N} \ln\frac{1}{\alpha}} \tag{9}$$

$$D_{critical} = d_{\alpha}(N) - \frac{0.16693}{N} - \frac{A(\alpha)}{N^{3/2}} \tag{10}$$

where

$$A(\alpha) = 0.09037 \left( -\log_{10}\alpha \right)^{3/2} + 0.01515 \left( \log_{10}\alpha \right)^2 - 0.08467\,\alpha - 0.11143 \tag{11}$$

$\alpha$ is the significance level, and $N$ is the sample size. For $N \ge 5$, Equation 10 closely approximates the critical values originally tabulated for the Kolmogorov-Smirnov test. Note that if the KS test is two-sided, $\alpha$ in Equations 9 and 10 should be replaced by $\alpha/2$.

The test is interpreted as follows: if $D_{max} > D_{critical}$, or equivalently if p-value $< \alpha$, reject $H_0$; if $D_{max} \le D_{critical}$, or equivalently if p-value $\ge \alpha$, do not reject $H_0$. Researchers typically use a 5% significance level for the one-sided test and a 10% significance level for the two-sided test.
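The procedure in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes the two series for the KS cumulative probability and the factor (sqrt(Ne) + 0.12 + 0.11/sqrt(Ne)) take the forms given in Press et al. (2007), and that the critical value follows the Smirnov asymptotic term with the correction of Miller (1956). The function names (`ks_cdf`, `d_max_one_sample`, `ks_p_value`, `d_critical`) and the `cdf` callable are hypothetical. The one-sample maximum is taken on both sides of each step of the empirical distribution, which is a common convention rather than something the abstract states.

```python
import math


def ks_cdf(z, terms=3):
    """Cumulative probability of the KS distribution from a truncated series.

    Uses the series in exp(-(2j-1)^2 pi^2 / (8 z^2)) for z <= 1 and the
    alternating series in exp(-2 j^2 z^2) for z > 1.
    """
    if z <= 0.0:
        return 0.0
    if z <= 1.0:
        s = sum(math.exp(-((2 * j - 1) ** 2) * math.pi ** 2 / (8.0 * z * z))
                for j in range(1, terms + 1))
        return math.sqrt(2.0 * math.pi) / z * s
    s = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * z * z)
            for j in range(1, terms + 1))
    return 1.0 - 2.0 * s


def d_max_one_sample(data, cdf):
    """Largest absolute gap between the empirical CDF of `data` and `cdf`."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x: check both sides.
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d


def ks_p_value(d_max, n_eff):
    """Approximate p-value: complement of the KS cumulative probability at z."""
    root = math.sqrt(n_eff)
    z = (root + 0.12 + 0.11 / root) * d_max
    return 1.0 - ks_cdf(z)


def d_critical(alpha, n, two_sided=True):
    """Critical value from the asymptotic term plus Miller's correction."""
    a = alpha / 2.0 if two_sided else alpha  # two-sided test uses alpha/2
    d_asym = math.sqrt(-math.log(a) / (2.0 * n))
    log_a = math.log10(a)
    big_a = (0.09037 * (-log_a) ** 1.5 + 0.01515 * log_a ** 2
             - 0.08467 * a - 0.11143)
    return d_asym - 0.16693 / n - big_a / n ** 1.5


if __name__ == "__main__":
    # Illustrative data only: compare a small sample to the uniform CDF on (0, 1).
    sample = [0.1, 0.4, 0.7]
    d = d_max_one_sample(sample, lambda x: min(max(x, 0.0), 1.0))
    print("Dmax:", d)
    print("p-value:", ks_p_value(d, len(sample)))
    print("Dcritical (alpha = 0.10, two-sided):", d_critical(0.10, len(sample)))
```

For a two-sample comparison, the same `ks_p_value` function could be called with `n_eff = n1 * n2 / (n1 + n2)`, the effective number of points described in the abstract.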

9th International Symposium on Integrated Water Resources Management (IWRM) | 14th International Workshop on Statistical Hydrology (STAHY) | I EBHE - Encontro Brasileiro de Hidrologia Estatística