Nonparametric Methods - UC Davis Plant Sciences



Topic 14: Nonparametric Methods (ST & D Chapter 24)

Introduction

All of the statistical tests discussed up until now have been based on the assumption that the data are normally distributed. Implicitly we are estimating the parameters of this distribution, the mean and variance. These are sufficient statistics for this distribution; that is, specifying the mean and variance of a normal distribution specifies it completely. The central limit theorem provides a justification for the normality assumption in many cases, and in still other cases the robustness of the tests with respect to normality provides a justification. Parametric statistics deal with the estimation of parameters (e.g., means, variances) and testing hypotheses for continuous, normally distributed variables.

In cases where the assumption of normality cannot be employed,
however, nonparametric, or distribution-free methods may be appropriate.
These methods lack the underlying theory of the parametric methods and we
will simply discuss them as a collection of tests. Nonparametric statistics
do not relate to specific parameters (the broad definition). They maintain
their distributional properties irrespective of the underlying distribution
of the data and for this reason they are called distribution-free methods.
Nonparametric statistics compare distributions rather than parameters.
Therefore, nonparametric statistics are less restrictive in terms of the
assumptions compared to parametric techniques. Although some assumptions,
for example, samples are random and independent, are still required. In
cases involving ranked data, i.e. data that can be put in the order, and/or
categorical data nonparametric statistics are necessary. Nonparametric
statistics are not generally as powerful (sensitive) as parametric
statistics if the assumptions regarding the distribution are valid for the
parametric test. That is, Type II errors (a false null hypothesis is accepted) are more likely.

14.1 Advantages of using nonparametric techniques are the following.

1. They are appropriate when only weak assumptions can be made about the distribution.
2. They can be used with categorical data when no adequate scale of measurement is available.
3. For data that can be ranked, a nonparametric test using the ranks may be the best option.
4. They are relatively quick and easy to apply and to learn, since they involve counts, ranks, and signs.

14.2 The χ² test of goodness of fit (ST&D Chapter 20, 21)

The goodness of fit test involves a comparison of the observed
frequency of occurrence of classes with that predicted by a theoretical
model. Suppose there are n classes with observed frequencies O1, O2, ..., On, and corresponding expected frequencies E1, E2, ..., En. The expected frequency is the expected value (average count) in a class when the hypothesis is true, and is calculated as the total sample size multiplied by the hypothesized population proportion for that class. The statistic

χ² = Σ (Oi − Ei)² / Ei, summed over the n classes,

is distributed approximately as χ² with n − 1 degrees of freedom. This approximation improves as the sample size increases. If parameters estimated from the data are used to calculate the expected frequencies, the degrees of freedom of the χ² will be n − 1 − p, where p is the number of parameters estimated. For example, if we want to test that a distribution is normal and we estimate the mean and the variance from the data to calculate the expected frequencies, the df will be n − 1 − 2 (ST&D p. 482). If the hypothesis is extrinsic to the data, as in a genetic proportion, then p = 0 and df = n − 1. There are some restrictions on the use of χ² tests. The approximation is good for sample sizes above 50. There should be no 0 expected frequencies and expected frequencies
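As a sketch of the calculation for an extrinsic hypothesis, the example below tests a hypothesized 3:1 genetic ratio with two phenotype classes, so p = 0 and df = n − 1 = 1. The counts are illustrative, not taken from the text, and the p-value uses the closed form of the χ² survival function that holds only for df = 1.

```python
# Chi-square goodness-of-fit test for an extrinsic (genetic) hypothesis.
# Counts are hypothetical, in the style of a Mendelian 3:1 F2 ratio.
import math

observed = [705, 224]           # hypothetical counts for the two classes
proportions = [3 / 4, 1 / 4]    # hypothesized 3:1 population proportions
total = sum(observed)
expected = [total * p for p in proportions]   # E_i = N * p_i

# Chi-square statistic: sum of (O_i - E_i)^2 / E_i over the classes
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# For df = 1 only, P(X > x) = erfc(sqrt(x / 2)); other df need a chi-square table
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.4f}, df = 1, p = {p_value:.3f}")
```

With these counts the statistic is about 0.39 and the p-value is well above 0.05, so the observed frequencies are consistent with the hypothesized 3:1 ratio.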