A goodness-of-fit test for parametric models based on dependently truncated data
Takeshi Emura
E-mail: emura@stat.ncu.edu.tw
Graduate Institute of Statistics, National Central University,
Jhongda Road, Taoyuan, Taiwan
and
Yoshihiko Konno[1]
E-mail: konno@fc.jwu.ac.jp
Department of Mathematical and Physical Sciences, Japan Women's University,
2-8-1 Mejirodai, Bunkyo-ku, Tokyo 112-8681 Japan
ABSTRACT
Suppose that one can observe bivariate random variables [pic] only when
[pic] holds. Such data are called left-truncated data and are found in many
fields, such as experimental education and epidemiology. Recently, a method
of fitting a parametric model on [pic] has been considered, which can
easily incorporate the dependence structure between the two variables. A
primary concern in parametric analysis is the goodness of fit of the
imposed parametric forms. Due to the complexity of dependent truncation
models, traditional goodness-of-fit procedures, such as Kolmogorov-Smirnov
type tests based on the Bootstrap approximation to the null distribution,
may not be computationally feasible. In this article, we develop a
computationally attractive and reliable algorithm for the goodness-of-fit
test based on the asymptotic linear expression. By applying the multiplier
central limit theorem to the asymptotic linear expression, we obtain an
asymptotically valid goodness-of-fit test. Monte Carlo simulations show that
the proposed test has correct type I error rates and desirable empirical
power. It is also shown that the method significantly reduces the
computational time compared with the commonly used parametric Bootstrap
method. An analysis of law school data is provided for illustration.
R code for implementing the proposed procedure is available in the
supplementary material.
Key words: Central limit theorem; Empirical process; Truncation;
Maximum likelihood; Parametric Bootstrap; Shrinkage estimator
1. Introduction
Truncated data arise when some units are entirely excluded from the sample.
For instance, in the study of aptitude test scores in experimental education,
only those individuals whose test scores are above (or below) a threshold
may appear in the sample (Schiel, 1998; Schiel and Harmston, 2000). Many
different types of truncation are possible, depending on how the truncation
criterion is determined. A classical parametric method for analyzing
truncated data is based on fixed truncation. That is, a variable [pic] of
interest is included in the sample only if it exceeds a fixed value [pic],
where [pic] is known. Parametric estimation for the normal distribution of
[pic] was given by Cohen (1991). Another example of fixed truncation is the
zero-truncated Poisson model, in which [pic] is a Poisson random variable
and [pic].
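The following is a minimal sketch of fixed truncation using the zero-truncated Poisson model. Only counts of at least 1 are recorded, so for k >= 1 the observed probability function is exp(-lambda) lambda^k / {k! (1 - exp(-lambda))}. Setting the score to zero gives mean(x) = lambda / (1 - exp(-lambda)), where mean(x) is the average observed count. The helper name `zt_poisson_mle`, the bisection solver and the simulation settings are illustrative assumptions and do not come from the text.

```python
import math

import numpy as np


def zt_poisson_mle(counts, tol=1e-12):
    """MLE of the Poisson mean from zero-truncated counts (all counts >= 1).

    The score equation reduces to mean(counts) = lam / (1 - exp(-lam)).
    Its right-hand side increases from 1 (as lam -> 0) to infinity, so a
    unique root exists whenever mean(counts) > 1; bisection locates it.
    """
    xbar = float(np.mean(counts))
    if xbar <= 1.0:
        raise ValueError("the MLE does not exist when every count equals 1")

    def g(lam):
        # -expm1(-lam) computes 1 - exp(-lam) accurately for small lam
        return lam / -math.expm1(-lam) - xbar

    lo, hi = 1e-12, xbar  # g(lo) < 0 and g(xbar) > 0 bracket the root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = np.random.default_rng(1)
full = rng.poisson(2.0, size=5000)  # latent counts with true mean 2.0
observed = full[full >= 1]          # fixed truncation: zeros are never recorded
print("naive mean:", observed.mean())
print("truncated MLE:", zt_poisson_mle(observed))
```

Ignoring the truncation and using the raw mean overstates lambda. The expected observed count is lambda / (1 - exp(-lambda)), which is larger than lambda (about 2.31 when lambda = 2).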
A more general truncation scheme is the so-called "left-truncation", in
which a unit is observed only when a variable [pic] exceeds another random
variable [pic]. Left-truncated data are commonly seen in studies of
biomedicine, epidemiology and astronomy (Klein and Moeschberger, 2003).
Construction of nonparametric estimators for [pic] under left-truncation
has been extensively studied (e.g., Woodroofe, 1985; Wang, et al., 1986).
It is well known that these nonparametric methods rely on the
independence assumption between [pic] and [pic]. Accordingly, Tsai (1990),
Martin and Betensky (2005), Chen, et al. (1996), and Emura and Wang (2010)
have presented methods for testing the independence assumption. For
positive random variables [pic] and [pic], the semiparametric approaches
proposed by Lakhal-Chaieb, et al. (2006) and Emura, et al. (2011) offer
alternatives in the absence of the independence assumption. In these
approaches, the association structure between [pic] and [pic] is modeled
via an Archimedean copula.
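The sketch below simulates dependently left-truncated data under an assumed bivariate normal model with correlation 0.5. It discards every latent pair whose truncation variable exceeds the variable of interest. The labels l and x, the parameter values and the helper `sample_left_truncated` are hypothetical choices for illustration. Because x is retained only when it exceeds a correlated l, the observed x values shift upward. Under these settings the expected observed mean is 0.5 sqrt(2/pi), about 0.40, even though the latent mean is 0.

```python
import numpy as np


def sample_left_truncated(n, mean, cov, rng):
    """Return n pairs (l, x) from a bivariate normal that satisfy l <= x.

    Pairs with l > x are discarded, mirroring left truncation: such units
    never enter the sample. Batches are drawn until n pairs are retained.
    """
    kept, total = [], 0
    while total < n:
        batch = rng.multivariate_normal(mean, cov, size=2 * n)
        batch = batch[batch[:, 0] <= batch[:, 1]]
        kept.append(batch)
        total += len(batch)
    return np.vstack(kept)[:n]


rng = np.random.default_rng(7)
cov = np.array([[1.0, 0.5], [0.5, 1.0]])  # correlation 0.5 between l and x
data = sample_left_truncated(20000, mean=[0.0, 0.0], cov=cov, rng=rng)
print("all pairs satisfy l <= x:", bool(np.all(data[:, 0] <= data[:, 1])))
print("mean of observed x:", data[:, 1].mean())
```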
Compared with the nonparametric and semiparametric inferences, relatively
little has been written on the analysis of left-truncated data based on
parametric modeling. Although parametric modeling easily incorporates the
dependence structure between [pic] and [pic], it involves strong
distributional assumptions, and the inference procedure may not be robust
to departures from these assumptions (Emura and Konno, 2010). Nevertheless,
parametric modeling is still useful in many applications where the model
parameters have a useful interpretation or a particular parametric form
is supported by subject-matter knowledge. For instance, in the study of
aptitude test scores in educational research, researchers may be interested
in estimating the mean and standard deviation of the test score [pic]
rather than [pic] (Schiel and Harmston, 2000; Emura and Konno, 2009).
Hence, the parameters of the normal distribution usually provide useful
summary information (see Section 5 for details). As another example,
studies of count data in epidemiological research often employ the
zero-modified Poisson model (Dietz and Böhning, 2000) for [pic] (see
Example 3 in Appendix A for details). For count data, the main focus is to
estimate the intensity parameter of the Poisson distribution rather than
[pic]. In both examples, one needs to specify the parametric form of [pic].
If goodness-of-fit tests are used appropriately, the concern about the
robustness of the parametric analysis can be addressed.
In this article, we develop a computationally attractive and reliable
algorithm for the goodness-of-fit test by utilizing the multiplier central
limit theorem. The basic idea behind the proposed approach follows the
goodness-of-fit procedure for copula models (Kojadinovic and Yan, 2011;
Kojadinovic, et al., 2011), though the technical details and the
computational advantages in the present setting are different. The rest of
the paper is organized as follows: Section 2 briefly reviews the parametric
formulation given in Emura and Konno (2010). Section 3 presents the theory
and algorithm of the proposed goodness-of-fit test based on the multiplier
central limit theorem. Simulations and data analysis are presented in
Sections 4 and 5, respectively. Section 6 concludes this article.
2. Parametric inference for dependently truncated data
In this section, we introduce the parametric approach to dependently
truncated data based on Emura and Konno (2010) and derive asymptotic
results for the maximum likelihood estimator (MLE) that form the basis for
the subsequent developments.
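As a concrete instance of the likelihood (1) introduced below, consider a bivariate normal model for the pair. The sketch writes L for the truncation variable and X for the variable of interest (labels assumed for this sketch), with truncation condition L <= X. It assumes (1) is the usual conditional likelihood: the product over observed pairs of f(l_j, x_j) / c, with c = P(L <= X). Because X - L is univariate normal, c then has the closed form Phi((mu_X - mu_L) / sd(X - L)). The reparameterization, the Nelder-Mead optimizer and the simulation settings are our own illustrative choices, not the authors' code.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal, norm


def unpack(theta):
    """Map an unconstrained vector to the mean and covariance of (L, X)."""
    mu_l, mu_x, log_sl, log_sx, z = theta
    sl, sx = np.exp(log_sl), np.exp(log_sx)
    rho = np.tanh(np.clip(z, -7.0, 7.0))  # |rho| < 1 keeps cov positive definite
    cov = np.array([[sl**2, rho * sl * sx], [rho * sl * sx, sx**2]])
    return np.array([mu_l, mu_x]), cov


def neg_loglik(theta, data):
    """Negative truncated log-likelihood: -(sum_j log f(l_j, x_j) - n log c)."""
    mean, cov = unpack(theta)
    # c = P(L <= X) = P(X - L >= 0), where X - L is univariate normal
    sd_diff = np.sqrt(cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1])
    log_c = norm.logcdf((mean[1] - mean[0]) / sd_diff)
    return -(multivariate_normal.logpdf(data, mean, cov).sum() - len(data) * log_c)


# Simulated truncated sample: keep only latent pairs with l <= x.
rng = np.random.default_rng(2024)
latent = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=6000)
data = latent[latent[:, 0] <= latent[:, 1]]

# Start from the naive (truncation-ignoring) sample moments.
r0 = np.corrcoef(data.T)[0, 1]
theta0 = np.r_[data.mean(axis=0), np.log(data.std(axis=0)), np.arctanh(r0)]
fit = minimize(neg_loglik, theta0, args=(data,), method="Nelder-Mead",
               options={"xatol": 1e-6, "fatol": 1e-8, "maxiter": 10000})
mean_hat, cov_hat = unpack(fit.x)
print("naive means:", data.mean(axis=0))
print("truncated-likelihood MLE of means:", mean_hat)
```

The log and tanh transforms keep the standard deviations positive and the correlation inside (-1, 1), so an unconstrained optimizer can be used. Starting from the naive moments, the fit corrects for the upward shift that truncation induces in the observed sample.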
Let [pic] be a density or probability function of a bivariate random
variable [pic], where [pic] is a [pic]-variate vector of parameters and
[pic] is the parameter space. In a truncated sample, a pair [pic] is
observed only when [pic] holds. For observed data [pic] subject to [pic],
the likelihood function has the form
[pic], (1)
where [pic]. Let [pic] be the column vector of partial derivatives (with
respect to the components of [pic]) of [pic], i.e., [pic] for [pic], and let
[pic] be the MLE that maximizes (1) over [pic]. Emura and Konno (2010)
noted that, for computing the MLE, it is crucial that a simple formula for
[pic] be available. This formula also plays a crucial role in the
subsequent development of the proposed goodness-of-fit test procedure. For
easy reference, Appendix A lists three examples of parametric forms that
yield a tractable form of [pic]. The following lemma provides the basis for
deriving the asymptotic expression for the goodness-of-fit statistics.
Lemma 1: Suppose that (R1) through (R7) listed in Appendix B hold. Then,
[pic], (2)
where [pic] is the Fisher information matrix and [pic] is the transpose of
[pic].
3. Goodness-of-fit procedure under truncation
3.1 Asymptotic linear expression of the goodness-of-fit process
Let [pic] be a given parametric family. Also, let [pic] be the underlying
(true) density or probability function of a bivariate random variable
[pic]. Given the observed data [pic], we wish to test the null hypothesis
[pic] against [pic].
A popular class of goodness-of-fit tests is based on the distance between
[pic] and [pic], where [pic] is the indicator function
and
[pic]
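A minimal sketch of the two quantities being compared follows, assuming a bivariate normal model for the pair (L, X) truncated to L <= X (the labels and the model are assumptions for illustration). The empirical part is the proportion of observed pairs below (l, x). The model part is P(L <= l, X <= x | L <= X). Here it is computed by one-dimensional numerical integration over X: given X = v, L is normal, so the inner probability is a normal CDF. The integration window of plus or minus 12 standard deviations and the function names are hypothetical choices. The displayed formula above remains the definition.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm


def empirical_cdf(data, l, x):
    """Proportion of observed pairs with l_j <= l and x_j <= x."""
    return np.mean((data[:, 0] <= l) & (data[:, 1] <= x))


def model_cdf(l, x, mu_l, mu_x, s_l, s_x, rho):
    """P(L <= l, X <= x | L <= X) for a bivariate normal (L, X).

    Given X = v, L is normal with mean m(v) and sd s, so the joint probability
    is int_{v <= x} phi_X(v) Phi((min(l, v) - m(v)) / s) dv; the integral is
    split at v = l, where min(l, v) switches branch.
    """
    s = s_l * np.sqrt(1.0 - rho**2)

    def m(v):
        return mu_l + rho * s_l / s_x * (v - mu_x)

    lo = mu_x - 12.0 * s_x            # probability mass of X below lo is negligible
    hi = min(x, mu_x + 12.0 * s_x)
    if hi <= lo:
        return 0.0
    part1 = part2 = 0.0
    a = min(hi, l)
    if a > lo:   # branch v <= l: the constraint on L is L <= v
        part1, _ = integrate.quad(
            lambda v: norm.pdf(v, mu_x, s_x) * norm.cdf((v - m(v)) / s), lo, a)
    if hi > l:   # branch v > l: the constraint on L is L <= l
        part2, _ = integrate.quad(
            lambda v: norm.pdf(v, mu_x, s_x) * norm.cdf((l - m(v)) / s), max(l, lo), hi)
    c = norm.cdf((mu_x - mu_l) / np.sqrt(s_l**2 + s_x**2 - 2.0 * rho * s_l * s_x))
    return (part1 + part2) / c


rng = np.random.default_rng(11)
latent = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=40000)
data = latent[latent[:, 0] <= latent[:, 1]]
for l, x in [(-0.5, 0.0), (0.0, 0.5), (0.5, 1.5)]:
    print(l, x, empirical_cdf(data, l, x), model_cdf(l, x, 0.0, 0.0, 1.0, 1.0, 0.5))
```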
The Kolmogorov-Smirnov type test is based on
[pic]. (3)
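One common way to approximate the supremum in a Kolmogorov-Smirnov type statistic is to evaluate the discrepancy at the n^2 grid points (l_i, x_j) formed from the observed coordinates. This is what makes the statistic expensive. The sketch below applies this approximation to a toy null model, chosen only because its truncated distribution function has a closed form: L and X independent Uniform(0, 1), truncated to L <= X. The toy model, the sqrt(n) scaling and the grid approximation are assumptions for illustration. With an estimated parameter, the model CDF would be evaluated at the MLE.

```python
import numpy as np


def toy_model_cdf(l, x):
    """P(L <= l, X <= x | L <= X) when L, X are independent Uniform(0, 1).

    The truncated density is 2 on {0 <= l <= x <= 1}, which gives
    F(l, x) = x**2 if l >= x, and 2*l*x - l**2 otherwise.
    """
    l = np.clip(l, 0.0, 1.0)
    x = np.clip(x, 0.0, 1.0)
    return np.where(l >= x, x**2, 2.0 * l * x - l**2)


def ks_statistic(data, model_cdf):
    """sqrt(n) * max |F_n - F| over the n**2 grid points (l_i, x_j)."""
    l_obs, x_obs = data[:, 0], data[:, 1]
    n = len(data)
    # a[i, k] = 1{l_k <= l_i} and b[j, k] = 1{x_k <= x_j}
    a = (l_obs[None, :] <= l_obs[:, None]).astype(float)
    b = (x_obs[None, :] <= x_obs[:, None]).astype(float)
    f_emp = a @ b.T / n                                # F_n(l_i, x_j), shape (n, n)
    f_mod = model_cdf(l_obs[:, None], x_obs[None, :])  # n**2 model evaluations
    return np.sqrt(n) * np.max(np.abs(f_emp - f_mod))


rng = np.random.default_rng(3)
u = rng.uniform(size=(200, 2))
null_data = np.sort(u, axis=1)    # (min, max) of two uniforms follows the toy model
alt_data = np.sort(u**3, axis=1)  # a clearly different distribution
ks_null = ks_statistic(null_data, toy_model_cdf)
ks_alt = ks_statistic(alt_data, toy_model_cdf)
print(ks_null, ks_alt)
```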
The calculation of [pic] requires numerical integration (or summation) of
[pic] at [pic] different points in [pic]. A computationally attractive
alternative is the Cramér-von Mises type statistic
[pic]. (4)
This requires exactly [pic] evaluations of [pic]. The null distributions of
[pic] and [pic] have not been derived, and they depend on the true value of
[pic].
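A Cramér-von Mises type statistic needs the discrepancy only at the n observed points, which is the source of the savings described above. The sketch assumes the normalization n * integral (F_n - F)^2 dF_n, which reduces to a plain sum over the sample. It also assumes the same kind of closed-form toy null model as a stand-in (L and X independent Uniform(0, 1), truncated to L <= X). Both choices are illustrative, and the exact form of (4) may differ.

```python
import numpy as np


def toy_model_cdf(l, x):
    """P(L <= l, X <= x | L <= X) when L, X are independent Uniform(0, 1)."""
    l = np.clip(l, 0.0, 1.0)
    x = np.clip(x, 0.0, 1.0)
    return np.where(l >= x, x**2, 2.0 * l * x - l**2)


def cvm_statistic(data, model_cdf):
    """sum_j {F_n(l_j, x_j) - F(l_j, x_j)}**2, i.e. n * int (F_n - F)**2 dF_n.

    Only the n observed points are used, so the model CDF is evaluated n times.
    """
    l_obs, x_obs = data[:, 0], data[:, 1]
    # below[j, k] = 1{l_k <= l_j and x_k <= x_j}; its row mean is F_n(l_j, x_j)
    below = (l_obs[None, :] <= l_obs[:, None]) & (x_obs[None, :] <= x_obs[:, None])
    f_emp = below.mean(axis=1)
    f_mod = model_cdf(l_obs, x_obs)  # exactly n model evaluations
    return float(np.sum((f_emp - f_mod) ** 2))


rng = np.random.default_rng(3)
u = rng.uniform(size=(200, 2))
cvm_null = cvm_statistic(np.sort(u, axis=1), toy_model_cdf)
cvm_alt = cvm_statistic(np.sort(u**3, axis=1), toy_model_cdf)
print(cvm_null, cvm_alt)
```

Even here, a critical value would have to be approximated by simulation or by an asymptotic argument. In a parametric family, that null distribution also changes with the true parameter.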
Empirical process techniques are useful for analyzing the goodness-of-fit
process [pic] defined on [pic]. Let [pic] be [pic], where [pic] is a
collection of bounded functions on [pic] that are continuous from above,
equipped with the uniform norm [pic]. Under assumptions (R1), (R3),
(R4) and (R8) listed in Appendix B, the map [pic] can be shown to be
Hadamard differentiable with the derivative