Multiple imputation (MI; Rubin, 1978, 1987a/2004, 1996). Therefore, this handout will focus on multiple imputation. Multiple imputation (MI), introduced by Rubin (1978) and discussed in detail in Rubin (1987), is an approach that retains the advantages of imputation while allowing the data analyst to make valid assessments of uncertainty. In this chapter, we will deal with some specific topics that arise when you perform regression modeling in multiply imputed datasets. We describe how MI methods for full-cohort studies can be adapted to account for the sampling designs of nested case-control and case-cohort studies. Multiple imputation (Rubin, 1978, 1987) has come a long way. In the late 1970s, Rubin (1978) proposed the theory of multiple imputation. Quantifying the impact of fixed effects modeling of. Reporting the use of multiple imputation for missing data. Multiple imputation allows the uncertainty due to imputation to be reflected in the analysis (Rubin, 1978, 1987). Forty years after Donald Rubin's seminal paper (Rubin, 1978), which introduced the concept of multiple imputation, the approach has been shown to be useful in many contexts going far beyond the classical item nonresponse in cross-sectional surveys for which it was origi.
Multiple imputation using chained equations for missing data in TIMSS. We illustrate Rubin's Rules (RR) with a t-test example in 3 generated multiply imputed datasets in SPSS. The development of diagnostic techniques for multiple imputation, though, has been retarded by the belief that the assumptions of the procedure are untestable from observed data. The general, very simplified procedure, as outlined by Rubin (1987), is a series of steps. The problem multiple imputation was designed to address: missing values are a problem in many data sets and seem especially common in the medical and social sciences. It should be noted that this volume is not intended to be the exclusive source of the multiple imputation software. Such methods include multiple imputation (Rubin, 1978) and the expectation-maximisation (EM) algorithm (Dempster et al.). Multiple imputation (MI; Rubin, 1987) is a widely used method for handling missing data. Using multiple imputation to address missing values with very small clusters for multiple imputation. A multivariate technique for multiply imputing missing. Multiple imputation is a statistical technique for handling incomplete data and for delivering an analysis that makes use of all possible information (Rubin, 1977, 1978). The key step in Rubin's (1978) multiple imputation is. The second major algorithm is called fully conditional specification.
In recent years, the parametric multiple imputation method proposed by Rubin (1978, 1987) has become one of the most popular methods for handling missing data. His original goal was to impute m completed datasets for public usage in the context of public surveys in which a response rate of less than 60 percent for any variable was rare. Results: in simulation 1, the screening-first imputation approach is consistent with the data-generating process and is expected to perform well. Praise for the first edition of Statistical Analysis with Missing Data: "An important contribution to the applied statistics literature. I give the book high marks for unifying and making accessible much of the past and current work in this important area." Missing data analysis with the Mahalanobis distance. We consider three imputation approaches suitable for use. It was derived using the Bayesian paradigm (Rubin, 1987, 1996). Multiple imputation was designed to handle the problem of missing data in public-use data. Rubin (1976) was the first to introduce the concept of the missing-data mechanism. Rubin (1978) suggested taking several independent realizations of the imputation mechanism and provided ways to combine the estimates to obtain point estimates and standard errors that are valid under proper imputation assumptions. Most popular statistical software packages have options for multiple imputation, which require little. Applications of multiple imputation in medical studies. Under the multiple imputation paradigm of Rubin (1978), the imputer. Inference from multiple imputation for missing data using mixtures.
This is also true of the multiple imputation methods available in the recently released missing data module for SPSS or. According to Rubin (1978), the multiple imputation estimator denoted. This provides for an interesting alternative when there is a concern that single imputation. Despite its theoretical beauty, multiple imputation was computationally challenging, and.
The technique of multiple imputation, which originated in the early 1970s in application to survey nonresponse (Rubin, 1976), has. I examine two approaches to multiple imputation that have been incorporated into widely available software. Y, X: a probability model that attempts to model the missing data based on the observed responses Y and other information for both complete and incomplete cases X. Statistical Analysis with Missing Data, Wiley Series in.
The objective is valid frequency inference for ultimate users who in general have access only to complete-data software and possess limited knowledge of specific reasons and models for nonresponse. The following is the procedure for conducting multiple imputation for missing data that was created by Rubin. Statistical analysis of multiply imputed data was used. Multiple imputation has become popular in the 30 years since its formal introduction (Rubin, 1978), and a variety of imputation methods and software are now available e.g. Under the multiple imputation paradigm of Rubin (1978), the imputer generates copies of the missing data from pz. It can be used for multiple imputation of missing data of several variables with no particular structure. Because the MI procedure does not adequately perform imputation for the data, this method is not described in detail. Multiple imputation (MI) is a way to deal with nonresponse bias (missing research data) that. Abstract: Multiple imputation was designed to handle the problem of missing data in public-use data bases where the database constructor and the ultimate user are distinct entities. The multiple imputation framework suggested by Rubin (1978, 1987a, 1996) is an attractive option if a data set is to be used by multiple researchers with differing levels of statistical expertise. After that, I performed a repeated measures test in SPSS. Rubin, Educational Testing Service. A general attack on the problem of nonresponse in sample surveys is outlined from the.
The focus of this thesis will be on multiple imputation, but both methods, among others, will be outlined. To deal with the problem of increased noise due to imputation, Rubin (1987) developed a method for averaging the outcomes across multiple imputed data sets. Multiple imputation by ordered monotone blocks with application to the Anthrax Vaccine Research Program. Fan Li, Michela Baccini, Fabrizia Mealli, Elizabeth R. Zell, Constantine E. Frangakis, Donald B. Rubin. Multiple imputation for missing data, Statistics Solutions. For nearly two decades I have been advocating and developing the use of multiple imputation to address aspects of this problem. Schafer (1997), van Buuren and Oudshoorn (2000), and Raghunathan et al. New computational algorithms and software described in a recent book (Schafer, 1997) allow us to create proper multiple imputations in complex multivariate settings. Imputation: similar to single imputation, missing values are imputed.
Introduction. The general statistical theory and framework for managing missing information has been well developed since Rubin (1987) published his pioneering treatment of multiple imputation methods for nonresponse in surveys. The idea of multiple imputation for missing data was first proposed by Rubin (1977). This method does not require any direct assumption on the joint distribution of the variables, and it is presently implemented in standard statistical software (S-Plus, Stata). For this situation and objective, I believe that multiple imputation by the database constructor is the method of choice. Multiple imputation (MI) has become a standard statistical technique for dealing with missing values.
Let us first introduce the basics of Rubin's multiple imputation (Rubin, 1987). Creation of the multiply imputed values is a key step in a multiple imputation analysis. Descriptive statistics, EM algorithm for data with missing values, statistical assumptions for multiple imputation, missing data patterns, imputation methods, monotone methods for data sets. Therefore, multiple imputation by the EMB algorithm can be considered to be proper imputation in Rubin's (1987) sense. Using multiple imputation to address missing values of. An object of class mipo, which stands for multiple imputation pooled outcome. We assume that the imputer uses a parametric regression model, though we expect that these results would extend to imputation via a nonparametric method such as hot deck with the approximate Bayesian bootstrap (Rubin and Schenker, 1986). With multiple imputation, m > 1 plausible sets of replacements are generated for the missing values, thereby generating. In a 2000 Sociological Methods and Research paper entitled Multiple imputation for missing data. Multiple imputation provides a useful strategy for dealing with data sets with missing values. Parameter estimates after multiple imputation were derived based on Rubin's combination rule (Rubin, 1978, 1987, 1996). The missing-data mechanism has three classifications (Rubin, 1976).
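The three classifications of the missing-data mechanism can be illustrated with a small simulation. The text above does not name them; the labels used here (MCAR, MAR, MNAR) are the conventional ones and are an assumption of this sketch, as are the function name `make_missing`, the threshold at zero, and the probability `p`. It is a toy illustration using only the Python standard library, not a reproduction of any cited method.

```python
import random


def make_missing(rows, mechanism, rng, p=0.3):
    """Return a copy of rows = [(x, y), ...] with some y values set to None.

    Hypothetical helper for illustration only:
    - "MCAR": y is missing with the same probability p for every row.
    - "MAR":  y can be missing only when the always-observed x is positive.
    - "MNAR": y can be missing only when y itself is positive.
    """
    out = []
    for x, y in rows:
        if mechanism == "MCAR":
            prob = p
        elif mechanism == "MAR":
            prob = p if x > 0 else 0.0  # depends on observed data only
        elif mechanism == "MNAR":
            prob = p if y > 0 else 0.0  # depends on the unobserved value
        else:
            raise ValueError(f"unknown mechanism: {mechanism!r}")
        out.append((x, None if rng.random() < prob else y))
    return out


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]
    for mech in ("MCAR", "MAR", "MNAR"):
        incomplete = make_missing(data, mech, rng)
        n_missing = sum(1 for _, y in incomplete if y is None)
        print(mech, "missing y values:", n_missing)
```

Under the first two mechanisms in this toy setup, missingness can be explained by chance or by the observed x; under the third it depends on the value that is missing, which is the case that standard MI procedures do not handle without further modeling assumptions.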
Existing algorithms and software for multiple imputation. There are three major algorithms for multiple imputation. The pool function averages the estimates of the complete data model, computes the total variance over the repeated analyses by Rubin's rules (Rubin, 1987, p. All multiple imputation methods follow three steps. Using multiple imputation, we create two or more completed datasets, do the usual analysis on each completed dataset, then draw inferences based on both the within- and between-imputation variability. Rubin's rule to combine multiply imputed results. According to Rubin (1987), for statistical results estimated using multiply imputed data, the final point estimate is the average of the m complete-data estimates. Reporting the use of multiple imputation for missing data in higher education research. Rubin, D. B. (1978). Multiple imputation in sample surveys: a phenomenological Bayesian approach to nonresponse. A cautionary tale: Allison summarizes the basic rationale for multiple imputation. A free R software package called mimix that implements our methods is. Multiple imputation for missing data is an attractive method for handling missing data in multivariate analysis. I know that I can use Rubin's rules implemented through any multiple imputation package in R to pool means and standard. A commercial software program using the MCMC algorithm is SAS PROC MI (SAS, 2011).
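The pooling step described above (average the complete-data estimates, then combine within- and between-imputation variability) can be sketched in a few lines. This is a minimal illustration of Rubin's rules for a scalar estimate, written with the Python standard library only; the function name `pool_rubin`, the dictionary keys, and the example numbers are assumptions made for this sketch and do not correspond to the `pool` function or to any package named in the text.

```python
from math import inf, sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m complete-data results for one scalar parameter.

    estimates: point estimates, one per completed dataset
    variances: squared standard errors, one per completed dataset
    (hypothetical helper, for illustration only)
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates and one variance per estimate")

    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between

    # Degrees of freedom from Rubin (1987); infinite when the
    # estimates do not vary across imputations.
    if between > 0:
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
    else:
        df = inf

    return {
        "estimate": q_bar,
        "within": within,
        "between": between,
        "total": total,
        "se": sqrt(total),
        "df": df,
    }


if __name__ == "__main__":
    # Made-up regression coefficients and squared standard errors
    # from m = 5 completed datasets.
    coefs = [0.52, 0.47, 0.55, 0.50, 0.49]
    ses_squared = [0.010, 0.011, 0.009, 0.010, 0.012]
    print(pool_rubin(coefs, ses_squared))
```

The factor (1 + 1/m) inflates the between-imputation variance to account for using a finite number of imputations, so the pooled standard error is larger than the one obtained from any single completed dataset.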
Combining analysis results from multiply imputed categorical data. Fortunately, multiple imputation can be used not only for continuous variables, but also for binary and categorical ones. The first traditional algorithm is based on Markov chain Monte Carlo (MCMC). This is the original version of Rubin's (1978, 1987) multiple imputation. Multiple imputation for missingness due to nonlinkage and. This chapter is a follow-up to the previous chapter (Chapter 5) about data analysis with multiple imputation.
Regardless of the nature of the post-imputation phase, MI inference treats missing data as an explicit source of random variability and. Chapter 6: More topics on multiple imputation and regression modelling. The diversity of the contributions to this special volume provides an impression of the progress of the last decade in software development for multiple imputation. The R package norm currently implements this version of multiple imputation (Schafer, 1997).