Abstract
Non-biological experimental variation or “batch effects” are commonly observed across multiple batches of microarray experiments, often rendering the task of combining data from these batches difficult. The ability to combine microarray data sets is advantageous to researchers seeking to increase statistical power to detect biological phenomena in studies where logistical considerations restrict sample size or in studies that require the sequential hybridization of arrays. In general, it is inappropriate to combine data sets without adjusting for batch effects. Methods have been proposed to filter batch effects from data, but these are often complicated and require large batch sizes to implement. We propose an empirical Bayes (EB) framework for adjusting data for batch effects that is robust to outliers in small sample sizes. We illustrate the method on example data sets containing multiple batches with small batch sizes and provide a freely available software implementation.
1 INTRODUCTION
With the many applications of gene expression microarrays, biologists are able to efficiently extract hypotheses that can later be tested experimentally in a lab setting. For example, a microarray experiment might compare the gene expression profile of diseased or treated tissue (treatment) with the profile of normal tissue (control) to determine which genes are associated with the disease or the presence of the treatment, providing a better understanding of disease/gene relationships. However, practical considerations limit the number of samples that can be amplified and hybridized at one time, and replicate samples may be generated several days or months apart, introducing systematic “batch effects” or non-biological differences that make samples in different batches not directly comparable. Batch effects have been observed from the earliest microarray experiments (Lander, 1999), and can be caused by many factors including the batch of amplification reagent used, the time of day when an assay is done, or even the atmospheric ozone level (Fare and others, 2003). Batch effects are also inevitable when new samples or replicates are incrementally added to an existing array data set or in a meta-analysis of multiple studies that pools microarray data across different labs, array types, or platforms (Rhodes and others, 2004). Some researchers have presented methods for adjusting for batch effects (Benito and others, 2004; Alter and others, 2000), but these methods require many samples per batch to implement and are therefore not well suited to the small batches that are common in practice.
1.1 Microarray data with batch effects
Data set 1 resulted from an oligonucleotide microarray (Affymetrix HG-U133A) experiment on human lung fibroblast cells (IMR90) designed to reveal whether exposing mammalian cells to nitric oxide (NO) stabilizes mRNAs. Control samples and samples exposed to NO for 1 h were then transcription-inhibited for 7.5 h. Microarray data were collected at baseline (0 h, just before transcription inhibition) and at the end of the experiment (after 7.5 h) for both the control and the NO-treated group. It was hypothesized that NO would induce or inhibit the expression of some genes, but would also stabilize the mRNA of many genes, preventing them from being degraded after 7.5 h. One sample per treatment combination was hybridized, resulting in four arrays. This experiment was repeated at three different times, yielding three batches (12 samples in total). The batches in this data set were identical experiments using the same cell source, and were conducted by the same researcher in the same lab using the same equipment. Figure 1(a) contains a heat map of data set 1, produced with a standard hierarchical clustering algorithm in the dChip software (Li and Wong, 2003). This heat map exhibits characteristics commonly seen by researchers attempting to combine multiple batches of microarray data. All four samples in the second batch cluster together, indicating that the clustering algorithm recognized the batch-to-batch variation as the most significant source of variation within this data set. We give another example data set with batch effects, denoted throughout this paper as data set 2, in the online supplementary materials available at Biostatistics online.
Heat map clusterings for data set 1. The gene-wise expression values are used to compute gene and sample correlations and are displayed in color scale; the sample legends on the top are 0 (0 h), 7 (7.5 h), C (control), and N (NO treated). (a) Expression for 628 genes with large variation across all 12 samples. Note that the samples from batch 2 cluster together, so that batch rather than treatment drives the clustering. Panel (c) shows the same data after the EB batch adjustment described in Section 3 (see Section 4.1).
1.2 EB applications in microarrays
EB methods have been applied to a large variety of settings in microarray data analysis (Chen and others, 1997; Efron and others, 2001; Newton and others, 2001; Tusher and others, 2001; Kendziorski and others, 2003; Smyth, 2004; Lönnstedt and others, 2005; Pan, 2005; Gottardo and others, 2006). EB methods are very appealing in microarray problems because of their ability to handle high-dimensional data robustly when sample sizes are small. They are primarily designed to “borrow information” across genes and experimental conditions in the hope that the borrowed information will lead to better estimates or more stable inferences. In the papers mentioned above, EB methods were typically designed to stabilize the expression ratios of genes with very high or very low ratios, or to stabilize gene variances by shrinking each gene's variance estimate toward the variances of all other genes, thereby protecting the inference from artifacts in the data. In this paper, we extend EB methods to the problem of adjusting for batch effects in microarray data.
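To make the notion of borrowing information concrete, the following R sketch shrinks each gene's variance estimate toward the average variance across all genes. This is an illustrative toy example only: the weight w and the simulated data are arbitrary choices for exposition, not any of the specific estimators proposed in the papers cited above.

# Illustrative sketch of "borrowing information" across genes:
# shrink each gene-wise variance estimate toward the average variance
# over all genes. The weight w is an arbitrary illustrative choice,
# not a specific published estimator.
set.seed(1)
expr <- matrix(rnorm(1000 * 6), nrow = 1000)   # 1000 genes x 6 samples

gene_var   <- apply(expr, 1, var)              # gene-wise variance estimates
prior_var  <- mean(gene_var)                   # pooled ("prior") variance
w          <- 0.5                              # shrinkage weight (illustrative)
shrunk_var <- w * gene_var + (1 - w) * prior_var

# Extreme gene-wise variances are pulled toward the pooled value:
range(gene_var)
range(shrunk_var)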
2 EXISTING METHODS FOR ADJUSTING FOR BATCH EFFECTS
2.1 Microarray data normalization
Microarray data are often subject to high variability due to noise and artifacts attributed to differences in chips, samples, labels, and other technical factors. To correct the biases caused by these non-biological conditions, researchers have developed “normalization” methods to adjust data for such effects (Schadt and others, 2001; Tseng and others, 2001; Yang and others, 2002; Irizarry and others, 2003). However, normalization procedures do not adjust the data for batch effects, so when combining batches of data (particularly batches that contain large batch-to-batch variation), normalization is not sufficient and other adjustment procedures must be applied.
2.2 Other batch effect adjustment methods
A few methods for adjusting data for batch effects have been presented in the literature. Alter and others (2000) propose a method for adjusting data for batch effects based on a singular-value decomposition (SVD) by adjusting “the data by filtering out those eigengenes (and eigenarrays) that are inferred to represent noise or experimental artifacts.” Nielsen and others (2002) successfully apply this SVD batch effect adjustment to a microarray meta-analysis. Benito and others (2004) use distance weighted discrimination (DWD) to correct for systematic biases across microarray batches by finding a separating hyperplane between the two batches, and adjusting the data by projecting the different batches on the DWD plane, finding the batch mean, and then subtracting out the DWD plane multiplied by this mean.
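As a rough illustration of the SVD-filtering idea only (not the exact procedure of Alter and others, 2000, and unrelated to DWD), the R sketch below removes a single singular component judged to reflect a batch artifact and reconstructs the data from the remaining components. Deciding which component represents noise or an artifact is left to the user; here the index drop is simply set to 1 for illustration.

# Rough sketch of SVD-based filtering: remove one component judged to
# reflect a batch artifact and reconstruct the data without it.
set.seed(1)
expr <- matrix(rnorm(500 * 12), nrow = 500)    # 500 genes x 12 samples

s <- svd(expr)                                 # expr = U D V'
drop <- 1                                      # component to filter out (user's judgment)
d <- s$d
d[drop] <- 0                                   # zero out the chosen singular value
expr_filtered <- s$u %*% diag(d) %*% t(s$v)    # reconstruction without that component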
There are difficulties faced by researchers who try to implement the SVD and DWD batch adjustment methods. These methods are fairly complicated and usually require many samples per batch to implement, which limits their usefulness for the small batches that are commonly encountered in practice.
2.3 Model-based location/scale adjustments
Location and scale (L/S) adjustments can be defined as a wide family of adjustments in which one assumes a model for the location (mean) and/or scale (variance) of the data within batches and then adjusts the batches to meet assumed model specifications. Therefore, L/S batch adjustments assume that the batch effects can be modeled out by standardizing means and variances across batches. These adjustments can range from simple gene-wise mean and variance standardization to complex linear or non-linear adjustments across the genes.
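As a concrete example of the simplest member of this family, the R sketch below standardizes each gene within each batch and then restores the gene's overall mean, so that every batch shares a common gene-wise location and scale. The function ls_adjust and its arguments are names introduced here for exposition; this is a minimal gene-wise L/S adjustment, not the EB adjustment developed in Section 3.

# Minimal sketch of a simple gene-wise L/S adjustment: within each
# batch, center and scale every gene, then restore the gene's overall
# mean.
set.seed(1)
expr  <- matrix(rnorm(200 * 12), nrow = 200)            # 200 genes x 12 samples
batch <- factor(rep(1:3, each = 4))                     # 3 batches of 4 samples

ls_adjust <- function(expr, batch) {
  gene_mean <- rowMeans(expr)
  adj <- expr
  for (b in levels(batch)) {
    idx <- which(batch == b)
    m <- rowMeans(expr[, idx, drop = FALSE])            # per-gene batch mean
    s <- apply(expr[, idx, drop = FALSE], 1, sd)        # per-gene batch sd
    adj[, idx] <- (expr[, idx, drop = FALSE] - m) / s + gene_mean
  }
  adj
}

expr_ls <- ls_adjust(expr, batch)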
3 EB METHOD FOR ADJUSTING FOR BATCH EFFECTS
The most important disadvantage of the SVD, DWD, and L/S methods is that large batch sizes are required for implementation because such methods are not robust to outliers in small sample sizes. In this section, we propose a method that robustly adjusts batches with small sample sizes. This method incorporates systematic batch biases common across genes in making adjustments, assuming that phenomena resulting in batch effects often affect many genes in similar ways (e.g. increased expression, higher variability). Specifically, we estimate the L/S model parameters that represent the batch effects by “pooling information” across genes in each batch to “shrink” the batch effect parameter estimates toward the overall mean of the batch effect estimates (across genes). These EB estimates are then used to adjust the data for batch effects, providing more robust adjustments for the batch effect on each gene. The method is described in three steps below.
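The R sketch below illustrates the pooling step in its simplest form: each gene's estimated batch mean shift is shrunk toward the average shift across genes, with noisier gene-wise estimates shrunk more. It uses the standard normal-normal shrinkage form with method-of-moments prior estimates; the function name eb_shrink_means, its arguments, and these particular estimators are illustrative assumptions, not the exact formulas of our method.

# Illustrative parametric shrinkage of gene-wise batch-mean estimates
# toward their across-gene average (standard normal-normal form; the
# moment estimators here are for exposition only).
eb_shrink_means <- function(gamma_hat, s2, n) {
  # gamma_hat: gene-wise estimates of the batch mean shift
  # s2:        gene-wise residual variances
  # n:         number of samples in the batch
  gamma_bar <- mean(gamma_hat)                         # prior mean across genes
  tau2      <- max(var(gamma_hat) - mean(s2 / n), 0)   # method-of-moments prior variance
  (tau2 * gamma_hat + (s2 / n) * gamma_bar) / (tau2 + s2 / n)
}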
3.1 Parametric shrinkage adjustment
Step 1: Standardize the data
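The estimators for this step are given by formulas not reproduced in this excerpt; as an assumption-laden illustration only, the R sketch below standardizes each gene by removing its overall mean and dividing by a pooled gene-wise standard deviation, so that genes have comparable location and scale before the batch parameters are estimated. The function standardize is a name introduced here for exposition and ignores treatment covariates.

# Illustrative gene-wise standardization: remove each gene's overall
# mean and divide by a pooled gene-wise standard deviation.
standardize <- function(expr) {
  gene_mean <- rowMeans(expr)
  gene_sd   <- apply(expr, 1, sd)            # pooled, gene-wise scale
  Z <- sweep(expr, 1, gene_mean, "-")
  Z <- sweep(Z, 1, gene_sd, "/")
  list(Z = Z, gene_mean = gene_mean, gene_sd = gene_sd)
}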
Checking the prior distribution assumptions of the L/S model batch parameters. (a) The gene-wise estimates of the additive (location) batch parameter, compared with the fitted prior distribution.
Step 3: Adjust the data for batch effects
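With EB location and scale estimates in hand, the adjustment returns each standardized value to its original scale after removing the estimated batch effect. The R sketch below shows the general shape of such a back-transformation for a single batch, reusing the quantities returned by the standardization sketch under Step 1; gamma_star and delta2_star are placeholder names for the EB location and scale estimates, not the notation of the original formulas.

# Illustrative back-transformation for one batch: remove the EB
# location estimate, rescale by the EB scale estimate, and return the
# data to the original gene-wise scale.
adjust_batch <- function(Z_batch, gamma_star, delta2_star, gene_mean, gene_sd) {
  adj <- sweep(Z_batch, 1, gamma_star, "-")       # remove additive batch effect
  adj <- sweep(adj, 1, sqrt(delta2_star), "/")    # remove multiplicative batch effect
  sweep(adj, 1, gene_sd, "*") + gene_mean         # back to the original scale
}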
4 RESULTS AND ROBUSTNESS OF THE EB METHOD
4.1 Results for data set 1
The parametric EB adjustment was applied to data set 1. Comparing Figure 1(a) to (c) provides evidence that the batch effects were adequately adjusted for in these data. Downstream analyses can now be applied to the combined data without concern for batch effects. Figure 3 illustrates the amount of batch parameter shrinkage that occurred in the adjustments for 200 genes from one of the batches of data set 1. The adjustments viewed in this figure are on a standardized scale, so the magnitude of the actual adjustments also depends on the gene-specific expression characteristics (overall mean and pooled variance from standardization) and may vary significantly from gene to gene.
Shrinkage plot for the first 200 probes from one of the batches in data set 1. The gene-wise and EB estimates of the batch effect parameters are plotted for each probe, illustrating how the gene-wise estimates are shrunk toward the pooled (across-gene) values.
4.2 Robustness of the EB method
The robustness of the EB method results from the shrinkage of the L/S batch estimates obtained by pooling information across genes. If the observed expression values for a given gene are highly variable across the samples in one batch, the EB batch adjustment for that gene relies more on the prior (the pooled information across genes) and less on the gene's own gene-wise estimates, and is therefore more robust to outlying observations. This phenomenon is illustrated in Figure 4.
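A small numerical illustration of this weighting behavior is given below, using the standard normal-normal shrinkage weight; the function shrink and all of the numbers are arbitrary and for exposition only.

# The larger the within-batch variance s2, the smaller the weight on
# the gene's own estimate, and the closer the EB estimate moves to the
# prior (across-gene) mean.
shrink <- function(gamma_hat, s2, n, gamma_bar, tau2) {
  w <- tau2 / (tau2 + s2 / n)                  # weight on the gene-wise estimate
  w * gamma_hat + (1 - w) * gamma_bar
}

shrink(gamma_hat = 2, s2 = 0.5, n = 4, gamma_bar = 0, tau2 = 1)  # little shrinkage (about 1.78)
shrink(gamma_hat = 2, s2 = 10,  n = 4, gamma_bar = 0, tau2 = 1)  # heavy shrinkage (about 0.57)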
Plots (a)–(c) illustrate the robustness of the EB adjustments compared to the L/S adjustments. The symbols signify batch membership. Data in (a) are unadjusted expression values for two genes from data set 1; (b) are the L/S-adjusted data. Note that the locations and scales of the batches in (b) have been over-adjusted because of outliers in the unadjusted data, and that these outliers disappear in the L/S-adjusted data; (c) are the EB-adjusted data. The outliers remain in these data and the batch with outliers is barely adjusted. The batches without outliers are adjusted correctly in the EB data.
4.3 Software implementation
The EB batch effect adjustment method described here is implemented in the R software package (http://www.r-project.org) and is freely available for download at http://biosun1.harvard.edu/complab/batch/. Detailed information on the computing times required to adjust the example data sets is given in the supplementary materials available at Biostatistics online.
5 DISCUSSION
Batch effects are a very common problem faced by researchers in microarray studies, particularly when combining multiple batches of data from different experiments or when an experiment cannot be conducted all at once. We have reviewed and discussed the advantages and disadvantages of the existing batch effect adjustments. Notably, none of these methods is appropriate when batch sizes are small.
We have shown that the EB adjustments allow for the combination of multiple data sets and are robust to small sample sizes. We illustrated the usefulness of our EB adjustments by combining two example data sets containing multiple batches with small batch sizes, and we obtained consistent results in downstream analyses (clustering and comparisons with similar single-batch data) while robustly dealing with outliers in these data. Additional discussion topics are included in the supplementary materials available at Biostatistics online.
We thank Wing Hung Wong, Donna Neuberg, and Yu Guo for helpful discussions, and Jennifer O'Neil for providing data set 2. This work is supported by grants from the National Institutes of Health (2R01 HG02341 and T32 CA009337) and the Claudia Adams Barr Program in Innovative Basic Cancer Research. Ariel Rabinovic was supported by grants RO1-CA82737 (Bruce Demple) and T32-CA09078. Conflict of Interest: None declared.