Meta-analysis in R: Part 2 – Binary data
In part 1, I discussed how to get set up for meta-analysis in R, by installing the necessary software and libraries.
In this post, I will take the reader through the steps for performing meta-analysis of binary data in R.
Required data format
Binary outcomes are those in which study participants either experience an outcome, or do not. Examples are acute rejection and delayed graft function. The data required for meta-analysis is as follows:
[table th=0]
n.s, The total number of participants in the study group
n.c, The total number of participants in the control group
event.s, The number of participants experiencing the outcome in the study group
event.c, The number of participants experiencing the outcome in the control group
[/table]
In order to label the studies, it is recommended that a study name and year of publication are used.
An example spreadsheet looking at the incidence of hypercholesterolaemia in patients withdrawn from or avoiding steroids is shown below:
[table]
name,year,event.s,n.s,event.c,n.c
Vincenti,2008,101,227,57,109
Laftavi,2005,4,14,11,16
Ponticelli,1997,86,114,102,115
Montagnino,2005,43,65,48,68
Vitko,2005,18,143,25,134
Woodle,2008,97,156,112,152
Burke,2001,6,20,13,20
Park,1994,8,22,35,46
Cristinelli,1986,7,35,13,31
Vanrenterghem,2005,39,235,62,237
Smak Gregoor,2002,14,76,11,73
Sola,2002,5,46,6,46
Hollander,1997,9,42,23,41
[/table]
Loading the dataset
In R, a spreadsheet of data is termed a “dataset”. Datasets can either be entered manually, or imported from an excel spreadsheet. The latter can be performed from the command-line, or using the RStudio GUI.
Loading data using the GUI
This is the easiest method to load your dataset into R. From the “Environment” tab in the top right pane, click the “Import Dataset” button, and select “From Text File…”:
Select your .csv file that you created in excel, and click “Open”.
You will then be given the option to describe your data format. Use the settings shown below:
The data frame on the bottom left previews what you data will look like once imported. Click “Import”, and a new panel will open at the top left of the screen showing your data frame. The dataset will also appear in your environment in the top right pane:
Loading data from the command-line
If the above step does not work, or you are using a different GUI/platform, you can load your file from the command-line. First of all, R needs to know where to find your files. From the “Session” menu, select “Set Working Directory > Choose Directory…”. Navigate to the directory containing your data, and click “Submit”. Alternatively, you can type in the console:
setwd("C:/Documents and Settings/User/My Documents")
Replace the directory in the quotes with your working directory.
To see the files in the current directory, type:
dir()
To load a CSV file (as prepared above), type the following:
read.table("cholesterol.csv", sep = ",", header=TRUE)
This tells R the file name and extension, that the data is separated by commas, and that the first row of data is the column headings. It should output something like this:
In order to be able to use this data, it must be stored as a variable. Run the following command:
cholesterol.data <- read.table("cholesterol.csv", sep = ",", header=TRUE)
This stores the dataset above as a variable called “cholesterol.data”. The dataset can be viewed by typing it’s name:
cholesterol.data
The output should be the same as the table above. To access an individual column from the data, use the $ operator. For example:
cholesterol.data$name
Should output:
[1] Vincenti Laftavi Ponticelli Montagnino Vitko [6] Woodle Burke Park Cristinelli Vanrenterghem [11] Smak Gregoor Sola Hollander
Performing the meta-analysis
There are two steps to performing the meta-analysis. Firstly, an effect size must be calculated for each study. In the case of discrete data this can be a Relative Risk (RR), Odds Ratio (OR) or Risk Difference (RD). Then these effect sizes much be combined in a meta-analytical model to create a summary effect size, along with confidence intervals and measures of heterogeneity.
Fortunately, the metafor package contains functions that can perform these steps individually, or more conveniently combine these steps automatically giving all of the required data. The function for meta-analysis is the rma.uni() function. In the case of binary meta-analysis, it takes data in the form of a 2×2 table:
[table]
Group, Outcome 1, Outcome 2, Total
Study Group, ai, bi, n1i
Control Group, ci, di, n2i
[/table]
So looking at our cholesterol data from section 3.2, we have values for ai, ci, n1i and n2i. Try the following:
rma.uni(ai=event.s, n1i=n.s, ci=event.c, n2i= n.c, data=cholesterol.data, slab=paste(name, year), measure="RR")
In brackets, we are telling the function the names of the columns containing the data (event.s, n.s, event.c, n.c) and the name of the dataset (data=cholesterol.data). The argument (slab=paste(name, year)) provides a label for each study by combining the name and year columns, separated by a space. The final argument tells the function that we wish to use the relative risk. The command should produce the following output:
Random-Effects Model (k = 13; tau^2 estimator: REML) tau^2 (estimate of total amount of heterogeneity): 0.0225 (SE = 0.0225) tau (sqrt of the estimate of total heterogeneity): 0.1499 I^2 (% of total variability due to heterogeneity): 47.93% H^2 (total variability / within-study variance): 1.92 Test for Heterogeneity: Q(df = 12) = 20.6639, p-val = 0.0555 Model Results: estimate se zval pval ci.lb ci.ub -0.2854 0.0712 -4.0083 <.0001 -0.4249 -0.1458 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The default here is a random-effects model. Below this, there is an assessment of heterogeneity including the I^2 value and Chi-squared p-value. Finally, the results of the model are given, with the log relative risk, SE, p-value and borders for the 95% confidence interval. The log values can be converted to the true relative risk using the exp() function. For example:
exp(-0.2854) [1] 0.7517135
This tells us that the relative risk from the random-effects model is 0.75. The same can be done for the borders of the confidence interval (giving 0.65-0.86).
Other rma.uni options
By default, the rma.uni function uses the restricted maximum-likelihood estimator (REML) random-effects model. By adding more options to our command, we can modify these defaults. Some useful options are:
[table th=0]
method,"The meta-analysis model used. This can be 'REML', 'DL' (DeSimonian-Laird), or 'FE' (fixed effects). Other models are available."
measure, "The summary method used. This can be 'RR', 'OR' or 'RD' for binary data, 'MD' or 'SMD' for continuous data."
subset, "A subset of studies from the dataset to meta-analyse. This will be described in a later post."
mods, "Moderator variables for a mixed-effects analysis. This will be described in a later post."
[/table]
To give another example, try the following command:
rma.uni(ai=event.s, n1i=n.s, ci=event.c, n2i= n.c, data=cholesterol.data, slab=paste(name, year), measure="OR", method="FE")
This gives:
Fixed-Effects Model (k = 13) Test for Heterogeneity: Q(df = 12) = 19.6044, p-val = 0.0750 Model Results: estimate se zval pval ci.lb ci.ub -0.5898 0.1005 -5.8685 <.0001 -0.7868 -0.3928 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Now a fixed-effects model is used, with results reported as the log odds-ratio. Exp(-0.5898) tells us that the combined odds ratio from the fixed-effects model is 0.55.
In the next post, I will deal with the meta-analysis of continuous data.