Meta-analysis in R: Part 2 – Binary data

In part 1, I discussed how to get set up for meta-analysis in R, by installing the necessary software and libraries.

In this post, I will take the reader through the steps for performing meta-analysis of binary data in R.

Required data format

Binary outcomes are those in which study participants either experience an outcome, or do not. Examples are acute rejection and delayed graft function. The data required for meta-analysis is as follows:

[table th=0]
 n.s, The total number of participants in the study group
 n.c, The total number of participants in the control group
 event.s, The number of participants experiencing the outcome in the study group
 event.c, The number of participants experiencing the outcome in the control group
[/table]

In order to label the studies, it is recommended that a study name and year of publication are used.

An example spreadsheet looking at the incidence of hypercholesterolaemia in patients withdrawn from or avoiding steroids is shown below:

[table]
name,year,event.s,n.s,event.c,n.c
Vincenti,2008,101,227,57,109
Laftavi,2005,4,14,11,16
Ponticelli,1997,86,114,102,115
Montagnino,2005,43,65,48,68
Vitko,2005,18,143,25,134
Woodle,2008,97,156,112,152
Burke,2001,6,20,13,20
Park,1994,8,22,35,46
Cristinelli,1986,7,35,13,31
Vanrenterghem,2005,39,235,62,237
Smak Gregoor,2002,14,76,11,73
Sola,2002,5,46,6,46
Hollander,1997,9,42,23,41
[/table]

Loading the dataset

In R, a spreadsheet of data is termed a “dataset”. Datasets can either be entered manually, or imported from an excel spreadsheet. The latter can be performed from the command-line, or using the RStudio GUI.

Loading data using the GUI

This is the easiest method to load your dataset into R. From the “Environment” tab in the top right pane, click the “Import Dataset” button, and select “From Text File…”:

Load data

Select your .csv file that you created in excel, and click “Open”.

You will then be given the option to describe your data format. Use the settings shown below:

Import data

The data frame on the bottom left previews what you data will look like once imported. Click “Import”, and a new panel will open at the top left of the screen showing your data frame. The dataset will also appear in your environment in the top right pane:

Data imported

Loading data from the command-line

If the above step does not work, or you are using a different GUI/platform, you can load your file from the command-line. First of all, R needs to know where to find your files. From the “Session” menu, select “Set Working Directory > Choose Directory…”. Navigate to the directory containing your data, and click “Submit”. Alternatively, you can type in the console:

setwd("C:/Documents and Settings/User/My Documents")

Replace the directory in the quotes with your working directory.

To see the files in the current directory, type:

dir()

To load a CSV file (as prepared above), type the following:

read.table("cholesterol.csv", sep = ",", header=TRUE)

This tells R the file name and extension, that the data is separated by commas, and that the first row of data is the column headings. It should output something like this:

data

In order to be able to use this data, it must be stored as a variable. Run the following command:

cholesterol.data <- read.table("cholesterol.csv", sep = ",", header=TRUE)

This stores the dataset above as a variable called “cholesterol.data”. The dataset can be viewed by typing it’s name:

cholesterol.data

The output should be the same as the table above. To access an individual column from the data, use the $ operator. For example:

cholesterol.data$name

Should output:

[1] Vincenti      Laftavi       Ponticelli    Montagnino    Vitko        
[6] Woodle        Burke         Park          Cristinelli   Vanrenterghem
[11] Smak Gregoor  Sola          Hollander    

Performing the meta-analysis

There are two steps to performing the meta-analysis. Firstly, an effect size must be calculated for each study. In the case of discrete data this can be a Relative Risk (RR), Odds Ratio (OR) or Risk Difference (RD). Then these effect sizes much be combined in a meta-analytical model to create a summary effect size, along with confidence intervals and measures of heterogeneity.

Fortunately, the metafor package contains functions that can perform these steps individually, or more conveniently combine these steps automatically giving all of the required data. The function for meta-analysis is the rma.uni() function. In the case of binary meta-analysis, it takes data in the form of a 2×2 table:

[table]
Group, Outcome 1, Outcome 2, Total
Study Group, ai, bi, n1i
Control Group, ci, di, n2i
[/table]

So looking at our cholesterol data from section 3.2, we have values for ai, ci, n1i and n2i. Try the following:

rma.uni(ai=event.s, n1i=n.s, ci=event.c, n2i= n.c, data=cholesterol.data, slab=paste(name, year), measure="RR")

In brackets, we are telling the function the names of the columns containing the data (event.s, n.s, event.c, n.c) and the name of the dataset (data=cholesterol.data). The argument (slab=paste(name, year)) provides a label for each study by combining the name and year columns, separated by a space. The final argument tells the function that we wish to use the relative risk. The command should produce the following output:

Random-Effects Model (k = 13; tau^2 estimator: REML)

tau^2 (estimate of total amount of heterogeneity): 0.0225 (SE = 0.0225)
tau (sqrt of the estimate of total heterogeneity): 0.1499
I^2 (% of total variability due to heterogeneity): 47.93%
H^2 (total variability / within-study variance):   1.92

Test for Heterogeneity: 
Q(df = 12) = 20.6639, p-val = 0.0555

Model Results:

estimate       se     zval     pval    ci.lb    ci.ub          
 -0.2854   0.0712  -4.0083   <.0001  -0.4249  -0.1458      *** 

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The default here is a random-effects model. Below this, there is an assessment of heterogeneity including the I^2 value and Chi-squared p-value. Finally, the results of the model are given, with the log relative risk, SE, p-value and borders for the 95% confidence interval. The log values can be converted to the true relative risk using the exp() function. For example:

exp(-0.2854)
[1] 0.7517135

This tells us that the relative risk from the random-effects model is 0.75. The same can be done for the borders of the confidence interval (giving 0.65-0.86).

Other rma.uni options

By default, the rma.uni function uses the restricted maximum-likelihood estimator (REML) random-effects model. By adding more options to our command, we can modify these defaults. Some useful options are:

[table th=0]
method,"The meta-analysis model used. This can be 'REML', 'DL' (DeSimonian-Laird), or 'FE' (fixed effects). Other models are available."
measure, "The summary method used. This can be 'RR', 'OR' or 'RD' for binary data, 'MD' or 'SMD' for continuous data."
subset, "A subset of studies from the dataset to meta-analyse. This will be described in a later post."
mods, "Moderator variables for a mixed-effects analysis. This will be described in a later post."
[/table]

To give another example, try the following command:

rma.uni(ai=event.s, n1i=n.s, ci=event.c, n2i= n.c, data=cholesterol.data, slab=paste(name, year), measure="OR", method="FE")

This gives:

Fixed-Effects Model (k = 13)

Test for Heterogeneity: 
Q(df = 12) = 19.6044, p-val = 0.0750

Model Results:

estimate       se     zval     pval    ci.lb    ci.ub          
 -0.5898   0.1005  -5.8685   <.0001  -0.7868  -0.3928      *** 

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Now a fixed-effects model is used, with results reported as the log odds-ratio. Exp(-0.5898) tells us that the combined odds ratio from the fixed-effects model is 0.55.

In the next post, I will deal with the meta-analysis of continuous data.

Leave a Reply

Awards

Previous article

DPhil for John O’Callaghan
Staff

Next article

Simon Knight a new daughter