Prop_scores systematic review

Etienne Gayat, Romain Pirracchio, Matthieu Resche-Rigon, Alexandre Mebazaa, Jean-Yves Mary, Raphaël Porcher

Propensity scores in intensive care and anaesthesiology literature: a systematic review

Received: 15 April 2010

Accepted: 8 July 2010

Published online: 6 August 2010

© Copyright jointly held by Springer and ESICM 2010

E. Gayat · R. Pirracchio · M. Resche-Rigon · J.-Y. Mary · R. Porcher
Clinical Epidemiology and Biostatistics, INSERM U717, University Paris 7, Paris, France

E. Gayat (✉) · R. Pirracchio · A. Mebazaa
Department of Anesthesiology and Intensive Care, Lariboisière University Hospital, Paris VII University, Inserm U717 and University of Paris Diderot-Paris 7, 2 Rue Ambroise Paré, 75010 Paris, France
e-mail: etienne.gayat@lrb.aphp.fr
Tel.: +33-1-49958071
Fax: +33-1-49958083

A. Mebazaa

INSERM U942, Paris, France

Abstract Introduction: Propensity score methods have been increasingly used in the last 10 years. However, the practical use of the propensity score (PS) has been reported as heterogeneous in several papers reviewing the use of propensity scores and giving some advice. No previous work has focused on the specific application of PS in the intensive care and anaesthesiology literature. Objectives: After a brief development of the theory of the propensity score, to assess the use and the quality of reporting of PS studies in intensive care and anaesthesiology, and to evaluate how past reviews have influenced the quality of the reporting. Study design and setting: Forty-seven articles published between 2006 and 2009 in the intensive care and anaesthesiology literature were evaluated. We extracted the characteristics of the report, the type of analysis, the details of matching procedures, the number of patients in treated and control groups, and the number of covariates included in the PS models. Results: Of the 47 articles reviewed, 26 used matching on PS, 12 used stratification on PS and 9 used adjustment on PS. The method used was reported in 81% of the articles, and the choice to conduct a paired analysis or not was reported in only 15%. The comparison with the previously published reviews showed little improvement in reporting in the last few years. Conclusion: The quality of reporting of propensity scores in the intensive care and anaesthesiology literature should be improved. We provide some recommendations to investigators in order to improve the reporting of PS analyses.

Keywords Propensity score · Propensity · Matching · Review · Methodology · Intensive care · Anaesthesiology

Intensive Care Med (2010) 36:1993–2003
DOI 10.1007/s00134-010-1991-5
REVIEW

Introduction

Randomised controlled trials (RCTs) are generally considered the gold standard for assessing the efficacy of medications, medical procedures or clinical strategies. Indeed, several eminent statisticians have argued that scientific committees should “just say no” to non-randomised studies because of their inherent bias [1]. However, as noted by Rossi and Freeman [2], randomisation may be difficult to apply or maintain in several situations: when the enrolment demand is minimal, in emergency situations or when randomising to a control group might be considered unethical. Furthermore, the conclusions drawn from RCTs are likely to be less generalisable than those from observational studies [3–5]. Indeed, the complex process of inclusion in randomised trials, including differential participation by centres, physicians or patients, may limit the confidence with which the results can be applied in routine practice. Moreover, the process of recruitment, in which only individuals (1) who are willing to participate and (2) who meet predefined inclusion criteria are recruited, may introduce selection biases that make subjects unrepresentative of the reference population.

In observational studies, investigators do not control the treatment assignment, and large differences may exist between the two arms, in both observed and unobserved covariates. These differences are likely to bias the estimation of the treatment effect [6]. If statistical methods could handle this bias, the interest of observational studies would increase in several situations. The two major strategies to control for selection bias in observational studies are (1) adjustment of the treatment effect, which relies on the relationship between prognostic variables and outcome, and (2) probability-of-treatment models, which rely on the relationship between prognostic variables and treatment assignment.

The propensity score is defined as a subject’s probability of receiving a specific treatment conditional on the observed covariates. Rosenbaum and Rubin [7] demonstrated that conditioning on the propensity score makes it possible to obtain an unbiased estimate of the treatment effect.

Studies using propensity score methods are increasingly reported in the anaesthesiology and intensive care literature, as illustrated by Fig. 1. Providing physicians with some keys on how to accurately use, report and critically evaluate studies using propensity score analyses thus seems of particular interest.

The aims of this review were (1) to present didactically the theory of the propensity score, (2) to assess the use and the quality of reporting of PS studies in the intensive care and anaesthesiology literature, and (3) to evaluate how past reviews have influenced the quality of the reporting.

The present work has two parts: (1) in the first, the theory of the propensity score is summarised, and (2) in the second, the use and the quality of reporting of propensity score-based studies published in seven major journals of anaesthesiology and critical care during the last 3 years were studied.

Background of the propensity score

Theory of propensity score

Let us assume that each subject has observed covariates X and an indicator of treatment group Z (Z = 1 if treated and Z = 0 if control). X is a vector of covariates that might include a large number of characteristics describing that subject. The propensity score e(X) can be defined for each subject as the probability that he/she had of receiving the treatment, given his/her baseline covariates, whether or not he/she actually received it [7]. The theory of the propensity score relies on three major assumptions: (1) the treatment assignment of a patient is independent of that of the others (given the X, the Z are independent), (2) there are no unmeasured confounders (i.e., all the covariates potentially related to the treatment assignment are known and measured), and (3) the treatment assignment is strongly ignorable given the covariates, i.e. the treatment assignment and the response are conditionally independent given the covariates.
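As an illustration of this definition, the sketch below estimates e(X) with a logistic regression of Z on X, which is the usual choice in practice. It is a minimal example under stated assumptions: the data are simulated, the function names (`fit_propensity`, `propensity`) are hypothetical, and the fit uses plain gradient ascent rather than the routines of a statistical package.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_propensity(X, Z, lr=0.5, n_iter=500):
    """Fit P(Z = 1 | X) by logistic regression (gradient ascent on the
    mean log-likelihood). Returns [intercept, b1, ..., bp]."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, zi in zip(X, Z):
            resid = zi - sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity(beta, xi):
    # Estimated e(X) for one subject
    return sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))

# Simulated cohort (assumption): two covariates drive treatment assignment
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(300)]
Z = [1 if random.random() < sigmoid(-0.5 + 1.0 * x[0] - 0.5 * x[1]) else 0 for x in X]

beta = fit_propensity(X, Z)
scores = [propensity(beta, x) for x in X]
```

Every estimated score lies strictly between 0 and 1, and with an intercept in the model the mean estimated score matches the observed proportion of treated subjects once the fit has converged.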

Under the preceding assumptions, as demonstrated by Rosenbaum and Rubin, a treated and a non-treated patient with the same propensity score can be considered as randomly assigned to each group. In practice, the propensity score is not known, but has to be estimated. The usual methodology uses a logistic model to estimate the probability of being treated given the observed covariates.

Analysis using the propensity score

The four most common techniques that may use the propensity score are: (1) matching, (2) stratification (also called subclassification), (3) regression adjustment and (4) more recently, weighting with the propensity score [8].

Matching (or matched sampling)

Matching is a common technique that is used to couple control and treated subjects on the basis of a similar value of their covariates. Although the idea of finding matches seems straightforward, it is often difficult to find subjects who are identical or even similar (i.e. that can be matched) on all important covariates, even when there are only a few background covariates. Propensity score matching solves this problem by allowing the investigator to control for many background covariates simultaneously by matching on a single scalar variable, namely the propensity score [9–12].

Fig. 1 Number of published articles reporting propensity score-based studies in the medical literature from 1998 to 2009

The performance of different propensity score matching methods commonly employed in the medical literature has been compared recently [13]. The most commonly used and well-performing method consists of randomly ordering the treated and control subjects, then selecting the first treated subject and finding the control subject with the closest propensity score. To avoid matching subjects whose propensity scores are too different, a calliper is often used: matching can only occur within a given range of propensity score values or, more frequently, of the logit of the propensity score (i.e. log(p/(1 − p)), which is the value directly obtained when fitting the logistic regression).
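The procedure described above can be sketched as follows. This is an illustrative implementation only: the function name `greedy_calliper_match` is hypothetical, the calliper is taken as a fraction of the standard deviation of the logit over all subjects combined (an assumption, other pooling conventions exist), and matching is 1:1 without replacement.

```python
import math
import random
import statistics

def logit(p):
    return math.log(p / (1.0 - p))

def greedy_calliper_match(ps_treated, ps_control, calliper_sd=0.2, seed=0):
    """1:1 greedy nearest-neighbour matching without replacement on the
    logit of the propensity score. Returns (treated_index, control_index) pairs."""
    lt = [logit(p) for p in ps_treated]
    lc = [logit(p) for p in ps_control]
    # Calliper width: fraction of the SD of the logit (all subjects combined)
    width = calliper_sd * statistics.stdev(lt + lc)
    order = list(range(len(lt)))
    random.Random(seed).shuffle(order)  # random ordering of treated subjects
    available = set(range(len(lc)))
    pairs = []
    for i in order:
        if not available:
            break
        # Closest control that has not been used yet
        j = min(available, key=lambda k: abs(lt[i] - lc[k]))
        if abs(lt[i] - lc[j]) <= width:
            pairs.append((i, j))
            available.remove(j)
        # Otherwise the treated subject stays unmatched and is discarded
    return pairs

treated = [0.80, 0.60, 0.55, 0.30]
control = [0.58, 0.31, 0.10, 0.62, 0.05]
pairs = greedy_calliper_match(treated, control)
```

Treated subjects with no control inside the calliper are simply dropped, which is why a narrower calliper reduces the number of matched pairs.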

Stratification (or subclassification)

Stratification on the propensity score consists of estimating the treatment effect within strata, usually defined as quintiles of the propensity score.
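A minimal sketch of this idea is given below, assuming a continuous outcome and a simple difference in means within each stratum. The function name `stratified_effect` and the size-weighted pooling across strata are illustrative choices, not a method prescribed by the review.

```python
def stratified_effect(ps, z, y, n_strata=5):
    """Size-weighted average of within-stratum differences in mean outcome
    (treated minus control); strata are quantile groups of the propensity score."""
    order = sorted(range(len(ps)), key=lambda i: ps[i])
    n = len(order)
    total, used = 0.0, 0
    for s in range(n_strata):
        idx = order[s * n // n_strata:(s + 1) * n // n_strata]
        yt = [y[i] for i in idx if z[i] == 1]
        yc = [y[i] for i in idx if z[i] == 0]
        if not yt or not yc:
            continue  # a stratum with a single group carries no comparison
        total += len(idx) * (sum(yt) / len(yt) - sum(yc) / len(yc))
        used += len(idx)
    if used == 0:
        raise ValueError("no stratum contains both treated and control subjects")
    return total / used

# Toy data (assumption): the outcome rises with the score, true effect is 2
ps = [0.1, 0.1, 0.3, 0.3, 0.5, 0.5, 0.7, 0.7, 0.9, 0.9]
z = [1, 0] * 5
y = [2.0 * zi + 10.0 * p for zi, p in zip(z, ps)]
effect = stratified_effect(ps, z, y)
```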

Adjustment

As in a classical multivariable model, adjustment on the PS is achieved by including the propensity score as an explanatory covariate when modelling the treatment effect.

Weighting

The idea of reweighting treated and control subjects by their propensity score to make them more representative of the population of interest was first proposed by Rubin [14] and more recently studied by Lunceford and Davidian [8]. The weight assigned to a treated subject is the inverse of its propensity score, and the weight for a control subject is the inverse of one minus its propensity score.
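The weights just described can be written in a few lines. The sketch below is illustrative: the function names are hypothetical, and the weighted difference in means uses weights normalised within each group, which is one common variant among several.

```python
def iptw_weights(ps, z):
    """Inverse probability of treatment weights:
    1 / e(X) for treated subjects, 1 / (1 - e(X)) for controls."""
    return [1.0 / p if zi == 1 else 1.0 / (1.0 - p) for p, zi in zip(ps, z)]

def weighted_mean_difference(ps, z, y):
    """Difference in weighted mean outcome (treated minus control),
    with weights normalised within each group (an assumption)."""
    w = iptw_weights(ps, z)
    num_t = sum(wi * yi for wi, zi, yi in zip(w, z, y) if zi == 1)
    den_t = sum(wi for wi, zi in zip(w, z) if zi == 1)
    num_c = sum(wi * yi for wi, zi, yi in zip(w, z, y) if zi == 0)
    den_c = sum(wi for wi, zi in zip(w, z) if zi == 0)
    return num_t / den_t - num_c / den_c

weights = iptw_weights([0.25, 0.5], [1, 0])  # treated with e = 0.25, control with e = 0.5
```

A treated subject with a low propensity score therefore receives a large weight, since it stands in for many similar subjects who were not treated.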

Matching on the propensity score has now been clearly demonstrated to be the best use of the propensity score in order to attempt to provide an unbiased estimate of the treatment effect [13]. Of note, there are settings where conventional regression methods are superior to propensity score methods, with respect to bias as well as with respect to precision.

Practical settings of matching

When one chooses to match on the propensity score, three features basically have to be fixed: (1) the balance of the matching, (2) matching with or without replacement and (3) the matching algorithm.

Most clinical studies use “one-to-one” (1:1) matching. In this case, pairs of treated and untreated subjects with a similar propensity score are formed. Many-to-one and many-to-many matching are also possible, but are rarely employed in practice. A difficulty in using many-to-one or many-to-many matching is that diagnostics for assessing the relative balance in baseline covariates are less well developed in this context than in the setting of 1:1 matching.

In matching without replacement, an untreated subject who has already been matched with a treated subject is no longer available as a potential match for other treated subjects. In contrast, when matching with replacement is used, the same subject can be used several times as a matched control for treated patients. Matching with replacement is rarely employed [15].

There are two types of matching algorithms: greedy matching and optimal matching. Greedy matching consists of selecting a random treated subject and then matching it to the nearest untreated subject. This untreated subject is selected even if it would better serve as a match for a subsequent treated subject. Once selected, it is no longer available for subsequent matching. With optimal matching, pairs of treated and untreated subjects are formed so as to minimise the total within-pair difference in the propensity score. In this case, previously formed pairs can be broken up if another combination would reduce the total within-pair difference.
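The difference between the two algorithms can be seen on a toy example. The sketch below is only illustrative: the greedy version processes treated subjects in the order given rather than at random, and the optimal version uses an exhaustive search that is feasible only for very small samples (real implementations rely on dedicated assignment algorithms). All function names are hypothetical.

```python
from itertools import permutations

def greedy_pairs(treated, control):
    """Match each treated subject, in the order given, to the nearest
    control that is still available."""
    available = list(range(len(control)))
    pairs = []
    for i, pt in enumerate(treated):
        j = min(available, key=lambda k: abs(pt - control[k]))
        pairs.append((i, j))
        available.remove(j)
    return pairs

def optimal_pairs(treated, control):
    """Exhaustive search for the assignment minimising the total
    within-pair distance (tiny samples only)."""
    best = min(
        permutations(range(len(control)), len(treated)),
        key=lambda perm: sum(abs(t - control[j]) for t, j in zip(treated, perm)),
    )
    return list(enumerate(best))

def total_distance(pairs, treated, control):
    return sum(abs(treated[i] - control[j]) for i, j in pairs)

treated = [0.50, 0.60]
control = [0.58, 0.20]
g = greedy_pairs(treated, control)   # first treated subject takes control 0.58
o = optimal_pairs(treated, control)  # control 0.58 is kept for treated 0.60
```

Here the greedy pairing gives a total distance of 0.08 + 0.40 = 0.48, whereas the optimal pairing gives 0.30 + 0.02 = 0.32.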

The two most popular procedures to perform the matching are nearest neighbour matching within fixed calliper widths and 5–1 digit matching. The first attempts to match each treated subject to the nearest untreated subject within a specified calliper width: the maximum difference in the PS between two matched subjects is predefined, generally as a fraction of the standard deviation (SD) of the propensity score on the logit scale. The most frequently used value is 0.2 SD, even if more stringent criteria such as 0.05 SD seem to perform slightly better, at the price of a smaller number of treated patients matched [13]. With the second approach, treated subjects are first matched to untreated subjects on the first five digits of the propensity score. For those treated subjects that remain unmatched, matches are then attempted with the remaining untreated subjects on the first four digits of the propensity score. This process continues until unmatched treated subjects are matched to untreated subjects on the first digit of the propensity score. Treated and untreated subjects that remain unmatched are then discarded. This method has been used frequently in the medical literature [16–20], despite the fact that its performance has never been rigorously examined.
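The digit-based procedure can be sketched as follows. This is an illustrative reading of the description above, not a reference implementation: “first digits” is taken to mean the leading decimal digits of the score (truncated, not rounded), ties are broken at random, and the function name `digit_matching` is hypothetical.

```python
import random

def digit_key(p, digits):
    # Leading decimal digits of a score in (0, 1), e.g. 0.61299 -> "0.61" for 2 digits
    return f"{p:.10f}"[:digits + 2]

def digit_matching(ps_treated, ps_control, seed=0):
    """Match on the first 5 digits of the propensity score, then 4, 3, 2 and 1.
    Returns (pairs, unmatched_treated); each pair is (treated, control, digits)."""
    rng = random.Random(seed)
    unmatched_t = list(range(len(ps_treated)))
    unmatched_c = list(range(len(ps_control)))
    pairs = []
    for digits in range(5, 0, -1):
        # Group the remaining controls by their leading digits
        buckets = {}
        for j in unmatched_c:
            buckets.setdefault(digit_key(ps_control[j], digits), []).append(j)
        still_unmatched = []
        for i in unmatched_t:
            candidates = buckets.get(digit_key(ps_treated[i], digits))
            if candidates:
                j = candidates.pop(rng.randrange(len(candidates)))
                pairs.append((i, j, digits))
            else:
                still_unmatched.append(i)
        unmatched_t = still_unmatched
        matched_c = {j for _, j, _ in pairs}
        unmatched_c = [j for j in unmatched_c if j not in matched_c]
    return pairs, unmatched_t

pairs, unmatched = digit_matching([0.61234, 0.61299, 0.40000], [0.61234, 0.61, 0.75])
```

In this toy example the first treated subject is matched at five digits, the second only at two digits, and the third is discarded.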


Assessment of the propensity score matching

Concerning the assessment of the quality of propensity score matching itself, it has been shown that there is no association between the area under the ROC curve (c-statistic) or any goodness-of-fit test and the ability of a given propensity score to accurately balance prognostically important variables between treated and untreated subjects in a propensity score-matched sample [21]. Measures of goodness-of-fit and discrimination also do not provide useful information to detect missing confounders in the propensity score model [22].

When matching on the propensity score is used, the objective is to balance the prognostic factors between groups. The success of matching can be assessed for each variable using standardised differences (d) [23], where d is defined as:

$$d = \frac{100 \times \left(\bar{x}_{\text{treatment}} - \bar{x}_{\text{control}}\right)}{\sqrt{\dfrac{s^2_{\text{treatment}} + s^2_{\text{control}}}{2}}}$$

with $\bar{x}_{\text{treatment}}$, $s^2_{\text{treatment}}$, $\bar{x}_{\text{control}}$ and $s^2_{\text{control}}$ the mean and the variance in the treatment group and in the control group, respectively.

A successful balance is inferred if the residual imbalance as measured by d is small for all confounders. A value of d ≤ 10% has been empirically considered as acceptable [23].
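For a continuous covariate, the standardised difference can be computed as in the sketch below. It assumes the usual pooled form, in which the denominator is the square root of the average of the two group variances; the function name is hypothetical and the numbers are toy values.

```python
import math
import statistics

def standardised_difference(treated, control):
    """Standardised difference in percent:
    100 * (mean_t - mean_c) / sqrt((var_t + var_c) / 2)."""
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(control)) / 2
    )
    return 100.0 * (statistics.mean(treated) - statistics.mean(control)) / pooled_sd

# Toy covariate values (assumption): both groups have variance 1, means differ by 1
d = standardised_difference([1, 2, 3], [0, 1, 2])
```

Computing d for every covariate before and after matching, and comparing its absolute value with the 10% threshold, gives a balance diagnostic that does not depend on sample size.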

The study conducted by Wijeysundera et al. [24] on the effect of epidural analgesia on mortality after intermediate-to-high risk non-cardiac surgery is a good example. The authors produced a table with standardised differences before and after matching. We propose a more visual approach: a graphical display of the standardised differences before and after matching (see Fig. 2). There were many differences before matching that could bias the estimate of the effect of epidural analgesia. For example, in the original sample, the patients receiving epidural analgesia received more arterial and central venous lines; these differences were no longer present in the matched sample, leading to more comparable groups of patients.

Outcome analysis after matching on the propensity score

The model to be used for analysis is primarily driven by the type of outcome (continuous, binary or censored). However, matching on the propensity score may induce correlation in the dataset, which may influence the estimation of the variance of the treatment effect. Therefore, it is recommended to use robust approaches to compute the variance of estimates [25, 26].
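To illustrate why the pairing matters, the sketch below computes a risk difference for a binary outcome in 1:1 matched pairs together with a standard error that takes the pairing into account. It uses the classical Wald variance for a difference between paired proportions, which depends only on the discordant pairs; this is offered as a generic example and is not necessarily the approach recommended in [25, 26]. The function name and the data are hypothetical.

```python
import math

def paired_risk_difference(pairs):
    """pairs: list of (y_treated, y_control) binary outcomes, one per matched pair.
    Returns (risk difference, standard error accounting for the pairing)."""
    n = len(pairs)
    b = sum(1 for yt, yc in pairs if yt == 1 and yc == 0)  # event in treated only
    c = sum(1 for yt, yc in pairs if yt == 0 and yc == 1)  # event in control only
    diff = (b - c) / n
    se = math.sqrt(b + c - (b - c) ** 2 / n) / n
    return diff, se

# Toy matched sample (assumption): 100 pairs, 15 of them discordant
pairs = [(1, 0)] * 10 + [(0, 1)] * 5 + [(1, 1)] * 20 + [(0, 0)] * 65
diff, se = paired_risk_difference(pairs)
```

Concordant pairs contribute to the sample size but not to the numerator of the variance, which is what an unpaired analysis of the same data would miss.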

Review of the literature

Materials and methods

Search strategy

A computerised search of the seven journals with the highest impact factors in the intensive care and anaesthesiology areas was performed from 1 January 2006 to 31 December 2009 for publications in English limited to clinical observational studies, using the search terms propensity score OR propensity. Retrieved articles were assessed by one of the authors (EG), who screened the titles and abstracts to identify relevant studies. Articles were included only if the study was identified as an observational study using the propensity score method. Letters, articles for which only the abstract was available and reports describing only the design of the trial were excluded. Articles were screened for successive publications of the same study (i.e. the same study described in several articles), and only the most detailed article was selected.

The seven journals studied were: American Journal of Respiratory and Critical Care Medicine (AJRCCM), Critical Care Medicine (CCM), Critical Care (CC), Anesthesiology, Intensive Care Medicine (ICM), Anesthesia and Analgesia (Anesth Analg) and The British Journal of Anaesthesia (BJA).

Evaluation of methodological quality

Two independent reviewers (EG and RP) tested a data extraction form with a distinct set of five articles during a training session. To assess interobserver reliability, the two reviewers independently extracted information from a computer-generated random sample of ten articles. Reviewers were not blinded to the journal and authors. A single reviewer (EG) extracted the following data from all reports:

(1) Characteristics of the report, including the journal name, medical area, number of exposed and control subjects in the original study sample and type of outcome (binary, continuous or censored).

(2) Type of propensity score analysis: matching, stratification, adjustment or weighting.

(3) In case of matching: the number of matched pairs, method used, balance, calliper size, replacement or not, whether imbalance was assessed and, if so, how (test or percentage of imbalance), and whether the statistical analysis was adapted to matched data.

(4) In case of stratification on the propensity score: number and definition of strata.

(5) For all articles: the number of covariates included in the propensity score, whether the names of the included covariates were specified, whether the reasons for including the covariates were mentioned, adjustment on covariates or not, whether an analysis without the propensity score was given and, if so, whether it was concordant, and lastly the software used for the analyses.

Statistical analysis

Descriptive statistics (median, first and third quartile) were used for continuous variables. Categorical variables were described with counts and percentages. The degree of agreement between the two reviewers was determined using the kappa statistic (κ) for categorical variables and the intra-class correlation coefficient (ICC) for continuous variables. Analyses were performed using the R statistical package (online at http://www.R-project.org, The R Foundation for Statistical Computing, Vienna, Austria).
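For readers unfamiliar with the kappa statistic, the sketch below computes Cohen's kappa for two raters on one categorical item: the observed agreement is corrected for the agreement expected by chance from each rater's marginal frequencies. The function name and the ratings are hypothetical and do not come from the review's data.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items.
    Undefined (division by zero) if chance agreement equals 1."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the two raters' marginal frequencies
    expected = sum(count_a[k] * count_b[k] for k in count_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Toy ratings (assumption): 1 = item reported, 0 = item not reported
kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

In this toy case the raters agree on three of four articles (75%), chance agreement is 50%, and kappa is therefore 0.5.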

Results

We retrieved 47 articles employing propensity score-based methods that were published in the seven most important intensive care and anaesthesiology journals during the studied period: 4 from AJRCCM, 9 from Anesth Analg, 9 from Anesthesiology, 1 from BJA, 6 from CC, 12 from CCM and 6 from ICM [27–74].


Interobserver reproducibility (n = 10)

Among the 21 categorical items reviewed, the median κ was 0.83 and varied from 0.43 (for “rough and PS analysis both reported”) to 1.0 (for “paired analysis reported in case of matching”). Concerning the six continuous items, the median ICC was 0.99, with a range from 0.69 (for the calliper size in case of matching) to >0.99 (for the sample size).

Quality of the reporting (Table 1)

Twenty-six articles reported analyses based on propensity score matching. No article reported use of weighting on the propensity score. Nine articles used stratification on the propensity score and 12 used adjustment on the propensity score.

A higher number of covariates were included in the propensity score in case of matching, with a higher number of subjects. The description of covariates included in the propensity score was missing in 9–28% of cases, depending on the method used. The choice of the variables included in the PS model was discussed in only 46–78% of the studies.

The outcome of interest was a binary endpoint in most cases, regardless of the method used.

Particularly in case of matching, the proportion of studies in which the number of treated subjects per covariate included in the propensity score was less than 10 (% of ratio <10 in the table) was high (33%). Of note, the cutoff of one covariate for ten events was based on the rule of thumb described by Harrell [75].

Matching on the propensity score (Table 2)

Twenty-six (55%) articles used matching on the propensity score. Twenty-one articles (81%) reported the method used to form matched pairs. Where the method was given, nearest neighbour matching was used in 8 cases and greedy matching in 13 cases. The large majority of studies reported 1:1 matching (23; 88%); two articles reported 1:2 matching. Whether matching was performed with or without replacement was clearly stated in only five articles. The proportion of the initial sample kept after matching on the propensity score was 32% (24–54%). The authors mentioned performing a paired analysis in only four (15%) articles.

Comparison with the previous literature reviews

Table 3 shows that the number of patients in studies using propensity score analyses remained high, with a similar number of covariates included in the score but an increase in the percentage of studies with a ratio <10.

Matching is increasingly used. The quality of the reporting of these analyses tends to improve when compared to former reviews [76–80]. Indeed, the method by which matched pairs were created was more often reported in our review, and we observed a trend towards increased use of standardised differences to assess imbalance in place of statistical testing. However, imbalance was still not assessed in 23% of the cases.

Discussion

The objective of this study was to examine whether, after several published reviews, appropriate statistical methods were used for propensity score analysis in the intensive care and anaesthesiology literature. We chose to focus on the seven leading journals in terms of impact factor to present an overview that should represent the most relevant clinical research in that area.

Table 1 Characteristics of PS development for the three techniques

                                          All articles        Matching            Adjustment          Stratification
No. of articles                           47 (100)            26 (55)             12 (26)             9 (19)
No. of patients                           2,186 (498–5,612)   2,199 (544–3,030)   1,000 (214–1,341)   604 (413–1,673)
Covariates included in the propensity score model
  Number                                  15 (9–22)           18 (14–32)          10 (5–16)           12 (7–12)
  No. of treated patients per covariate   19 (10–71)          20 (10–57)          15 (5–31)           49 (14–181)
  Event/variable ratio <10                11 (31)             6 (33)              3 (33)              2 (25)
  Names specified                         36 (78)             18 (72)             10 (91)             7 (78)
  Choice discussed                        26 (55)             12 (46)             6 (55)              7 (78)
Type of endpoint
  Binary                                  33 (70)             18 (69)             9 (82)              5 (56)
  Censored                                11 (23)             6 (23)              2 (18)              3 (33)
  Continuous                              3 (6)               2 (8)               0 (0)               1 (11)

Results are expressed as count (%) or median (interquartile range)

The event/variable ratio <10 criterion represents the number of articles in which the rule of thumb was not respected, i.e. in which more than one covariate for ten events was included in the propensity score model. This leads to non-parsimonious models

The main finding of our study was a trend towards improvement in the reporting and use of propensity score-based analysis compared with the situation reported in earlier reviews. However, many issues are not yet optimal. In a previous review, Austin [79] noted that propensity score matching tended to be poorly documented in the medical literature between 1996 and 2003. We noted that, despite a growing number of publications using propensity score analyses, the quality of the reporting did not increase as desired. The improvements we noted mainly concern the technique used for assessing covariate imbalance and the reporting of the method used to match patients. Authors rely more frequently on standardised differences, which have been shown to better assess imbalance.

The increasing use of propensity score methods together with a suboptimal quality of use could be explained by the fact that, along with multiple publications on this subject, propensity scores have become popular in recent years. Concurrently, methods for propensity score analysis have been implemented in many common statistical software packages, giving everyone the opportunity to carry out propensity score analyses. As a result, many authors are more likely to report the name of the software macro than the actual name of the algorithm used to obtain matched pairs. A reasonable explanation may be that the less familiar investigators are with the theory of propensity score methods, the more likely they are to rely on the method implemented in the software they use.

Our review has several limitations. First, we chose to focus on seven journals, which may not be representative. However, they are commonly considered as the best journals in the domain, and so the imperfection in the use and reporting of propensity scores is probably underestimated. Second, as in most systematic reviews of the literature, it is difficult to evaluate exactly which analyses were performed using what is reported in the published articles. Therefore, it is possible that much of the model development and assessment was adequately performed by the researchers, but not explicitly described in their articles. Third, important information about propensity score analysis, including description of the types of variables, variable selection procedures and the specific method used, has not been abstracted systematically, because it has rarely been presented with sufficient detail in published reports. Fourth, this review was limited to seven specialised journals. If some articles reporting propensity score analyses in anaesthesiology or intensive care studies were published in general journals, they were not included in the present review. The comparison with the previous reviews could be biased because of this limitation. There is, however, no reason why a difference should exist in the quality of reporting for this particular area compared to the remaining part of the medical literature.

Table 2 Reporting characteristics of articles using PS-matched analyses (n = 26)

Remaining patients after matching (%)   32 (24–54)
Method
  5–1 greedy matching                   13 (50)
  Nearest neighbour matching            8 (31)
  Non-detailed                          5 (19)
Balance
  1/1                                   23 (88)
  1/2                                   2 (8)
  1/3                                   1 (4)
Replacement
  Yes                                   0 (0)
  No                                    5 (19)
  Non-detailed                          21 (81)
Paired analysis discussed               4 (15)

Results are expressed as count (%) or median (interquartile range)

% of remaining patients after matching refers to the proportion of treated patients for whom a control cannot be found in the control group; these patients are excluded from the analysis. Balance refers to the number of control patients to be matched to one treated patient. Replacement is considered if a control patient was potentially matched to more than one treated patient

Table 3 Comparison with the results of previous reviews

                                      Weitzen et al. [76]  Shah et al. [77]  Sturmer et al. [78]  Austin [79]  Austin [80]  Present study
Period                                Year 2001            Up to June 2003   Up to end of 2003    1996–2003    2004–2006    2006–2009
No. of articles                       47                   43                177                  47           60           47
Cardiovascular literature             30 (64)              19 (44)           90 (51)              NA           60 (100)     –
No. of variables                      17 (8–27)            NA                (2–112)              NA           NA           15 (9–22) (3–54)
No. of treated patients               805 (182–3,802)      NA                (61 to >1,380,000)   NA           NA           968 (121–1,310) (16–5,698)
No. of patients in the control group  NA                   NA                NA                   NA           NA           1,307 (392–4,367) (84–33,749)
Event/variable ratio <10              5 (11)               NA                NA                   NA           NA           11 (31)
Articles using matching
  Number                              7 (15)               11 (26)           51 (29)              47 (100)     60 (100)     26 (55)
  Evaluation of imbalance
    Present                           7 (100)              8 (73)            NA                   39 (83)      51 (82)      20 (77)
    With test                         NA                   NA                NA                   33 (70)      47 (78)      14 (54)
    With STD                          NA                   NA                NA                   2 (4)        0 (0)        6 (23)
  % Exposed matched                   NA                   NA                90 (26–100)          NA           NA           79 (69–97) (1–100)
  Method reported                                                                                 32 (68)      43 (72)      21 (81)

Results are expressed as count (%), median (interquartile range), median (min–max) or (min–max)

Event/variable ratio <10 represents the number (and percentage) of studies in which more than 1 variable for 10 treated patients was included in the propensity score models. % Exposed matched means the ratio of treated patients from the original sample remaining in the matched sample

STD standardised difference, No. number

In conclusion, we think several recommendations could be implemented in order to improve the reporting of propensity score methods, particularly in case of matching on the propensity score, which has been shown to be the most accurate way to use propensity scores [25, 81, 82]. First, it is important that the authors explain how they constructed the propensity score. In particular, the choice of covariates included in the propensity score model has to be discussed, and the authors should provide sufficient information about how they identified all the potential confounders and included them. Indeed, it has been shown that omitting confounders increases the bias of the treatment effect estimate, even if balance is achieved for the other covariates.

Secondly, concerning the assessment of the quality of the propensity score itself in the particular setting of matching on the propensity score, it has been shown that there is no association between the area under the ROC curve (c-statistic) and the ability of a given propensity score to balance prognostically important variables between treated and untreated subjects [21]. Indeed, measures of goodness-of-fit and discrimination do not provide information to detect missing confounders in propensity score models [22]. Thirdly, the most important goal of the propensity score is to obtain balance among covariates between treatment and control groups, and the balance obtained has to be checked and reported. A table showing a “successful” matching on a set of covariates may convince the reader that the propensity score has been efficient in balancing groups and hence yields an unbiased estimate of the treatment effect. An optimal way to assess this balance might be to compute the standardised differences before and after matching for each measured covariate; a graphical representation seems well suited to showing the ability of the propensity score to balance the groups. Finally, the authors should report the results of the analyses with and without the propensity score so that the information carried by the propensity score analysis can be appreciated.

Table 4 Key points to critically read a propensity score-based study

Items                                                              Example using Ref. [26]
Type of cohort                                                     Retrospective
Type of data                                                       Administrative
Type of exposure                                                   Epidural anaesthesia or analgesia within 1 day of surgery
Primary endpoint                                                   All-cause death within 30 days after surgery
Number of patients
  In the treated group                                             56,556
  In the control group                                             202,481
Development of the propensity score
  Number of covariates included                                    24
  Choice of covariates justified                                   Clinical significance
  Used model                                                       Non-parsimonious multivariate logistic model
  Chosen method (matching/subclassification/adjustment/weighting)  Matching on the propensity score
  In case of subclassification, number of strata                   –
  In case of adjustment, model used                                –
Matching procedure
  Used method                                                      Greedy matching algorithm with a calliper width of 0.2 SD of the log odds of the estimated propensity score
  Balance                                                          1/1
  Replacement or not                                               Without
  Evaluation of covariate balance                                  With standardised differences
  Number of patients per group                                     44,094/44,094
Outcome analysis
  Model                                                            Agresti and Min method [83]
  Paired analysis                                                  Yes

Application to an example taken from Ref. [26]

Accordingly, in Table 4 we propose a checklist dedicated to the reading of articles reporting propensity score-based analyses. As an illustration, it was applied to the work of Wijeysundera et al. [24] previously cited in this article. In this example, which used matching on the propensity score, the quality of the reporting could be judged as good. Moreover, the analysis seems to respect all the conditions, including paired analysis. Of note, in this example, probably because of the large number of patients, 78% of the treated subjects from the original sample remained in the matched sample.

In Fig. 3, we also tried to synthesise the different steps of planning and analysis of an observational study using propensity score methods. Ideally, propensity score methods, in particular matching, should be applied to an observational study designed specifically and a priori for this kind of analysis. Thus, a documented statistical analysis plan should be established and decided upon prior to any analysis being conducted. This methodology could lead to measuring a maximum number of potential confounders and so to obtaining an unbiased estimate of the treatment effect.

In summary, our study shows that the propensity score is increasingly used in the intensive care and anaesthesiology literature (Fig. 1). There is, however, room for improvement in the presentation of these results.

References

1. Ellenberg JH (1994) Selection bias in observational and experimental studies. Stat Med 13:557–567

2. Rossi P, Freeman H (1993) A systematic approach. Sage Publications, Inc, Newbury Park

3. Corrie P, Shaw J, Harris R (2003) Rate limiting factors in recruitment of patients to clinical trials in cancer research: descriptive study. BMJ 327:320–321

4. Fossa SD, Skovlund E (2002) Selection of patients may limit the generalizability of results from cancer trials. Acta Oncol 41:131–137

5. Guyatt GH, Sackett DL, Cook DJ (1994) Users’ guides to the medical literature. II. How to use an article about therapy or prevention. B. What were the results and will they help me in caring for my patients? Evidence-based medicine working group. JAMA 271:59–63

6. Pocock SJ, Elbourne DR (2000) Randomized trials or observational tribulations? N Engl J Med 342:1907–1909

7. Rosenbaum P, Rubin D (1983) The central role of the propensity score in observational studies for causal effect. Biometrika 70:41–55

8. Lunceford JK, Davidian M (2004) Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Stat Med 23:2937–2960

9. Rubin D (1976) Matching methods that are equal percent bias reducing: some examples. Biometrics 35:417–446

10. Rubin D (1979) Using multivariate matched sampling and regression adjustment to control bias in observational studies. J Am Stat Assoc 74:318–324

11. Rubin D (1980) Bias reduction using Mahalanobis metric matching. Biometrics 36:293–298

12. Carpenter R (1977) Matching when covariables are normally distributed. Biometrika 64:299–307

13. Austin PC (2009) Some methods of propensity-score matching had superior performance to others: results of an empirical investigation and Monte Carlo simulations. Biom J 51:171–184

14. Rubin D (2001) Using propensity scores to help design observational studies. Health Serv Out Res Method 2:169–188

15. Austin PC (2008) Assessing balance in measured baseline covariates when using many-to-one matching on the propensity-score. Pharmacoepidemiol Drug Saf 17:1218–1225

16. Aronow HD, Novaro GM, Lauer MS, Brennan DM, Lincoff AM, Topol EJ, Kereiakes DJ, Nissen SE (2003) In-hospital initiation of lipid-lowering therapy after coronary intervention as a predictor of long-term utilization: a propensity analysis. Arch Intern Med 163:2576–2582

17. Srinivasan AK, Shackcloth MJ, Grayson AD, Fabri BM (2003) Preoperative beta-blocker therapy in coronary artery bypass surgery: a propensity score analysis of outcomes. Interact Cardiovasc Thorac Surg 2:495–500

18. Vikram HR, Buenconsejo J, Hasbun R, Quagliarello VJ (2003) Impact of valve surgery on 6-month mortality in adults with complicated, left-sided native valve endocarditis: a propensity analysis. JAMA 290:3207–3214

19. Elad Y, French WJ, Shavelle DM, Parsons LS, Sada MJ, Every NR (2002) Primary angioplasty and selection bias in patients presenting late (>12 h) after onset of chest pain and ST elevation myocardial infarction. J Am Coll Cardiol 39:826–833

20. Sabik JF, Gillinov AM, Blackstone EH, Vacha C, Houghtaling PL, Navia J, Smedira NG, McCarthy PM, Cosgrove DM, Lytle BW (2002) Does off-pump coronary surgery reduce morbidity and mortality? J Thorac Cardiovasc Surg 124:698–707

21. Austin PC, Grootendorst P, Anderson GM (2007) A comparison of the ability of different propensity score models to balance measured variables between treated and untreated subjects: a Monte Carlo study. Stat Med 26:734–753

22. Weitzen S, Lapane KL, Toledano AY, Hume AL, Mor V (2005) Weaknesses of goodness-of-fit tests for evaluating propensity score models: the case of the omitted confounder. Pharmacoepidemiol Drug Saf 14:227–238

23. Austin PC (2009) Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Stat Med 28:3083–3107

24. Wijeysundera DN, Beattie WS, Austin PC, Hux JE, Laupacis A (2008) Epidural anaesthesia and survival after intermediate-to-high risk non-cardiac surgery: a population-based cohort study. Lancet 372:562–569

25. Austin PC (2007) The performance of different propensity score methods for estimating marginal odds ratios. Stat Med 26:3078–3094

26. Austin PC (2008) The performance of different propensity-score methods for estimating relative risks. J Clin Epidemiol 61:537–545

27. Almog Y, Novack V, Eisinger M, Porath A, Novack L, Gilutz H (2007) The effect of statin therapy on infection-related mortality in patients with atherosclerotic diseases. Crit Care Med 35:372–378

28. Annane D, Sebille V, Duboc D, Le Heuzey JY, Sadoul N, Bouvier E, Bellissant E (2008) Incidence and prognosis of sustained arrhythmias in critically ill patients. Am J Respir Crit Care Med 178:20–25

29. Biki B, Mascha E, Moriarty DC, Fitzpatrick JM, Sessler DI, Buggy DJ (2008) Anesthetic technique for radical prostatectomy surgery affects cancer recurrence: a retrospective analysis. Anesthesiology 109:180–187

30. Clec’h C, Alberti C, Vincent F, Garrouste-Orgeas M, de Lassence A, Toledano D, Azoulay E, Adrie C, Jamali S, Zaccaria I, Cohen Y, Timsit JF (2007) Tracheostomy does not improve the outcome of patients requiring prolonged mechanical ventilation: a propensity analysis. Crit Care Med 35:132–138

31. Combes A, Luyt CE, Nieszkowska A, Trouillet JL, Gibert C, Chastre J (2007) Is tracheostomy associated with better outcomes for patients requiring long-term mechanical ventilation? Crit Care Med 35:802–807

32. de Boer MT, Christensen MC, Asmussen M, van der Hilst CS, Hendriks HG, Slooff MJ, Porte RJ (2008) The impact of intraoperative transfusion of platelets and red blood cells on survival after liver transplantation. Anesth Analg 106:32–44 (table of contents)

33. Dhainaut JF, Payet S, Vallet B, Franca LR, Annane D, Bollaert PE, Le Tulzo Y, Runge I, Malledant Y, Guidet B, Le Lay K, Launois R (2007) Cost-effectiveness of activated protein C in real-life clinical practice. Crit Care 11:R99

34. Duncan AI, Koch CG, Xu M, Manlapaz M, Batdorf B, Pitas G, Starr N (2007) Recent metformin ingestion does not increase in-hospital morbidity or mortality after cardiac surgery. Anesth Analg 104:42–50

35. Duncan AI, Lin J, Koch CG, Gillinov AM, Xu M, Starr NJ (2006) The impact of gender on in-hospital mortality and morbidity after isolated aortic valve replacement. Anesth Analg 103:800–808

36. Ender J, Borger MA, Scholz M, Funkat AK, Anwar N, Sommer M, Mohr FW, Fassl J (2008) Cardiac surgery fast-track treatment in a postanesthetic care unit: six-month results of the Leipzig fast-track concept. Anesthesiology 109:61–66

37. Eurich DT, Marrie TJ, Johnstone J, Majumdar SR (2008) Mortality reduction with influenza vaccine in patients with pneumonia outside "flu" season: pleiotropic benefits or residual confounding? Am J Respir Crit Care Med 178:527–533

38. Fellahi JL, Parienti JJ, Hanouz JL, Plaud B, Riou B, Ouattara A (2008) Perioperative use of dobutamine in cardiac surgery and adverse cardiac outcome: propensity-adjusted analyses. Anesthesiology 108:979–987

39. Grathwohl KW, Black IH, Spinella PC, Sweeney J, Robalino J, Helminiak J, Grimes J, Gullick R, Wade CE (2008) Total intravenous anesthesia including ketamine versus volatile gas anesthesia for combat-related operative traumatic brain injury. Anesthesiology 109:44–53

40. Griesdale DE, Bosma TL, Kurth T, Isac G, Chittock DR (2008) Complications of endotracheal intubation in the critically ill. Intensive Care Med 34:1835–1842

41. Hix JK, Thakar CV, Katz EM, Yared JP, Sabik J, Paganini EP (2006) Effect of off-pump coronary artery bypass graft surgery on postoperative acute kidney injury and mortality. Crit Care Med 34:2979–2983

42. Honiden S, Schultz A, Im SA, Nierman DM, Gong MN (2008) Early versus late intravenous insulin administration in critically ill patients. Intensive Care Med 34:881–887

43. Kertai MD, Westerhout CM, Varga KS, Acsady G, Gal J (2008) Dihydropiridine calcium-channel blockers and perioperative mortality in aortic aneurysm surgery. Br J Anaesth 101:458–465

44. Le Manach Y, Godet G, Coriat P, Martinon C, Bertrand M, Fleron MH, Riou B (2007) The impact of postoperative discontinuation or continuation of chronic statin therapy on cardiac outcome after major vascular surgery. Anesth Analg 104:1326–1333 (table of contents)

45. Ranucci M, Isgro G (2007) Minimally invasive cardiopulmonary bypass: does it really change the outcome? Crit Care 11:R45

46. Rowan KM, Welch CA, North E, Harrison DA (2008) Drotrecogin alfa (activated): real-life use and outcomes for the UK. Crit Care 12:R58

47. Scales DC, Thiruchelvam D, Kiss A, Redelmeier DA (2008) The effect of tracheostomy timing during critical illness on long-term survival. Crit Care Med 36:2547–2557

48. Schortgen F, Girou E, Deye N, Brochard L (2008) The risk associated with hyperoncotic colloids in patients with shock. Intensive Care Med 34:2157–2168

49. Thombs BD, Bresnick MG (2008) Mortality risk and length of stay associated with self-inflicted burn injury: evidence from a national sample of 30,382 adult patients. Crit Care Med 36:118–125

50. Tritapepe L, De Santis V, Vitale D, Nencini C, Pellegrini F, Landoni G, Toscano F, Miraldi F, Pietropaoli P (2007) Recombinant activated factor VII for refractory bleeding after acute aortic dissection surgery: a propensity score analysis. Crit Care Med 35:1685–1690

51. Vandijck DM, Benoit DD (2008) Impact of recent intravenous chemotherapy on outcome in severe sepsis and septic shock patients with haematological malignancies: reply to letter by Meyer et al. Intensive Care Med 34:1930–1931

52. Vincent JL, Sakr Y, Sprung C, Harboe S, Damas P (2008) Are blood transfusions associated with greater mortality rates? Results of the sepsis occurrence in acutely ill patients study. Anesthesiology 108:31–39

53. Berger MM, Soguel L, Shenkin A, Revelly JP, Pinget C, Baines M, Chiolero RL (2008) Influence of early antioxidant supplements on clinical evolution and organ function in critically ill cardiac surgery, major trauma, and subarachnoid hemorrhage patients. Crit Care 12:R101

54. Constantinides VA, Tekkis PP, Fazil A, Kaur K, Leonard R, Platt M, Casula R, Stanbridge R, Darzi A, Athanasiou T (2006) Fast-track failure after cardiac surgery: development of a prediction model. Crit Care Med 34:2875–2882

55. Meier R, Bechir M, Ludwig S, Sommerfeld J, Keel M, Steiger P, Stocker R, Stover JF (2008) Differential temporal profile of lowered blood glucose levels (3.5 to 6.5 mmol/l versus 5 to 8 mmol/l) in patients with severe traumatic brain injury. Crit Care 12:R98

56. Bagshaw SM, Lapinsky S, Dial S, Arabi Y, Dodek P, Wood G, Ellis P, Guzman J, Marshall J, Parrillo JE, Skrobik Y, Kumar A (2009) Acute kidney injury in septic shock: clinical outcomes and impact of duration of hypotension prior to initiation of antimicrobial therapy. Intensive Care Med 35:871–881

57. Kor DJ, Iscimen R, Yilmaz M, Brown MJ, Brown DR, Gajic O (2009) Statin administration did not influence the progression of lung injury or associated organ failures in a cohort of patients with acute lung injury. Intensive Care Med 35:1039–1046

58. Zarychanski R, Doucette S, Fergusson D, Roberts D, Houston DS, Sharma S, Gulati H, Kumar A (2008) Early intravenous unfractionated heparin and mortality in septic shock. Crit Care Med 36:2973–2979

59. Beattie WS, Karkouti K, Wijeysundera DN, Tait G (2009) Risk associated with preoperative anemia in noncardiac surgery: a single-center cohort study. Anesthesiology 110:574–581

60. Beattie WS, Wijeysundera DN, Karkouti K, McCluskey S, Tait G, Mitsakakis N, Hare GM (2009) Acute surgical anemia influences the cardioprotective effects of beta-blockade: a single-center, propensity-matched cohort study. Anesthesiology 112:25–33

61. Christensen S, Thomsen RW, Johansen MB, Pedersen L, Jensen R, Larsen KM, Larsson A, Tonnesen E, Sorensen HT (2009) Preadmission statin use and one-year mortality among patients in intensive care - a cohort study. Crit Care 14:R29

62. Devasia RA, Blackman A, Gebretsadik T, Griffin M, Shintani A, May C, Smith T, Hooper N, Maruri F, Warkentin J, Mitchel E, Sterling TR (2009) Fluoroquinolone resistance in Mycobacterium tuberculosis: the effect of duration and timing of fluoroquinolone exposure. Am J Respir Crit Care Med 180:365–370

63. Karkouti K, Wijeysundera DN, Yau TM, McCluskey SA, Tait G, Beattie WS (2009) The risk-benefit profile of aprotinin versus tranexamic acid in cardiac surgery. Anesth Analg 110:21–29

64. Kerger KH, Mascha E, Steinbrecher B, Frietsch T, Radke OC, Stoecklein K, Frenkel C, Fritz G, Danner K, Turan A, Apfel CC (2009) Routine use of nasogastric tubes does not reduce postoperative nausea and vomiting. Anesth Analg 109:768–773

65. Leslie K, Myles PS, Forbes A, Chan MT (2009) The effect of bispectral index monitoring on long-term survival in the B-aware trial. Anesth Analg 110:816–822

66. Lindenauer PK, Rothberg MB, Nathanson BH, Pekow PS, Steingrub JS (2010) Activated protein C and hospital mortality in septic shock: a propensity-matched analysis. Crit Care Med 38:1101–1107

67. Manrique A, Jooste EH, Kuch BA, Lichtenstein SE, Morell V, Munoz R, Ellis D, Davis PJ (2009) The association of renal dysfunction and the use of aprotinin in patients undergoing congenital cardiac surgery requiring cardiopulmonary bypass. Anesth Analg 109:45–52

68. Martin G, Brunkhorst FM, Janes JM, Reinhart K, Sundin DP, Garnett K, Beale R (2009) The international PROGRESS registry of patients with severe sepsis: drotrecogin alfa (activated) use and patient outcomes. Crit Care 13:R103

69. Payen JF, Bosson JL, Chanques G, Mantz J, Labarere J (2009) Pain assessment is associated with decreased duration of mechanical ventilation in the intensive care unit: a post hoc analysis of the DOLOREA study. Anesthesiology 111:1308–1316

70. Renaud B, Santin A, Coma E, Camus N, Van Pelt D, Hayon J, Gurgui M, Roupie E, Herve J, Fine MJ, Brun-Buisson C, Labarere J (2009) Association between timing of intensive care unit admission and outcomes for emergency department patients with community-acquired pneumonia. Crit Care Med 37:2867–2874

71. Rioux JP, Lessard M, De Bortoli B, Roy P, Albert M, Verdant C, Madore F, Troyanov S (2009) Pentastarch 10% (250 kDa/0.45) is an independent risk factor of acute kidney injury following cardiac surgery. Crit Care Med 37:1293–1298

72. Surgenor SD, Kramer RS, Olmstead EM, Ross CS, Sellke FW, Likosky DS, Marrin CA, Helm RE Jr, Leavitt BJ, Morton JR, Charlesworth DC, Clough RA, Hernandez F, Frumiento C, Benak A, DioData C, O’Connor GT (2009) The association of perioperative red blood cell transfusions and decreased long-term survival after cardiac surgery. Anesth Analg 108:1741–1746

73. van Klei WA, Bryson GL, Yang H, Forster AJ (2009) Effect of beta-blocker prescription on the incidence of postoperative myocardial infarction after hip and knee arthroplasty. Anesthesiology 111:717–724

74. Wisnivesky JP, Halm E, Bonomi M, Powell C, Bagiella E (2009) Effectiveness of radiation therapy for elderly patients with unresected stage I and II non-small cell lung cancer. Am J Respir Crit Care Med 181:264–269

75. Harrel FE (2001) Overfitting and limits on number of predictors. In: Regression modeling strategies. Series in Statistics. Springer (ed), pp 60–61

76. Weitzen S, Lapane KL, Toledano AY, Hume AL, Mor V (2004) Principles for modeling propensity scores in medical research: a systematic literature review. Pharmacoepidemiol Drug Saf 13:841–853

77. Shah BR, Laupacis A, Hux JE, Austin PC (2005) Propensity score methods gave similar results to traditional regression modeling in observational studies: a systematic review. J Clin Epidemiol 58:550–559

78. Sturmer T, Joshi M, Glynn RJ, Avorn J, Rothman KJ, Schneeweiss S (2006) A review of the application of propensity score methods yielded increasing use, advantages in specific settings, but not substantially different estimates compared with conventional multivariable methods. J Clin Epidemiol 59:437–447

79. Austin PC (2008) A critical appraisal of propensity-score matching in the medical literature between 1996 and 2003. Stat Med 27:2037–2049

80. Austin PC (2007) Propensity-score matching in the cardiovascular surgery literature from 2004 to 2006: a systematic review and suggestions for improvement. J Thorac Cardiovasc Surg 134:1128–1135

81. Austin PC (2008) The performance of different propensity score methods for estimating marginal odds ratios, Statistics in Medicine 2007; 26:3078–3094. Stat Med 27:3918–3920

82. Austin PC, Grootendorst P, Normand SL, Anderson GM (2007) Conditioning on the propensity score can result in biased estimation of common measures of treatment effect: a Monte Carlo study. Stat Med 26:754–768

83. Agresti A, Min Y (2005) Simple improved confidence intervals for comparing matched proportions. Stat Med 24:729–740
