Etienne Gayat, Romain Pirracchio, Matthieu Resche-Rigon, Alexandre Mebazaa, Jean-Yves Mary, Raphaël Porcher
Propensity scores in intensive care and anaesthesiology literature: a systematic review
Received: 15 April 2010
Accepted: 8 July 2010
Published online: 6 August 2010
© Copyright jointly held by Springer and ESICM 2010
E. Gayat · R. Pirracchio · M. Resche-Rigon · J.-Y. Mary · R. Porcher
Clinical Epidemiology and Biostatistics, INSERM U717, University Paris 7, Paris, France
E. Gayat (✉) · R. Pirracchio · A. Mebazaa
Department of Anesthesiology and Intensive Care, Lariboisière University Hospital, Paris VII University, Inserm U717 and University of Paris Diderot-Paris 7, 2 Rue Ambroise Paré, 75010 Paris, France
e-mail: etienne.gayat@lrb.aphp.fr
Tel.: +33-1-49958071
Fax: +33-1-49958083
A. Mebazaa
INSERM U942, Paris, France

Abstract
Introduction: Propensity score methods have been increasingly used in the last 10 years. However, the practical use of the propensity score (PS) has been reported as heterogeneous in several papers reviewing the use of propensity scores and giving some advice. No previous work has focused on the specific application of PS in the intensive care and anaesthesiology literature. Objectives: After a brief development of the theory of the propensity score, to assess the use and the quality of reporting of PS studies in intensive care and anaesthesiology, and to evaluate how past reviews have influenced the quality of the reporting. Study design and setting: Forty-seven articles published between 2006 and 2009 in the intensive care and anaesthesiology literature were evaluated. We extracted the characteristics of the report, the type of analysis, the details of matching procedures, the number of patients in treated and control groups, and the number of covariates included
in the PS models. Results: Of the 47 articles reviewed, 26 used matching on PS, 9 used stratification on PS and 12 used adjustment on PS. Among the 26 articles using matching, the method used was reported in 81% and the choice to conduct a paired analysis or not was reported in only 15%. The comparison with the previously published reviews showed little improvement in reporting in the last few years.
Conclusion: The quality of reporting propensity scores in the intensive care and anaesthesiology literature should be improved. We provide some recommendations to the investigators in order to improve the reporting of PS analyses.
Keywords Propensity score · Propensity · Matching · Review · Methodology · Intensive care · Anaesthesiology
Intensive Care Med (2010) 36:1993–2003
DOI 10.1007/s00134-010-1991-5
REVIEW

Introduction
Randomised controlled trials (RCTs) are generally considered the gold standard for assessing the efficacy of medications, medical procedures or clinical strategies. Indeed, several eminent statisticians have argued that scientific committees should "just say no" to non-randomised studies because of their inherent bias [1]. However, as noted by Rossi and Freeman [2], randomisation may be difficult to apply or maintain in several situations: when the enrolment demand is minimal, in emergency situations or when randomising to a control group might be considered unethical. Furthermore, the conclusions drawn from RCTs are likely to be less generalisable than those from observational studies [3–5]. Indeed, the complex process of inclusion in randomised trials, including differential participation by centres, physicians or patients, may limit the confidence with which the results can be applied in routine practice. Moreover, the process of recruitment, in which only individuals (1) who are willing to participate and (2) who meet predefined inclusion criteria are recruited, may introduce selection biases that make subjects unrepresentative of the reference population.
In observational studies, investigators do not control the treatment assignment, and large differences may exist between the two arms, in both observed and unobserved covariates. These differences are likely to bias the estimation of the treatment effect [6]. If statistical methods could handle this bias, it would increase the interest of observational studies in several situations. The two major strategies to control for selection bias in observational studies are (1) adjustment of the treatment effect, which relies on the relationship between prognostic variables and outcome, and (2) probability-of-treatment models, which rely on the relationship between prognostic variables and treatment assignment.
The propensity score is defined as a subject's probability of receiving a specific treatment conditional on the observed covariates. Rosenbaum and Rubin [7] demonstrated that conditioning on the propensity score yields an unbiased estimate of the treatment effect, provided the assumptions described below hold.
Studies using propensity score methods are increasingly reported in the anaesthesiology and intensive care literature, as illustrated by Fig. 1. Providing physicians with some keys for how to accurately use, report and critically evaluate studies using propensity score analyses thus seems of particular interest.
The aims of this review were (1) to present didactically the theory of the propensity score, (2) to assess the use and the quality of reporting of PS studies in the intensive care and anaesthesiology literature, and (3) to evaluate how past reviews have influenced the quality of the reporting.
The present work has two parts: (1) in the first, the theory of the propensity score is summarised, and (2) in the second, the use and the quality of the reporting of propensity score-based studies published in seven major journals of anaesthesiology and critical care between 2006 and 2009 were studied.
Background of the propensity score
Theory of propensity score
Let us assume that each subject has observed covariates X and an indicator of treatment group Z (Z = 1 if treated and Z = 0 if control). X is a vector of covariates that might include a large number of characteristics describing that subject. The propensity score e(X) can be defined for each subject as the probability that he/she had to receive the treatment, given his/her baseline covariates, whether or not he/she actually received the treatment [7]. The theory of the propensity score relies on three major assumptions: (1) the treatment assignment of a patient is independent from that of the others (given the X, the Z are independent), (2) there are no unmeasured confounders (i.e., all the covariates potentially related to the treatment assignment are known and measured), (3) the treatment assignment is strongly ignorable given the covariates, i.e. the treatment assignment and the response are known to be conditionally independent given the covariates.
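In symbols, the definition and the ignorability assumption above can be written as follows. The potential-outcome notation Y(1) and Y(0) (the response a subject would have if treated and if not treated) does not appear in the text and is introduced here only as an assumption for illustration, following the usual presentation of Rosenbaum and Rubin [7].

```latex
% Propensity score: probability of treatment given the observed covariates
e(X) = P(Z = 1 \mid X)

% Strong ignorability: conditional independence plus overlap
\bigl(Y(1),\, Y(0)\bigr) \perp Z \mid X, \qquad 0 < e(X) < 1

% Balancing property: among subjects with the same score,
% the covariate distribution does not depend on treatment
X \perp Z \mid e(X)
```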
Under the preceding assumptions, as demonstrated by Rosenbaum and Rubin, a treated and a non-treated patient with the same propensity score can be considered as randomly assigned to each group. In practice, the propensity score is not known, but has to be estimated. The usual methodology uses a logistic model to estimate the probability of being treated given the observed covariates.
Analysis using the propensity score
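The estimation step described above (a logistic model for the probability of being treated) can be sketched as follows. This is a minimal illustration on simulated data, not the procedure of any reviewed study: the function names, the single simulated covariate, and the use of plain gradient ascent instead of a statistical package are all assumptions made to keep the example self-contained.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def linear_predictor(beta, x):
    # beta[0] is the intercept, beta[1:] the covariate coefficients
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def fit_logistic(X, z, lr=0.5, n_iter=2000):
    """Fit logit P(Z=1|X) by gradient ascent on the mean log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for x, zi in zip(X, z):
            resid = zi - sigmoid(linear_predictor(beta, x))
            grad[0] += resid
            for j, v in enumerate(x):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Simulated (hypothetical) cohort: one standardised covariate that
# increases the probability of receiving the treatment.
random.seed(0)
X = [[random.gauss(0.0, 1.0)] for _ in range(200)]
z = [1 if random.random() < sigmoid(-0.5 + x[0]) else 0 for x in X]

beta = fit_logistic(X, z)
ps = [sigmoid(linear_predictor(beta, x)) for x in X]  # estimated propensity scores
```

Each entry of `ps` is the estimated probability of treatment for one subject; these values feed the matching, stratification, adjustment or weighting steps.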
The four most common techniques that may use the propensity score are: (1) matching, (2) stratification (also called subclassification), (3) regression adjustment and (4) more recently weighting with the propensity score [8].
Matching (or matched sampling)
Matching is a common technique that is used to couple control and treated subjects on the basis of a similar value of their covariates. Although the idea of finding matches seems straightforward, it is often difficult to find subjects who are identical or even similar (i.e. that can be matched) on all important covariates, even when there are only a few background covariates. Propensity score matching solves this problem by allowing the investigator to control for many background covariates simultaneously by matching on a single scalar variable, namely the propensity score [9–12].

Fig. 1 Number of published articles reporting propensity score-based studies in the medical literature from 1998 to 2009
The performance of different propensity score matching methods that are commonly employed in the medical literature has been compared recently [13]. The most commonly used and well-performing method consists of random ordering of the treated and control subjects, then selecting the first treated subject and finding the control subject with the closest propensity score. To avoid matching subjects with too different propensity scores, a calliper is often used, where matching can only occur within a given range of propensity score values or, more frequently, of the logit of the propensity score (i.e. log(p/(1 − p)), which is the linear predictor directly obtained when fitting the logistic regression).
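The procedure just described (random ordering, nearest control, calliper on the logit scale) can be sketched as below. The function name, the 1:1 matching without replacement, and the toy propensity scores are assumptions chosen for illustration only.

```python
import math
import random

def logit(p):
    return math.log(p / (1.0 - p))

def greedy_caliper_match(ps_treated, ps_control, caliper, seed=0):
    """1:1 nearest-neighbour matching without replacement on the logit of the PS.

    Returns a list of (treated_index, control_index) pairs. Treated subjects
    whose nearest available control lies outside the calliper stay unmatched.
    """
    rng = random.Random(seed)
    order = list(range(len(ps_treated)))
    rng.shuffle(order)                      # random ordering of treated subjects
    available = set(range(len(ps_control)))
    pairs = []
    for i in order:
        if not available:
            break
        target = logit(ps_treated[i])
        j = min(available, key=lambda k: abs(logit(ps_control[k]) - target))
        if abs(logit(ps_control[j]) - target) <= caliper:
            pairs.append((i, j))
            available.remove(j)             # control no longer available
    return pairs

# Hypothetical propensity scores
treated = [0.30, 0.60, 0.95]
control = [0.28, 0.35, 0.62, 0.10]
pairs = greedy_caliper_match(treated, control, caliper=0.5)
```

In this toy example the treated subject with a score of 0.95 has no control within the calliper and is therefore left out of the matched sample, which is the price paid for closer matches.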
Stratification (or subclassification)
Stratification on the propensity score consists of estimating the treatment effect within strata, usually defined as quintiles of the propensity score.
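A minimal sketch of stratification follows, assuming a continuous outcome, a difference in means as the within-stratum effect, and a size-weighted average across strata; none of these choices is prescribed by the text, and the data are invented.

```python
def stratified_effect(ps, z, y, n_strata=5):
    """Average the within-stratum treated-minus-control mean difference,
    weighting each stratum by its number of subjects."""
    order = sorted(range(len(ps)), key=lambda i: ps[i])
    n = len(order)
    total, used = 0.0, 0
    for s in range(n_strata):
        idx = order[s * n // n_strata:(s + 1) * n // n_strata]
        treated = [y[i] for i in idx if z[i] == 1]
        control = [y[i] for i in idx if z[i] == 0]
        if not treated or not control:
            continue                      # stratum lacks one group: no estimate
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        total += diff * len(idx)
        used += len(idx)
    if used == 0:
        raise ValueError("no stratum contains both treated and control subjects")
    return total / used

# Hypothetical data: the baseline outcome rises with the propensity score,
# and treatment adds 2 units in every stratum.
ps = [0.1, 0.1, 0.3, 0.3, 0.5, 0.5, 0.7, 0.7, 0.9, 0.9]
z = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
y = [2.0, 0.0, 3.0, 1.0, 4.0, 2.0, 5.0, 3.0, 6.0, 4.0]
effect = stratified_effect(ps, z, y)
```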
Adjustment
As in a classical multivariable model, adjustment on the PS is achieved by including the propensity score as an explanatory covariate when modelling the treatment effect.
Weighting
The idea of reweighting treated and control subjects by their corresponding propensity score to make them more representative of the population of interest was first proposed by Rubin [14] and more recently studied by Lunceford and Davidian [8]. The weight assigned to a treated subject is the inverse of its propensity score, and the weight for a control subject is the inverse of "1 minus" its propensity score.
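Written out, with e_i the estimated propensity score of subject i, the weights described above are given below. The second expression is the corresponding inverse-probability-weighted estimate of the average treatment effect on an outcome Y; the symbols w_i, Y_i and \hat{\Delta} are notation introduced here for illustration.

```latex
% Weight of subject i: 1/e_i if treated, 1/(1 - e_i) if control
w_i = \frac{Z_i}{e_i} + \frac{1 - Z_i}{1 - e_i}

% Inverse-probability-weighted estimate of the average treatment effect
\hat{\Delta} = \frac{1}{n}\sum_{i=1}^{n} \frac{Z_i\, Y_i}{e_i}
             - \frac{1}{n}\sum_{i=1}^{n} \frac{(1 - Z_i)\, Y_i}{1 - e_i}
```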
Matching on the propensity score has now been clearly demonstrated to be the best use of the propensity score in order to attempt to provide unbiased estimation of the treatment effect [13]. Of note, there are settings where conventional regression methods are superior to propensity score methods, with respect to bias as well as with respect to precision.
Practical settings of matching
When one chooses to match on the propensity score, three features have to be fixed: (1) the balance of the matching, (2) matching with or without replacement and (3) the matching algorithm.
Most clinical studies use "one-to-one" (1:1) matching. In this case, pairs of treated and untreated subjects with a similar propensity score are constituted. Many-to-one and many-to-many matchings are also possible, but are rarely employed in practice. A difficulty in using many-to-one or many-to-many matching is that diagnostics for assessing the relative balance in baseline covariates are less well developed in this context than in the setting of 1:1 matching.
In matching without replacement, an untreated subject who has already been matched with a treated subject is no longer available as a potential match for other treated subjects. In contrast, when matching with replacement is used, the same subject can be used several times as a matched control for treated patients. Matching with replacement is rarely employed [15].
There are two types of matching algorithms: greedy matching and optimal matching. Greedy matching consists of selecting a random treated subject and then matching it to the nearest untreated subject. This untreated subject is selected even if it would better serve as a match for a subsequent treated subject. Once selected, it is no longer available for subsequent matching. With optimal matching, pairs of treated and untreated subjects are formed so as to minimise the total within-pair difference in the propensity score. In this case, previously formed pairs can be broken up if another combination would reduce the total within-pair difference.
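The difference between the two algorithms can be seen on a toy example with two treated and two untreated subjects. In this sketch, treated subjects are processed in the order given rather than at random, and the optimal solution is found by brute-force enumeration, which is only feasible for very small samples; both simplifications, and all names, are assumptions for illustration.

```python
from itertools import permutations

def greedy_pairs(treated, control):
    """Match each treated subject, in turn, to the nearest unused control."""
    available = list(range(len(control)))
    pairs = []
    for i, p in enumerate(treated):
        j = min(available, key=lambda k: abs(control[k] - p))
        pairs.append((i, j))
        available.remove(j)
    return pairs

def optimal_pairs(treated, control):
    """Enumerate every assignment and keep the smallest total distance."""
    best = min(
        permutations(range(len(control)), len(treated)),
        key=lambda perm: sum(abs(treated[i] - control[j]) for i, j in enumerate(perm)),
    )
    return list(enumerate(best))

def total_distance(pairs, treated, control):
    return sum(abs(treated[i] - control[j]) for i, j in pairs)

# Hypothetical propensity scores
treated = [0.50, 0.40]
control = [0.45, 0.60]

g = greedy_pairs(treated, control)    # first treated subject takes control 0.45
o = optimal_pairs(treated, control)   # pairs are rearranged to lower the total
dist_greedy = total_distance(g, treated, control)
dist_optimal = total_distance(o, treated, control)
```

Here greedy matching gives the first treated subject its best control (0.45) and leaves a poor match for the second, whereas optimal matching accepts a slightly worse first pair to obtain a smaller total within-pair difference.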
The two most popular procedures to perform the matching are nearest neighbour matching within fixed calliper widths and 5→1 digit matching. The first one attempts to match each treated subject to the nearest untreated subject within a specified calliper width: the maximum difference in the PS between two matched subjects is predefined, generally as a fraction of the standard deviation (SD) of the propensity score on the logit scale. The most frequently used value is 0.2 SD, even if more stringent criteria such as 0.05 SD seem to perform slightly better, at the price of a smaller number of treated patients matched [13]. Using the second approach, treated subjects are first matched to untreated subjects on the first five digits of the propensity score. For those treated subjects that remain unmatched, matches are then attempted with the remaining untreated subjects on the first four digits of the propensity score. This process proceeds until unmatched treated subjects are matched to untreated subjects on the first digit of the propensity score. Treated and untreated subjects that remain unmatched are then discarded. This method has been used frequently in the medical literature [16–20], despite the fact that its performance has never been rigorously examined.
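The 5→1 digit procedure can be sketched as follows. "First d digits" is interpreted here as the first d decimal places of the propensity score, obtained by truncation, and ties are resolved by taking the first available control; these details, like the function names and data, are assumptions for illustration and may differ from the macros used in the cited studies.

```python
from collections import defaultdict

def digit_key(ps, d):
    # First d decimal digits of a score strictly between 0 and 1
    return f"{ps:.10f}"[2:2 + d]

def five_to_one_digit_match(ps_treated, ps_control):
    """Return (pairs, unmatched_treated); each pair is
    (treated_index, control_index, number_of_digits_matched)."""
    unmatched_t = list(range(len(ps_treated)))
    unmatched_c = list(range(len(ps_control)))
    pairs = []
    for d in range(5, 0, -1):                 # 5 digits, then 4, ... then 1
        pool = defaultdict(list)
        for j in unmatched_c:
            pool[digit_key(ps_control[j], d)].append(j)
        still_unmatched = []
        for i in unmatched_t:
            candidates = pool[digit_key(ps_treated[i], d)]
            if candidates:
                pairs.append((i, candidates.pop(0), d))
            else:
                still_unmatched.append(i)
        unmatched_t = still_unmatched
        used = {j for _, j, _ in pairs}
        unmatched_c = [j for j in unmatched_c if j not in used]
    return pairs, unmatched_t

# Hypothetical propensity scores
treated = [0.12345, 0.12399, 0.56000, 0.90000]
control = [0.123456, 0.12340, 0.57000, 0.20000]
pairs, discarded = five_to_one_digit_match(treated, control)
```

The last pair formed in this example agrees on a single digit only (0.56 versus 0.57), which illustrates why the closeness of matches obtained in the final passes deserves scrutiny.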
Assessment of the propensity score matching
Concerning the assessment of the quality of the propensity score matching itself, it has been shown that there was no association between the area under the ROC curve (c-statistic) or any goodness-of-fit test and the ability of a given propensity score to accurately balance prognostically important variables between treated and untreated subjects in a propensity score matched sample [21]. Measures of goodness-of-fit and discrimination also do not provide useful information to detect missing confounders in the propensity score model [22].
When matching on the propensity score is used, the objective is to balance the prognostic factors between groups. The success of matching can be assessed for each variable using standardised differences (d) [23], where d is defined as:

$$d = \frac{100 \times \left(\bar{x}_{\text{treatment}} - \bar{x}_{\text{control}}\right)}{\sqrt{\dfrac{s^2_{\text{treatment}} + s^2_{\text{control}}}{2}}}$$

with $\bar{x}_{\text{treatment}}$, $s^2_{\text{treatment}}$, $\bar{x}_{\text{control}}$ and $s^2_{\text{control}}$ the mean and the variance in the treatment group and in the control group, respectively.
A successful balance is inferred if residual imbalance as measured by d is small for all confounders. A value of d ≤ 10% has been empirically considered as acceptable [23].
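The standardised difference defined above is straightforward to compute for a continuous covariate. The sketch below assumes sample variances (denominator n − 1) and uses invented ages; the function name is hypothetical.

```python
import math
from statistics import mean, variance

def standardised_difference(x_treated, x_control):
    """Standardised difference d, in percent, for one continuous covariate."""
    pooled_sd = math.sqrt((variance(x_treated) + variance(x_control)) / 2.0)
    return 100.0 * (mean(x_treated) - mean(x_control)) / pooled_sd

# Hypothetical ages (years) before and after matching
age_treated = [60, 62, 64, 66, 68]
age_control_before = [58, 60, 62, 64, 66]
age_control_after = [60, 62, 64, 66, 68]

d_before = standardised_difference(age_treated, age_control_before)
d_after = standardised_difference(age_treated, age_control_after)
```

In this example the difference before matching is about 63%, far above the 10% threshold, and falls to 0% in the matched sample.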
The study conducted by Wijeysundera et al. [24] on the effect of epidural analgesia on mortality after intermediate-to-high risk non-cardiac surgery is a good example. The authors produced a table with standardised differences before and after matching. We propose a more visual approach to present these differences using a graphical display of the standardised difference before and after matching (see Fig. 2). As can easily be seen, there were many differences before matching that could induce bias in the estimation of the effect of epidural analgesia. For example, in the original sample, the patients receiving epidural analgesia received more arterial and central venous lines; these differences were no longer present in the matched sample, leading to more comparable groups of patients.
Outcome analysis after matching on the propensity score
The model to be used for analysis is primarily driven by the type of outcome (continuous, binary or censored). However, matching on the propensity score may induce correlation in the dataset, which may influence the estimation of the variance of the treatment effect. Therefore, it is recommended to use robust approaches to compute the variance of estimates [25, 26].
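As an illustration of an analysis that respects the pairing, the sketch below computes McNemar's statistic for a binary outcome in 1:1 matched pairs. This is one simple paired method chosen here as an example; it is not necessarily the approach recommended in references [25, 26], and the counts are invented.

```python
def mcnemar_statistic(pairs):
    """McNemar chi-square statistic (1 degree of freedom, no continuity
    correction) from a list of (outcome_treated, outcome_control) pairs
    with 0/1 outcomes. Only discordant pairs contribute."""
    b = sum(1 for yt, yc in pairs if yt == 1 and yc == 0)
    c = sum(1 for yt, yc in pairs if yt == 0 and yc == 1)
    if b + c == 0:
        return 0.0
    return (b - c) ** 2 / (b + c)

# Hypothetical matched sample: 100 pairs, 20 of them discordant
pairs = ([(1, 1)] * 10 + [(0, 0)] * 70
         + [(1, 0)] * 15 + [(0, 1)] * 5)
stat = mcnemar_statistic(pairs)
```

With 15 and 5 discordant pairs the statistic is 5.0, which exceeds 3.84, the 5% critical value of the chi-square distribution with one degree of freedom. An unpaired test on the same data would ignore the correlation induced by matching.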
Review of the literature
Materials and methods
Search strategy
A computerised search of the seven journals with the highest impact factors in the intensive care and anaesthesiology areas was performed from 1 January 2006 to 31 December 2009 for publications in English, limited to clinical observational studies, using the search terms propensity score OR propensity. Retrieved articles were assessed by one of the authors (EG), who screened the titles and abstracts to identify relevant studies. Articles were included only if the study was identified as an observational study using the propensity score method. Letters, articles for which only the abstract was available and reports describing only the design of a trial were excluded. Articles were screened for successive publications of the same study (i.e. the same study described in several articles), and only the most detailed article was selected.
The seven journals studied were: American Journal of Respiratory and Critical Care Medicine (AJRCCM), Critical Care Medicine (CCM), Critical Care (CC), Anesthesiology, Intensive Care Medicine (ICM), Anesthesia and Analgesia (Anesth Analg) and The British Journal of Anaesthesia (BJA).
Evaluation of methodological quality
Two independent reviewers (EG and RP) tested a data extraction form with a distinct set of five articles during a training session. To assess interobserver reliability, the two reviewers independently extracted information from a computer-generated random sample of ten articles. Reviewers were not blinded to the journal and authors. A single reviewer (EG) extracted the following data from all reports:
(1) Characteristics of the report, including the journal name, medical area, number of exposed and control subjects in the original study sample and type of outcome (binary, continuous or censored).
(2) Type of propensity score analysis: matching, stratification, adjustment or weighting.
(3) In case of matching: the number of matched pairs, method used, balance, calliper size, replacement or not, whether imbalance was assessed and, if so, how (test or percentage of imbalance), and whether the statistical analysis was adapted to matched data.
(4) In case of stratification on the propensity score: number and definition of strata.
(5) For all articles: the number of covariates included in the propensity score, whether the names of the included covariates were specified, whether the reasons for including them were mentioned, whether covariates were also adjusted for, whether an analysis without the propensity score was given and, if so, whether it was concordant, and lastly the software used for the analyses.
Statistical analysis
Descriptive statistics (median, first and third quartile) were used for continuous variables. Categorical variables were described with counts and percentages. The degree of agreement between the two reviewers was determined using the kappa statistic (κ) for categorical variables and the intra-class correlation coefficient (ICC) for continuous variables. Analyses were performed using the R statistical package (The R Foundation for Statistical Computing, Vienna, Austria).
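For readers unfamiliar with the kappa statistic, the sketch below computes Cohen's kappa for two reviewers rating the same items. The analyses of this review were performed in R; the Python function and the ratings shown here are purely illustrative assumptions.

```python
from collections import Counter

def cohen_kappa(ratings_1, ratings_2):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    n = len(ratings_1)
    observed = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n
    c1, c2 = Counter(ratings_1), Counter(ratings_2)
    categories = set(ratings_1) | set(ratings_2)
    expected = sum(c1[k] * c2[k] for k in categories) / (n * n)
    if expected == 1.0:
        return 1.0                      # both raters used a single category
    return (observed - expected) / (1.0 - expected)

# Hypothetical item "method reported" (1 = yes, 0 = no) for ten articles
reviewer_1 = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
reviewer_2 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
kappa = cohen_kappa(reviewer_1, reviewer_2)
```

The two reviewers agree on nine of ten articles (observed agreement 0.9), chance agreement is 0.5, and kappa is therefore 0.8.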
Results
We retrieved 47 articles employing propensity score-based methods published in the seven most important intensive care and anaesthesiology journals during the studied period: 4 from AJRCCM, 9 from Anesth Analg, 9 from Anesthesiology, 1 from BJA, 6 from CC, 12 from CCM and 6 from ICM [27–74].
Interobserver reproducibility (n =10)
Among the 21 categorical items reviewed, the median κ was 0.83 and varied from 0.43 (for "rough and PS analysis both reported") to 1.0 (for "paired analysis reported in case of matching"). Concerning the six continuous items, the median ICC was 0.99, with a range from 0.69 (for the calliper size in case of matching) to >0.99 (for the sample size).
Quality of the reporting (Table 1)
Twenty-six articles reported analyses based on propensity score matching. No article reported use of weighting on the propensity score. Nine articles used stratification on the propensity score and 12 adjustment on the propensity score.
A higher number of covariates were included in the propensity score in case of matching, with a higher number of subjects. The description of covariates included in the propensity score was missing in 9–28% of cases, depending on the method used. The choice of the variables included in the PS model was discussed in only 46–78% of the studies.
The outcome of interest was a binary endpoint in most cases, regardless of the method used.
Particularly in case of matching, the proportion of studies in which the number of treated subjects per covariate included in the propensity score was less than 10 (% of ratio <10 in the table) was high (33%). Of note, the cutoff of one covariate per ten events was based on the rule of thumb described by Harrell [75].
Matching on the propensity score (Table 2)
Twenty-six (55%) articles used matching on the propensity score. Twenty-one articles (81%) reported the method used to form matched pairs. For those where the method was given, nearest neighbour matching was used in 8 cases and greedy matching in 13 cases. The large majority of studies reported 1:1 matching (23; 88%); two articles reported 1:2 matching. Whether matching was performed with or without replacement was clearly stated in only five articles. The proportion of the initial sample kept after matching on the propensity score was 32% (24–54%). The authors mentioned performing a paired analysis in only four (15%) articles.
Comparison with the previous literature reviews
Table 3 shows that the number of patients in studies using propensity score analyses remained high, with a similar number of covariates included in the score but an increase in the percentage of studies with a ratio <10.
Matching is increasingly used. The quality of the reporting of these analyses tends to improve when compared to former reviews [76–80]. Indeed, the method by which matched pairs were created was more often reported in our review, and we observed a trend towards an increased use of standardised differences to assess imbalance in place of statistical testing. However, imbalance was still not assessed in 23% of the cases.
Discussion
The objective of this study was to examine whether, after several published reviews, appropriate statistical methods were used for propensity score analysis in the intensive care and anaesthesiology literature. We have chosen to focus on the seven leading journals in terms of impact factor to present an overview that
Table 1 Characteristics of PS development of the three techniques

| | All articles | Matching | Adjustment | Stratification |
|---|---|---|---|---|
| No. of articles | 47 (100) | 26 (55) | 12 (26) | 9 (19) |
| No. of patients | 2,186 (498–5,612) | 2,199 (544–3,030) | 1,000 (214–1,341) | 604 (413–1,673) |
| Covariates included in the propensity score model | | | | |
| Number | 15 (9–22) | 18 (14–32) | 10 (5–16) | 12 (7–12) |
| No. of treated patients per covariate included in the model | 19 (10–71) | 20 (10–57) | 15 (5–31) | 49 (14–181) |
| Event/variable ratio <10 | 11 (31) | 6 (33) | 3 (33) | 2 (25) |
| Names precise | 36 (78) | 18 (72) | 10 (91) | 7 (78) |
| Choice discussed | 26 (55) | 12 (46) | 6 (55) | 7 (78) |
| Type of endpoint | | | | |
| Binary | 33 (70) | 18 (69) | 9 (82) | 5 (56) |
| Censored | 11 (23) | 6 (23) | 2 (18) | 3 (33) |
| Continuous | 3 (6) | 2 (8) | 0 (0) | 1 (11) |

Results are expressed as count (%) or median (interquartile range)
The event/variable ratio <10 criterion represents the number of articles in which the rule of thumb was not respected, i.e. in which more than one covariate per ten events was included in the propensity score model. This leads to non-parsimonious models
should represent the most relevant clinical research in that area.
The main finding of our study was a trend towards improvement in the reporting and use of propensity score-based analysis as compared to the situation reported in earlier reviews. However, many issues are not yet optimal. In a previous review, Austin [79] noted that propensity score matching tended to be poorly documented in the medical literature between 1996 and 2003. We noted that, despite a growing number of publications using propensity score analyses, the quality of the reporting did not increase as desired. The improvements we noted mainly concern the technique used for assessing covariate imbalance and the reporting of the method used to match patients. Authors rely more frequently on standardised differences, which have been shown to better assess imbalance.
The increasing use of propensity score methods together with a suboptimal quality of use could be explained by the fact that, along with multiple publications on this subject, propensity scores have become popular in recent years. Concurrently, methods for propensity score analysis have been implemented in many common statistical software packages, providing the opportunity for all to carry out propensity score analyses. As a result, many authors are more likely to report the name of the software's macro than the actual name of the algorithm used to obtain matched pairs. A reasonable explanation may be that the less familiar investigators are with the theory of propensity score analysis, the more likely they are to rely on the method implemented in the software they use.
Our review has several limitations. First, we chose to focus on seven journals, which may not be representative. However, they are commonly considered the best journals in the domain, and so the imperfection in the use and reporting of propensity scores is probably underestimated. Second, as in most systematic reviews of the literature, it is difficult to evaluate exactly which analyses
Table 2 Reporting characteristics of articles using PS-matched analyses (n = 26)

| Characteristic | Value |
|---|---|
| Remaining patients after matching (%) | 32 (24–54) |
| Method | |
| 5→1 Greedy matching | 13 (50) |
| Nearest neighbour matching | 8 (31) |
| Non-detailed | 5 (19) |
| Balance | |
| 1/1 | 23 (88) |
| 1/2 | 2 (8) |
| 1/3 | 1 (4) |
| Replacement | |
| Yes | 0 (0) |
| No | 5 (19) |
| Non-detailed | 21 (81) |
| Paired analysis discussed | 4 (15) |

Results are expressed as count (%) or median (interquartile range)
% of remaining patients after matching refers to the proportion of treated patients for whom a control cannot be found in the control group; these patients are excluded from the analysis. Balance refers to the number of control patients to be matched to one treated patient. Replacement is considered if a control patient was potentially matched to more than one treated patient
Table 3 Comparison with the results of previous reviews

| | Weitzen et al. [76] | Shah et al. [77] | Sturmer et al. [78] | Austin [79] | Austin [80] | Present study |
|---|---|---|---|---|---|---|
| Period | Year 2001 | Up to June 2003 | Up to end of 2003 | 1996–2003 | 2004–2006 | 2006–2009 |
| No. of articles | 47 | 43 | 177 | 47 | 60 | 47 |
| Cardiovascular literature | 30 (64) | 19 (44) | 90 (51) | NA | 60 (100) | – |
| No. variables | 17 (8–27) | NA | (2–112) | NA | NA | 15 (9–22) (3–54) |
| No. of treated patients | 805 (182–3,802) | NA | (61 to >1,380,000) | NA | NA | 968 (121–1,310) (16–5,698) |
| No. of patients in the control group | NA | NA | NA | NA | NA | 1,307 (392–4,367) (84–33,749) |
| Event/variable ratio <10 | 5 (11) | NA | NA | NA | NA | 11 (31) |
| Articles using matching | | | | | | |
| Number | 7 (15) | 11 (26) | 51 (29) | 47 (100) | 60 (100) | 26 (55) |
| Evaluation of imbalance: present | 7 (100) | 8 (73) | NA | 39 (83) | 51 (82) | 20 (77) |
| Evaluation of imbalance: with test | NA | NA | NA | 33 (70) | 47 (78) | 14 (54) |
| Evaluation of imbalance: with STD | NA | NA | NA | 2 (4) | 0 (0) | 6 (23) |
| % Exposed matched | NA | NA | 90 (26–100) | NA | NA | 79 (69–97) (1–100) |
| Method reported | NA | NA | NA | 32 (68) | 43 (72) | 21 (81) |

Results are expressed as count (%), median (interquartile range), median (min–max) or (min–max)
Event/variable ratio <10 represents the number (and percentage) of studies in which more than 1 variable per 10 treated patients was included in the propensity score models. % Exposed matched means the ratio of treated patients from the original sample remaining in the matched sample
STD standardised difference, No number
were performed using what is reported in the published articles. Therefore, it is possible that much of the model development and assessment was adequately performed by the researchers, but not explicitly described in their articles. Third, important information about propensity score analysis, including description of the types of variables, variable selection procedures and the specific method used, has not been abstracted systematically, because it has rarely been presented in sufficient detail in published reports. Fourth, this review was limited to seven specialised journals. Articles reporting propensity score analyses in anaesthesiology or intensive care that were published in general journals were not included in the present review. The comparison with previous reviews could be biased because of this limitation. There is, however, no reason why the quality of reporting should differ between this particular area and the rest of the medical literature.
In conclusion, we think several recommendations could be implemented in order to improve the reporting of propensity score methods, particularly in case of matching on the propensity score, which has been shown to be the most accurate way to use propensity scores [25, 81, 82]. First, it is important that the authors explain how they constructed the propensity score. In particular, the choice of covariates included in the propensity score model has to be discussed, and the authors should provide sufficient information about how they identified all the potential confounders and included them. Indeed, it has been shown that omitting confounders increases the bias of the treatment effect estimation, even if balance is well obtained for the other covariates.
Secondly, concerning the assessment of the propensity score quality itself in the particular setting of matching on the propensity score, it has been shown that there was no association between the area under the ROC curve (c-statistic) and the ability of a given propensity score to balance prognostically important variables between treated and untreated subjects [21]. Moreover, measures of goodness-of-fit and discrimination do not provide information to
Table 4 Key points to critically read a propensity score-based study

| Items | Example using Ref. [24] |
|---|---|
| Type of cohort | Retrospective |
| Type of data | Administrative |
| Type of exposure | Epidural anaesthesia or analgesia within 1 day of surgery |
| Primary endpoint | All-cause death within 30 days after surgery |
| Number of patients | |
| In the treated group | 56,556 |
| In the control group | 202,481 |
| Development of the propensity score | |
| Number of covariates included | 24 |
| Choice of covariates justified | Clinical significance |
| Used model | Non-parsimonious multivariate logistic model |
| Chosen method (matching/subclassification/adjustment/weighting) | Matching on the propensity score |
| In case of subclassification, number of strata | – |
| In case of adjustment, model used | – |
| Matching procedure | |
| Used method | Greedy matching algorithm with a calliper width of 0.2 SD of the log odds of the estimated propensity score |
| Balance | 1/1 |
| Replacement or not | Without |
| Evaluation of covariate balance | With standardised differences |
| Number of patients per group | 44,094/44,094 |
| Outcome analysis | |
| Model | Agresti and Min method [83] |
| Paired analysis | Yes |

Application to an example taken from Ref. [24]
detect missing confounders in propensity score models [22]. Thirdly, the most important goal of the propensity score is to obtain balance among covariates between treatment and control groups, and the balance obtained has to be checked and reported. A table showing a "successful" matching on a set of covariates may convince the reader that the propensity score has been efficient in balancing groups and hence yields an unbiased estimation of the treatment effect. An optimal way to assess this balance might be to compute the standardised differences before and after matching for each measured covariate; a graphic representation seems to be a well-suited way to show the ability of the propensity score to balance the groups. Finally, the authors should report the results of the analyses with and without the propensity score so that the reader can appreciate the information carried by the propensity score analysis.
Accordingly, in Table 4 we propose a checklist dedicated to the reading of articles reporting propensity score-based analyses. As an illustration, it was applied to the work of Wijeysundera et al. [24] previously cited in this article. In this example, which used matching on the propensity score, the quality of the reporting could be judged as good. Moreover, the analysis seems to respect all the conditions, including paired analysis. Of note, in this example, probably because of the large number of patients, 78% of the treated subjects from the original sample remained in the matched sample.
In Fig. 3, we also tried to synthesise the different steps of planning and analysis of an observational study using propensity score methods. Ideally, propensity score methods, in particular matching, should be applied to an observational study designed specifically and a priori for this kind of analysis. Thus, a documented statistical analysis plan should be established and decided upon prior to any analysis being conducted. This methodology could lead investigators to measure as many potential confounders as possible and so to obtain an unbiased estimation of the treatment effect.
In summary, our study shows that the propensity score is increasingly used in the intensive care and anaesthesiology literature (Fig. 1). There is, however, room for improvement in the presentation of these results.
References
1.Ellenberg JH(1994)Selection bias in
observational and experimental studies.
Stat Med13:557–567
2.Rossi P,Freeman H(1993)A
systematic approach.Sage Publications, Inc,Newbury Park
3.Corrie P,Shaw J,Harris R(2003)Rate
limiting factors in recruitment of
patients to clinical trials in cancer
research:descriptive study.BMJ
327:320–321
4.Fossa SD,Skovlund E(2002)Selection
of patients may limit the
generalizability of results from cancer trials.Acta Oncol41:131–137
5.Guyatt GH,Sackett DL,Cook DJ
(1994)Users’guides to the medical
literature.II.How to use an article
about therapy or prevention.B.What
were the results and will they help me in caring for my patients?Evidence-
based medicine working group.JAMA
271:59–63
6.Pocock SJ,Elbourne DR(2000)
Randomized trials or observational
tribulations?N Engl J Med
342:1907–1909
7.Rosenbaum P,Rubin D(1983)The
central role of the propensity score in
observational studies for causal effect.
Biometrika70:41–55
8.Lunceford JK,Davidian M(2004)
Stratification and weighting via the
propensity score in estimation of causal treatment effects:a comparative study.
Stat Med23:2937–2960
9.Rubin D(1976)Matching methods that
are equal percent bias reducing:some
examples.Biometrics35:417–446
10.Rubin D(1979)Using multivariate
matched sampling and regression
adjustment to control bias in
observational studies.J Am Stat Assoc
74:318–324
11.Rubin D(1980)Bias reduction using
Mahalanobis metric matching.
Biometrics36:293–298
12.Carpenter R(1977)Matching when
covariables are normally distributed.
Biometrika64:299–307
13.Austin PC(2009)Some methods of
propensity-score matching had superior
performance to others:results of an
empirical investigation and Monte
Carlo simulations.Biom J51:171–184
14.Rubin D(2001)Using propensity scores
to help design observational studies.
Health Serv Out Res Method2:169–188
15.Austin PC(2008)Assessing balance in
measured baseline covariates when
using many-to-one matching on the
propensity-score.Pharmacoepidemiol
Drug Saf17:1218–1225
16.Aronow HD,Novaro GM,Lauer MS,
Brennan DM,Lincoff AM,Topol EJ,
Kereiakes DJ,Nissen SE(2003)In-
hospital initiation of lipid-lowering
therapy after coronary intervention as a
predictor of long-term utilization:a
propensity analysis.Arch Intern Med
163:2576–2582
17.Srinivasan AK,Shackcloth MJ,
Grayson AD,Fabri BM(2003)
Preoperative beta-blocker therapy in
coronary artery bypass surgery:a
propensity score analysis of outcomes.
Interact Cardiovasc Thorac Surg
2:495–500
18.Vikram HR,Buenconsejo J,Hasbun R,
Quagliarello VJ(2003)Impact of valve
surgery on6-month mortality in adults
with complicated,left-sided native
valve endocarditis:a propensity
analysis.JAMA290:3207–3214
19.Elad Y,French WJ,Shavelle DM,
Parsons LS,Sada MJ,Every NR(2002)
Primary angioplasty and selection bias
in patients presenting late(>12h)after
onset of chest pain and ST elevation
myocardial infarction.J Am Coll
Cardiol39:826–833
20.Sabik JF,Gillinov AM,Blackstone EH,
Vacha C,Houghtaling PL,Navia J,
Smedira NG,McCarthy PM,Cosgrove
DM,Lytle BW(2002)Does off-pump
coronary surgery reduce morbidity and
mortality?J Thorac Cardiovasc Surg
124:698–707
21.Austin PC,Grootendorst P,Anderson
GM(2007)A comparison of the ability
of different propensity score models to
balance measured variables between
treated and untreated subjects:a Monte
Carlo study.Stat Med26:734–753
22.Weitzen S,Lapane KL,Toledano AY,
Hume AL,Mor V(2005)Weaknesses
of goodness-of-fit tests for evaluating
propensity score models:the case of the omitted confounder.
Pharmacoepidemiol Drug Saf
14:227–238
23.Austin PC(2009)Balance diagnostics
for comparing the distribution of
baseline covariates between treatment
groups in propensity-score matched
samples.Stat Med28:3083–3107
24.Wijeysundera DN,Beattie WS,Austin
PC,Hux JE,Laupacis A(2008)
Epidural anaesthesia and survival after
intermediate-to-high risk non-cardiac
surgery:a population-based cohort
study.Lancet372:562–569
25.Austin PC(2007)The performance of
different propensity score methods for
estimating marginal odds ratios.Stat
Med26:3078–3094
26.Austin PC(2008)The performance of
different propensity-score methods for
estimating relative risks.J Clin
Epidemiol61:537–545
27.Almog Y,Novack V,Eisinger M,
Porath A,Novack L,Gilutz H(2007)
The effect of statin therapy on
infection-related mortality in patients
with atherosclerotic diseases.Crit Care
Med35:372–378
28.Annane D,Sebille V,Duboc D,Le
Heuzey JY,Sadoul N,Bouvier E,
Bellissant E(2008)Incidence and
prognosis of sustained arrhythmias in
critically ill patients.Am J Respir Crit
Care Med178:20–25
29.Biki B,Mascha E,Moriarty DC,
Fitzpatrick JM,Sessler DI,Buggy DJ
(2008)Anesthetic technique for radical prostatectomy surgery affects cancer
recurrence:a retrospective analysis.
Anesthesiology109:180–187
30.Clec’h C,Alberti C,Vincent F,
Garrouste-Orgeas M,de Lassence A,
Toledano D,Azoulay E,Adrie C,
Jamali S,Zaccaria I,Cohen Y,Timsit
JF(2007)Tracheostomy does not
improve the outcome of patients
requiring prolonged mechanical
ventilation:a propensity analysis.Crit
Care Med35:132–138
31.Combes A,Luyt CE,Nieszkowska A,
Trouillet JL,Gibert C,Chastre J(2007) Is tracheostomy associated with better
outcomes for patients requiring long-
term mechanical ventilation?Crit Care
Med35:802–807
32.de Boer MT,Christensen MC,
Asmussen M,van der Hilst CS,
Hendriks HG,Slooff MJ,Porte RJ
(2008)The impact of intraoperative
transfusion of platelets and red blood
cells on survival after liver
transplantation.Anesth Analg
106:32–44
33.Dhainaut JF,Payet S,Vallet B,Franca
LR,Annane D,Bollaert PE,Le Tulzo
Y,Runge I,Malledant Y,Guidet B,Le
Lay K,Launois R(2007)Cost-
effectiveness of activated protein C in
real-life clinical practice.Crit Care
11:R99
34.Duncan AI,Koch CG,Xu M,Manlapaz
M,Batdorf B,Pitas G,Starr N(2007)
Recent metformin ingestion does not
increase in-hospital morbidity or
mortality after cardiac surgery.Anesth
Analg104:42–50
35.Duncan AI,Lin J,Koch CG,Gillinov
AM,Xu M,Starr NJ(2006)The impact
of gender on in-hospital mortality and
morbidity after isolated aortic valve
replacement.Anesth Analg
103:800–808
36.Ender J,Borger MA,Scholz M,Funkat
AK,Anwar N,Sommer M,Mohr FW,
Fassl J(2008)Cardiac surgery fast-
track treatment in a postanesthetic care
unit:six-month results of the Leipzig
fast-track concept.Anesthesiology
109:61–66
37.Eurich DT,Marrie TJ,Johnstone J,
Majumdar SR(2008)Mortality
reduction with influenza vaccine in
patients with pneumonia outside "flu"
season:pleiotropic benefits or residual
confounding?Am J Respir Crit Care
Med178:527–533
38.Fellahi JL,Parienti JJ,Hanouz JL,
Plaud B,Riou B,Ouattara A(2008)
Perioperative use of dobutamine in
cardiac surgery and adverse cardiac
outcome:propensity-adjusted analyses.
Anesthesiology108:979–987
39.Grathwohl KW,Black IH,Spinella PC,
Sweeney J,Robalino J,Helminiak J,
Grimes J,Gullick R,Wade CE(2008)
Total intravenous anesthesia including
ketamine versus volatile gas anesthesia
for combat-related operative traumatic
brain injury.Anesthesiology109:44–53
40.Griesdale DE,Bosma TL,Kurth T,Isac
G,Chittock DR(2008)Complications
of endotracheal intubation in the
critically ill.Intensive Care Med
34:1835–1842
41.Hix JK,Thakar CV,Katz EM,Yared
JP,Sabik J,Paganini EP(2006)Effect
of off-pump coronary artery bypass
graft surgery on postoperative acute
kidney injury and mortality.Crit Care
Med34:2979–2983
42.Honiden S,Schultz A,Im SA,Nierman
DM,Gong MN(2008)Early versus late
intravenous insulin administration in
critically ill patients.Intensive Care
Med34:881–887
43.Kertai MD,Westerhout CM,Varga KS,
Acsady G,Gal J(2008)Dihydropyridine
calcium-channel blockers and
perioperative mortality in aortic
aneurysm surgery.Br J Anaesth
101:458–465
44.Le Manach Y,Godet G,Coriat P,
Martinon C,Bertrand M,Fleron MH,
Riou B(2007)The impact of
postoperative discontinuation or
continuation of chronic statin therapy
on cardiac outcome after major vascular
surgery.Anesth Analg104:1326–1333
45.Ranucci M,Isgro G(2007)Minimally
invasive cardiopulmonary bypass:does
it really change the outcome?Crit Care
11:R45
46.Rowan KM,Welch CA,North E,
Harrison DA(2008)Drotrecogin alfa
(activated):real-life use and outcomes
for the UK.Crit Care12:R58
47.Scales DC,Thiruchelvam D,Kiss A,
Redelmeier DA(2008)The effect of
tracheostomy timing during critical
illness on long-term survival.Crit Care
Med36:2547–2557
48.Schortgen F,Girou E,Deye N,
Brochard L(2008)The risk associated
with hyperoncotic colloids in patients
with shock.Intensive Care Med
34:2157–2168
49.Thombs BD,Bresnick MG(2008)
Mortality risk and length of stay
associated with self-inflicted burn
injury:evidence from a national sample
of 30,382 adult patients.Crit Care Med
36:118–125
50.Tritapepe L,De Santis V,Vitale D,
Nencini C,Pellegrini F,Landoni G,
Toscano F,Miraldi F,Pietropaoli P
(2007)Recombinant activated factor
VII for refractory bleeding after acute
aortic dissection surgery:a propensity
score analysis.Crit Care Med
35:1685–1690
51.Vandijck DM,Benoit DD(2008)
Impact of recent intravenous
chemotherapy on outcome in severe
sepsis and septic shock patients with
haematological malignancies:reply to
letter by Meyer et al.Intensive Care
Med34:1930–1931
52.Vincent JL,Sakr Y,Sprung C,Harboe
S,Damas P(2008)Are blood
transfusions associated with greater
mortality rates?Results of the sepsis
occurrence in acutely Ill patients study.
Anesthesiology108:31–39
53.Berger MM,Soguel L,Shenkin A,
Revelly JP,Pinget C,Baines M,
Chiolero RL(2008)Influence of early
antioxidant supplements on clinical
evolution and organ function in
critically ill cardiac surgery,major
trauma,and subarachnoid hemorrhage
patients.Crit Care12:R101
54.Constantinides VA,Tekkis PP,Fazil A,
Kaur K,Leonard R,Platt M,Casula R,
Stanbridge R,Darzi A,Athanasiou T
(2006)Fast-track failure after cardiac
surgery:development of a prediction
model.Crit Care Med34:2875–2882
55.Meier R,Bechir M,Ludwig S,
Sommerfeld J,Keel M,Steiger P,
Stocker R,Stover JF(2008)Differential temporal profile of lowered blood
glucose levels (3.5 to 6.5 mmol/l versus 5 to 8 mmol/l) in patients with severe
traumatic brain injury.Crit Care12:R98
56.Bagshaw SM,Lapinsky S,Dial S,Arabi
Y,Dodek P,Wood G,Ellis P,Guzman J,Marshall J,Parrillo JE,Skrobik Y,
Kumar A(2009)Acute kidney injury in septic shock:clinical outcomes and
impact of duration of hypotension prior to initiation of antimicrobial therapy.
Intensive Care Med35:871–881
57.Kor DJ,Iscimen R,Yilmaz M,Brown
MJ,Brown DR,Gajic O(2009)Statin
administration did not influence the
progression of lung injury or associated organ failures in a cohort of patients
with acute lung injury.Intensive Care
Med35:1039–1046
58.Zarychanski R,Doucette S,Fergusson
D,Roberts D,Houston DS,Sharma S,
Gulati H,Kumar A(2008)Early
intravenous unfractionated heparin and
mortality in septic shock.Crit Care Med 36:2973–2979
59.Beattie WS,Karkouti K,Wijeysundera
DN,Tait G(2009)Risk associated with preoperative anemia in noncardiac
surgery:a single-center cohort study.
Anesthesiology110:574–581
60.Beattie WS,Wijeysundera DN,
Karkouti K,McCluskey S,Tait G,
Mitsakakis N,Hare GM(2009)Acute
surgical anemia influences the
cardioprotective effects of beta-
blockade:a single-center,propensity-
matched cohort study.Anesthesiology
112:25–33
61.Christensen S,Thomsen RW,Johansen
MB,Pedersen L,Jensen R,Larsen KM, Larsson A,Tonnesen E,Sorensen HT
(2009)Preadmission statin use and one-year mortality among patients in
intensive care—a cohort study.Crit
Care14:R29
62.Devasia RA,Blackman A,Gebretsadik
T,Griffin M,Shintani A,May C,Smith T,Hooper N,Maruri F,Warkentin J,
Mitchel E,Sterling TR(2009)
Fluoroquinolone resistance in
Mycobacterium tuberculosis:the effect
of duration and timing of
fluoroquinolone exposure.Am J Respir Crit Care Med180:365–370
63.Karkouti K,Wijeysundera DN,Yau
TM,McCluskey SA,Tait G,Beattie
WS(2009)The risk-benefit profile of
aprotinin versus tranexamic acid in
cardiac surgery.Anesth Analg
110:21–29
64.Kerger KH,Mascha E,Steinbrecher B,
Frietsch T,Radke OC,Stoecklein K,
Frenkel C,Fritz G,Danner K,Turan A,
Apfel CC(2009)Routine use of
nasogastric tubes does not reduce
postoperative nausea and vomiting.
Anesth Analg109:768–773
65.Leslie K,Myles PS,Forbes A,Chan
MT(2009)The effect of bispectral
index monitoring on long-term survival
in the B-aware trial.Anesth Analg
110:816–822
66.Lindenauer PK,Rothberg MB,
Nathanson BH,Pekow PS,Steingrub JS
(2010)Activated protein C and hospital
mortality in septic shock:a propensity-
matched analysis.Crit Care Med
38:1101–1107
67.Manrique A,Jooste EH,Kuch BA,
Lichtenstein SE,Morell V,Munoz R,
Ellis D,Davis PJ(2009)The
association of renal dysfunction and the
use of aprotinin in patients undergoing
congenital cardiac surgery requiring
cardiopulmonary bypass.Anesth Analg
109:45–52
68.Martin G,Brunkhorst FM,Janes JM,
Reinhart K,Sundin DP,Garnett K,
Beale R(2009)The international
PROGRESS registry of patients with
severe sepsis:drotrecogin alfa
(activated)use and patient outcomes.
Crit Care13:R103
69.Payen JF,Bosson JL,Chanques G,
Mantz J,Labarere J(2009)Pain
assessment is associated with decreased
duration of mechanical ventilation in
the intensive care unit:a post hoc
analysis of the DOLOREA study.
Anesthesiology111:1308–1316
70.Renaud B,Santin A,Coma E,Camus
N,Van Pelt D,Hayon J,Gurgui M,
Roupie E,Herve J,Fine MJ,Brun-
Buisson C,Labarere J(2009)
Association between timing of intensive
care unit admission and outcomes for
emergency department patients with
community-acquired pneumonia.Crit
Care Med37:2867–2874
71.Rioux JP,Lessard M,De Bortoli B,Roy
P,Albert M,Verdant C,Madore F,
Troyanov S(2009)Pentastarch10%
(250kDa/0.45)is an independent risk
factor of acute kidney injury following
cardiac surgery.Crit Care Med
37:1293–1298
72.Surgenor SD,Kramer RS,Olmstead
EM,Ross CS,Sellke FW,Likosky DS,
Marrin CA,Helm RE Jr,Leavitt BJ,
Morton JR,Charlesworth DC,Clough
RA,Hernandez F,Frumiento C,Benak
A,DioData C,O’Connor GT(2009)
The association of perioperative red
blood cell transfusions and decreased
long-term survival after cardiac
surgery.Anesth Analg108:1741–1746
73.van Klei WA,Bryson GL,Yang H,
Forster AJ(2009)Effect of beta-blocker
prescription on the incidence of
postoperative myocardial infarction
after hip and knee arthroplasty.
Anesthesiology111:717–724
74.Wisnivesky JP,Halm E,Bonomi M,
Powell C,Bagiella E(2009)
Effectiveness of radiation therapy for
elderly patients with unresected stage I
and II non-small cell lung cancer.Am J
Respir Crit Care Med181:264–269
75.Harrell FE(2001)Overfitting and limits
on number of predictors.In:Regression
modeling strategies.Series in Statistics.
Springer(ed),pp60–61
76.Weitzen S,Lapane KL,Toledano AY,
Hume AL,Mor V(2004)Principles for
modeling propensity scores in medical
research:a systematic literature review.
Pharmacoepidemiol Drug Saf
13:841–853
77.Shah BR,Laupacis A,Hux JE,Austin
PC(2005)Propensity score methods
gave similar results to traditional
regression modeling in observational
studies:a systematic review.J Clin
Epidemiol58:550–559
78.Sturmer T,Joshi M,Glynn RJ,Avorn J,
Rothman KJ,Schneeweiss S(2006)A
review of the application of propensity
score methods yielded increasing use,
advantages in specific settings,but not
substantially different estimates
compared with conventional
multivariable methods.J Clin
Epidemiol59:437–447
79.Austin PC(2008)A critical appraisal of
propensity-score matching in the
medical literature between1996and
2003.Stat Med27:2037–2049
80.Austin PC(2007)Propensity-score
matching in the cardiovascular surgery
literature from2004to2006:a
systematic review and suggestions for
improvement.J Thorac Cardiovasc
Surg134:1128–1135
81.Austin PC(2008)The performance of
different propensity score methods for
estimating marginal odds ratios,
Statistics in Medicine2007;
26:3078–3094.Stat Med27:3918–3920
82.Austin PC,Grootendorst P,Normand
SL,Anderson GM(2007)Conditioning
on the propensity score can result in
biased estimation of common measures
of treatment effect:a Monte Carlo
study.Stat Med26:754–768
83.Agresti A,Min Y(2005)Simple
improved confidence intervals for
comparing matched proportions.Stat
Med24:729–740