
College of Health Professions Evidence Based Practice Portal - 2022 draft revision

Your librarian: Aaron Bowen

EBM Calculators

Tools to help you interpret the clinical and statistical significance of data reported in clinical research.

How to Read a Paper



West Chester University. (2019, Aug. 20). WCU Nutrition Students: Masters Nutrition Student Interns.
Retrieved from
Used under the Creative Commons License.

This page covers the third of the five As: appraising the evidence you have acquired through your database searches. Critical appraisal is a systematic analysis of a research article to determine the quality of the evidence it presents in relation to your clinical question. At this stage of the EBP process, your task is to analyze each study against the aspects described below in order to answer the following questions:

  • What were the study's strengths, and how rigorous was it overall in terms of credibility?
  • How are the results of this study relevant to your clinical question?
  • How might the results influence clinical practice?

The quality of evidence depends on its level, the strength of the study, and how directly it applies to your clinical question. This tab presents multiple aspects of articles that you can review to make judgements about the level, strength, and applicability of the evidence they present in relation to your clinical question.

Study Design

  • Qualitative study design examples: phenomenology, ethnography, grounded theory, participatory action research.
  • Quantitative study design examples: randomized controlled trial (RCT), cohort, single-case, before-and-after, case-control, cross-sectional, or case study

Setting & participants

  • Was the setting appropriate to answer the research question?
  • How were participants recruited?
  • What were the inclusion/exclusion criteria?
  • How many people participated? How many were lost through attrition?
  • What were the participant demographics?
  • If there was more than one group, were the groups similar?

Intervention(s) & Outcome measure(s)

  • Was the intervention clearly described?
  • Who delivered the intervention and how often?
  • Was there cross-contamination between interventions?
  • Was there a break-in period?
Outcome measure(s):
  • What instruments or methods were used to measure the variables? Examples include participant observation, interviews, focus groups, instruments, devices & questionnaires.
  • Did the authors use measures with documented evidence of validity and reliability?
  • Was the procedural reliability documented?
  • How frequently were the participants measured?

Results & conclusions

Main results or key findings:
  • What were the results?
  • Was there a statistically significant difference? What was the effect size?
  • How were the results analyzed and were the analysis methods appropriate?
Authors' interpretation/conclusion:
  • What was the clinical relevance of the study and the results?
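A common way to answer "What was the effect size?" when a study reports means and standard deviations for two independent groups is Cohen's d. Cohen's d is the difference between the group means divided by the pooled standard deviation. The sketch below assumes two independent groups and uses made-up summary statistics. The function name `cohens_d` is our own. Other effect size measures, such as odds ratios, risk ratios, or correlation coefficients, suit other designs and outcomes.

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups, using the pooled standard deviation.

    A positive value means group 1 scored higher than group 2.
    """
    # Pooled variance weights each group's variance by its degrees of freedom
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


# Hypothetical summary statistics, as they might appear in a results table
d = cohens_d(mean1=24.0, sd1=5.0, n1=30, mean2=20.0, sd2=5.0, n2=30)
print(round(d, 2))  # prints 0.8
```

By Cohen's widely cited rule of thumb, values of about 0.2, 0.5, and 0.8 are read as small, medium, and large effects. These benchmarks say nothing about clinical relevance, which you still need to judge against your clinical question.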

Text on this page is adapted from the UW Libraries and is licensed under a CC BY-NC 4.0 license.

This evidence pyramid was designed at Tufts University and has been reproduced by Winona State University. It is used under a Creative Commons BY-NC-SA 4.0 license.

Quality of evidence is based in part on its level. Multiple scales exist to represent an article's level of evidence, such as those produced by the Cochrane Library, as do multiple graphic representations of this evidence, such as the one to the left. No scale or graphic of this kind is an infallible, authoritative statement of what places a study at a specific level. Different types of research questions are best answered by different types of study (see the Strength of evidence tab to the right). As a result, one type of study may represent a high level of evidence for a therapy-related question but a lower level of evidence for a prognosis-related question. As noted on the New York Medical College EBM Resource Center's page on appraisal:

In a best-case scenario, you will be able to find a high-quality pre-appraised information source to answer your question. However, this is not available for all topics. Furthermore, you may also find it useful to appraise pre-appraised information. Once you figure out what to look for and where to look, you still have to worry about the quality of the material you find. You should always question the quality of the material you find. Remember, a poorly done systematic review is not better than a well done randomized controlled trial.

With that context, the following table from Winona State University offers a general presentation of an article's level of evidence and suggests the study designs best suited to answer each type of clinical question:

Level of evidence (LOE)

Level I: Evidence from a systematic review or meta-analysis of all relevant randomized controlled trials (RCTs), from evidence-based clinical practice guidelines based on systematic reviews of RCTs, or from three or more good-quality RCTs with similar results.

Level II: Evidence obtained from at least one well-designed RCT (e.g., a large multi-site RCT).

Level III: Evidence obtained from well-designed controlled trials without randomization (i.e., quasi-experimental studies).

Level IV: Evidence from well-designed case-control or cohort studies.

Level V: Evidence from systematic reviews of descriptive and qualitative studies (meta-synthesis).

Level VI: Evidence from a single descriptive or qualitative study.

Level VII: Evidence from the opinion of authorities and/or reports of expert committees.

Other schematics for levels of evidence also exist. The Joanna Briggs Institute, for example, divides its levels of evidence by effectiveness, diagnosis, prognosis, economic considerations, and meaningfulness. Cochrane reviews use the GRADE approach, which rates certainty in the evidence for a particular outcome on a four-point scale: high, moderate, low, or very low.


Various scales, such as the PEDro scale, have been developed to rate the strength of individual studies. Strength of evidence depends on research design limitations, methodological limitations, and threats to validity that may affect how findings are interpreted and how far results can be generalized.
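The PEDro scale shows how a study-quality scale turns an appraisal into a score. It rates randomized controlled trials against 11 yes/no criteria. The first criterion (eligibility criteria were specified) concerns external validity and is not counted, so the total ranges from 0 to 10. The sketch below assumes you have already rated each criterion. The function name `pedro_total` and the example ratings are hypothetical. Consult the official PEDro scale and its notes for how each criterion should be judged.

```python
def pedro_total(ratings):
    """Return a PEDro total score (0-10) from 11 yes/no criterion ratings.

    ratings: a sequence of 11 booleans, criterion 1 first.
    Criterion 1 (eligibility criteria specified) is rated but not scored.
    """
    if len(ratings) != 11:
        raise ValueError("expected ratings for all 11 PEDro criteria")
    # Skip criterion 1; each remaining criterion clearly satisfied earns one point
    return sum(1 for satisfied in ratings[1:] if satisfied)


# Hypothetical ratings for a single trial (True = criterion clearly satisfied)
example = [True, True, False, True, False, False, True, True, True, True, True]
print(pedro_total(example))  # prints 7
```

A numeric total makes trials easier to compare, but two trials with the same score can fail different criteria. Reading which criteria were missed is still part of appraisal.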

Research design

The strength of the evidence supporting an intervention for a given clinical question depends largely on the type of study conducted. For this reason, the pyramid to the right splits into different domains and ranks study designs within each domain as providing stronger or weaker evidence.

Building on this context, the following table from Winona State University offers a general guide to an article's strength of evidence in relation to different study designs:

Clinical Question

Suggested Research Design(s)

All Clinical Questions

Systematic review, meta-analysis


Randomized controlled trial (RCT), meta-analysis 
Also: cohort study, case-control study, case series


Randomized controlled trial (RCT), meta-analysis, cohort study 
Also: case-control study, case series


Randomized controlled trial (RCT) 
Also: cohort study


Randomized controlled trial (RCT), meta-analysis 
Also: prospective study, cohort study, case-control study, case series


Cohort study
Also: case-control study, case series


Qualitative study

Quality Improvement

Randomized controlled trial (RCT) 
Also: qualitative study 


Economic evaluation

In addition, Dartmouth offers a set of appraisal checklists that partly overlap with the Apply step of the EBP process.


University of Pennsylvania. (2006, Dec. 3). Vitale Digital Media Lab.
Retrieved from
Used under the Creative Commons License.


The additional background and context Greenhalgh (1997; the article begins on page 5 of the document) provides are worth understanding. However, she reduces the assessment of an EBP article's methodology to five points to consider:

  • Is the study original?
  • Which population or populations does the study describe?
  • Does the study design address the question the study seeks to answer?
  • Is there any systematic bias detectable in the article?  If so, did the authors note it in a section on study limitations?
  • Is the study big enough and long-lasting enough to give the results credibility?

Validity of evidence

The following is a general list of questions to ask regarding the internal and external validity of a study:

  • Were interventions delivered and data collected systematically, objectively and with fidelity?
  • What were the potential threats to internal validity?
    • Examples of potential threats to internal validity: history, maturation, testing, instrumentation, statistical regression, selection, mortality, interactions with selection, ambiguity about causal influence, and diffusion of intervention.
  • What were the potential threats to external validity?
    • Examples of potential threats to external validity: interaction of testing and treatment, interaction of selection and treatment, interaction of setting and treatment, interaction of history and treatment, multiple-treatment interference.

Greenhalgh, T. (1997). Assessing the methodological quality of published papers. BMJ, 315, 305-308.
Text in the validity of evidence section is adapted from the UW Libraries and is licensed under a CC BY-NC 4.0 license.


Asim Bharwani. (2011, Feb. 17). Giggles.
Retrieved from
Used under the Creative Commons License.

The concept of applicability goes by multiple terms, including relevance, generalizability, and external validity. Just as it is described by multiple terms, it also has multiple definitions. Shadish, Cook, and Campbell (2002) define it as "inferences about the extent to which a causal relationship holds over variations in persons, settings, treatments and outcomes." Atkins, Chang, Gartlehner, Buckley, Whitlock, Berliner, and Matchar (2010) define it as "the extent to which the effects observed in published studies are likely to reflect the expected results when a specific intervention is applied to the population of interest under 'real-world' conditions." Broadly, applicability addresses the question of how relevant a study or synthesis of studies is to a patient's situation.

The literature on applicability in evidence-based practice is slimmer than the literature on other aspects of appraisal. Pearson (2004) is writing about a specific piece of software designed to support the EBP process. Even so, he offers a general list of questions to ask about the applicability of an intervention to a patient's situation:

  • Is it available?
  • Is it affordable?
  • Is it applicable in the setting?
  • Would the patient/client be a willing participant in the implementation of the intervention?
  • Were the patients in the study or studies that provided the evidence sufficiently similar to your own to justify implementing this particular intervention?
  • What will be the potential benefits to the patient?
  • What is the potential harm to the patient?
  • Does this intervention allow for the individual patient's values and preferences?

Despite the existence of questions like these, Atkins et al. (2010) note a lack of standards or guidance for assessing applicability. Historically, applicability has been something of a judgement call. Seeking to rectify this situation, Atkins et al. (2010) set out to "describe a systematic but practical approach for considering applicability in the process of reviewing, reporting, and synthesizing evidence from eligible studies." They are writing about systematic reviews specifically, but their guidance proves applicable (pun intended) to evaluating how well acquired studies apply to a patient's situation.

With the caveat that "applicability depends on context and cannot be assessed with a universal rating system," they draw on a number of existing models (listed in the further reading section below) to describe a general four-step process for assessing applicability. The fourth step is specific to systematic reviews, but the first three steps are relevant to evidence-based practice:

  1. Determine the most important factors that may affect applicability. To this end, they note the need for a robustly defined PICO(TTS) question. From a systematic review perspective, they also identify robust inclusion and exclusion criteria as useful for assessing applicability. They include an extensive table to aid in making this judgement.
  2. Systematically abstract and report key characteristics that may affect applicability in evidence tables. Review the literature you have acquired against the factors affecting applicability that you identified in step 1, and create a table noting whether each study is applicable. The sample table from step 1 should be adaptable to this purpose.
  3. Make and report judgements about major limitations to the applicability of individual studies. If you have followed the process to this point, your judgements about the applicability of the evidence you have acquired will be systematic, reducing the risk of opinion-based judgements.

Works cited:
-- Atkins, D., Chang, S., Gartlehner, G., Buckley, D.I., Whitlock, E.P., Berliner, E., and Matchar, D. (2010). Assessing the applicability of studies when comparing medical interventions. In Methods guide for comparative effectiveness reviews (AHRQ Publication No. 10(14)-EHC063-EF). Agency for Healthcare Research and Quality, U.S. Department of Health and Human Services.
-- Pearson, A. (2004). Balancing the evidence: Incorporating the synthesis of qualitative data into systematic reviews. JBI Reports 2004(2): 45-64.
-- Shadish, W., Cook, T., and Campbell, D. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin.

Further reading:
-- Bornhöft G., Maxion-Bergemann S., Wolf U., Kienle G.S., Michalsen A., Vollmar H.C., Gilbertson S., and Matthiessen P.F. (2006). Checklist for the qualitative evaluation of clinical studies with particular focus on external validity and model validity. BMC Med Res Methodol, 6: 56. doi: 10.1186/1471-2288-6-56.
-- Green L.W., and Glasgow R.E. (2006). Evaluating the relevance, generalization, and applicability of research: Issues in external validation and translation methodology. Evaluation & the Health Professions, 29(1): 126-153. doi: 10.1177/0163278705284445.
-- Pibouleau L., Boutron I., Reeves B.C., Nizard R., and Ravaud P. (2009). Applicability and generalisability of published results of randomised controlled trials and non-randomised studies evaluating four orthopaedic procedures: Methodological systematic review. BMJ, 339: b4538. doi: 10.1136/bmj.b4538.
-- Rothwell, P.M. (2005). External validity of randomised controlled trials: "to whom do the results of this trial apply?" Lancet, 365(9453): 82-93.


This page is licensed under a Creative Commons license.


Lisa Barker. (2019, Aug. 20). Health Sciences and Medicine Students.
Retrieved from
Used under the Creative Commons License.

Synthesis involves combining ideas or results from two or more sources in a meaningful way. In EBP, the synthesis is focused on the clinical question. You may combine the details from the article appraisals into themes to organize the ideas. The writing must remain objective and accurately report the information from the original sources.

  • Strength of evidence is based on the quantity, quality, and consistency of results across a body of literature.
    • Quantity: The number of studies available
    • Quality: The level and strength of evidence available
    • Consistency of results: How consistent the research findings are
  • Applicability of evidence: Determined by how well the evidence answers your clinical question

Discuss implications for practice, education, or research. The discussion may include suggestions or recommendations for changes to practice, education or research as well as confirmation of current practice. A table may be used to display the information collected from the articles under discussion.


Columns: Article 1 [1st Author and Year] | Article 2 [1st Author and Year] | Article 3 [1st Author and Year] | Article 4 [1st Author and Year]
Rows: Outcome Measure(s) | Study Design | Key Findings | Critical appraisal | Study Quality


Wichita State University Libraries, 1845 Fairmount, Wichita, KS 67260-0068 | Phone: (316) 978-3481