Common Pitfalls in the Research Process

Jacob Shreffler; Martin R. Huecker


Last Update: March 6, 2023.

Definition/Introduction

Conducting research from planning to publication can be a very rewarding process. However, multiple preventable setbacks can occur within each stage of research. While some setbacks are inevitable, understanding common pitfalls can limit these hindrances. Many issues can present themselves throughout the research process. It has been said about academics that “the politics are so harsh because the stakes are so low.” Beyond interpersonal and political/funding concerns, prospective authors may encounter some disenchantment with the publish-or-perish culture. When (any) publication is the metric, the motivation to contribute meaningfully to science can be overshadowed by a compulsive drive to publish. [1] We believe in quality over quantity and highlight the importance of channeling creativity when pursuing scholarly work.

When considering embarking on a medical research project, one must begin with detailed planning. Do not underestimate the amount of time a project can take, often spanning years from conception to manuscript preparation. Will you conduct a retrospective chart review, a prospective study, or a true clinical trial with randomization and blinding? Will you systematically seek out and remove sources of bias from the study design and interpretation of results? Will you ensure the study is powered properly to justify conclusions? Will you eliminate or explain any conflicts of interest occurring among your author group? Will you fall victim to the temptation of frivolous subgroup analyses, or will you stick with the original plan? Will your study have a realistic chance at publication in a journal within your specialty, or perhaps another subfield? The study results may fail to reject the null hypothesis (a ‘negative study’) and therefore be difficult to publish. [2] Additionally, the intervention you find beneficial may subsequently be proven unhelpful or even dangerous, leading to prudent medical reversal. [3]

These considerations and more necessitate meticulous planning and vigilant adherence to a sound protocol. Along the way, you will encounter obstacles and pitfalls, some of which are presented in this article. But remain persistent, and your efforts will be rewarded with publication and contribution to science. This review covers common pitfalls researchers encounter and suggests strategies to avoid them.

Issues of Concern

There are five phases of research: the planning phase; the data collection and analysis phase; the writing phase; the journal submission phase; and the rejections, revisions, and acceptance phase.

Phase I Pitfalls: Planning a Study

The highest-yield opportunity to preempt pitfalls in the research process occurs in the planning phase, when a researcher can set the stage for an optimal research process. Below are pitfalls that can occur during the planning phase.

Pitfall: Underestimating what committing to a research project requires

Conducting a research study and achieving publication sounds fulfilling, right?

Consider the many steps: conducting a literature search, writing an IRB proposal, planning and holding research meetings, long and cumbersome data collection processes, working with statisticians or analyzing complex data, weathering unexpected research setbacks (e.g., subjects dropping out, newly published papers on the same topic), the possibility that after data collection you have no statistically (or clinically) significant findings, conducting an updated literature search, writing the introduction, methods, results, and discussion sections of a paper, sifting through the many journal options to determine the best fit while aiming for high impact factors, adhering to journal guidelines and fixing drafts, writing cover letters stating the importance of the topic to respective journals, creating journal portal accounts, possibly being rejected numerous times, waiting months for journal decisions, working on numerous revisions, and being informed by numerous individuals about all of the flaws in your writing and research.

Does it sound, maybe, a little less fulfilling?

Conducting a research project from inception to publication can be a rewarding experience. Research requires significant time. Setbacks are normal. To produce an important and sought-after research product, an individual must understand the magnitude of commitment required.

Pitfall: Choosing the wrong research pursuit or a topic that lacks precision

Consider an investigator interested in substance use research. The first challenge is the immense amount of research already published on this topic. Fortunately, there is still a massive amount of uncharted territory in substance use research.

It is important to understand what has been done and what is still undiscovered in your area of research. Do not simply study a topic because you find it interesting; passion is advantageous, but you should ensure that your study will contribute to the field, specialty, or body of research in a significant way.

How does your research differ from what has been done?

How will it impact practice in a way that no previous study has?

Consider these questions when choosing a topic for research. Otherwise, you may struggle to get the work published. It can be demoralizing to finish writing a paper only to realize that it will not be accepted by a reputable journal because other papers already describe the same concepts.

As always, the first step is a thorough literature search.

Pitfall: Not considering research bias

A common theme noted in the literature is that bias can, unfortunately, lead to failure to reproduce results, raising concerns regarding the integrity of science. [4] Bias can be understood as the various (often inadvertent) poor strategies related to study design, analysis, and results reporting that produce spurious results and papers that perhaps should not be published. [5]

While one cannot completely eliminate bias from the research process, researchers should take steps to understand research bias in study endeavors and determine how to minimize bias during the planning phase of the study.  

Pitfall: Not focusing on which variables to collect

Researchers often want to collect as much data as possible but should not build a list of variables that includes every single detail about subjects if the variables collected are unlikely to yield insight into the topic of research. The longer the data collection instrument, the higher the likelihood of (human) errors (if data are entered manually) and the longer the data collection phase. Instead of taking time to build a database with many variables, consider cutting irrelevant variables and using that time to increase the sample size. Determine, based on your own clinical knowledge and published empirical works, which variables are most crucial.

Pitfall: Worrying about the statistics after the data has been collected

A vital part of the research process is ensuring you have a rigorous statistical approach. Involve your statistician very early in the project, preferably in the planning stages. They will have insight into the types of variables to collect and help shape the research methods. Statistical power is an important concept to consider before data collection to avoid false-negative results (Zlowodzki et al., 2006). Furthermore, other concepts, such as covariates, need to be part of the planning phase. Do not wait until after data collection to hand the data to a statistician, who cannot transform the data you have into the outputs you want.
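To make this concrete, here is a minimal sketch of an a priori sample-size calculation in Python using the statsmodels library; the effect size, alpha, and power below are illustrative assumptions, not values recommended by this article.

```python
# A priori power analysis for a two-arm comparison of means (illustrative).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Solve for the sample size per group needed to detect a hypothetical
# medium effect (Cohen's d = 0.5) with 80% power at alpha = 0.05.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"Required sample size per group: {n_per_group:.0f}")  # roughly 64
```

Running this kind of calculation during the planning phase tells you whether the sample you can realistically collect is large enough to justify the study at all.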

Pitfall: Not setting defined author roles

It is important to define who will be declared authors at the beginning of the research process to avoid conflict. Do most people want to be an author? Sure. Does everybody do the work worthy of authorship? No. While placing general comments in a shared document's margin may make the paper slightly better, it probably should not qualify for authorship. Review authorship criteria to determine what constitutes authorship. Clear expectations can ensure that everyone is on the same page and that everyone feels the process is fair, especially for individuals who plan to invest significant time in the project. Clear expectations for each author should occur before any writing begins, including deadlines and specific contributions. [6][7]

Pitfall: Not considering limitations of work before the paper is written

Avoid this pitfall by reviewing recent manuscripts and reading the limitations sections of these papers. Many of these limitations sections will raise concerns about generalizability to other populations. Some will discuss low power. Even the best papers in the top journals have many limitations. The best way to avoid or mitigate your work's limitations is to consider them during the planning phase.

How can you set up your project to limit your limitations section?

What (types of) samples should you include in your study?

Were you originally planning a retrospective design when the study could be prospective?

What steps can you utilize to control baseline characteristics between groups?

Consider all limitations and think about how you can control these before data collection.

Phase II Pitfalls: Data Collection and Analysis

After the planning has occurred, typically after institutional review board (IRB) approval, the data collection and analysis phase can transpire. The entire team should typically stay involved throughout these phases. Below are pitfalls to avoid.

Pitfall: Not being involved in the data collection phase

It is important to be involved with the data collection phase, even if you do not personally collect data. Train the individuals who collect data to ensure all are on the same page and provide periodic oversight to ensure accuracy and quality of the data over time. [8]  Do not assume the data collection phase is going smoothly – you may find yourself with a huge dataset riddled with inconsistencies or errors. Schedule periodic meetings to review data.
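As one way to provide that oversight, the following is a minimal sketch, in Python with pandas, of automated checks that can be run before periodic data-review meetings; the file name and column names are hypothetical.

```python
# Periodic data-quality audit for an in-progress study dataset (illustrative).
import pandas as pd

df = pd.read_csv("study_data.csv")  # hypothetical file of data collected so far

# Missing values per variable: catches gaps early rather than at analysis.
print(df.isna().sum())

# Out-of-range entries, e.g., implausible ages from manual data entry.
print(df[(df["age"] < 0) | (df["age"] > 120)])

# Category labels outside the expected set (catches typos such as "Contrl").
expected_groups = {"control", "treatment"}
print(df[~df["group"].isin(expected_groups)])
```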

Pitfall: Not being involved with the statistical analysis phase

If you are not conducting the statistical analysis, do not assume that the person who is analyzing the data is 100% on the same page. Have meetings about the data, how to interpret the data, and the limitations of the data. Ask what other ways the data could be analyzed and how reviewers might negatively critique the data itself or the statistical methods.

The person conducting the analysis will not have the same familiarity with the topic, and you are not going to be as familiar with the outputs. By understanding each other, you will a) have clearer, more robust methods and results sections in the paper, b) limit critiques regarding the statistical approach/data outcomes, c) understand your research better for any presentations, discussions, or future work, and d) develop a positive collaboration for future work.

Phase III Pitfalls: The Writing Phase

The next phase is the writing phase. While this section covers pitfalls during the writing phase, for recommendations on conducting a literature search, writing, and publishing research, see the StatPearls Evidence-Based Medicine chapter, “How to Write and Publish a Scientific Manuscript.” [9] Below are pitfalls that can occur during the writing phase.

Pitfall: Poor or outdated references

When writing your paper, perform multiple literature searches to ensure all recent, salient references are covered; claims about similar work, or the research that frames your study, lose credibility if the references are outdated. Journals may even ask reviewers to comment on the presence or absence of up-to-date, suitable references. Conduct a literature search prior to data collection and stay on top of references throughout the research process as new papers become available.

Pitfall: No clearly defined purpose of the paper

Many aspects of manuscripts can get overlooked. Lack of a clear purpose statement can doom a paper to futility. Remind the readers of the goal of the project. You do not want consumers of your research to read the results section and forget what the goals/main outcomes are. The purpose statement should be located at the end of the introduction section.  

Pitfall: Unclear methods making research hard to reproduce

A common concern in science is the lack of transparency in methods for reproducibility. The methods section should allow a reader to understand exactly what was done and to reproduce the study. Consider examining the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) checklist for the methods (as well as other paper sections) to ensure best reporting practices for reproducibility. [10]

Pitfall: The tables and narratives are the same

Reviewers prefer that you not restate findings in the narrative that already appear in tables. Tables should focus readers on the most important results without duplicating the written content. Make call-outs to the table in the paper's narrative sections, but do not repeat information found in tables.

Pitfall: Not reporting all data/outcomes

Some authors will state only the main outcome of interest or include a statement such as “there were no other statistically significant findings between other groups.” Authors must report all outcomes and statistical analyses for reproducibility of the research. While this may be difficult to do with a broad approach, utilize tables and appendices to report all outcomes to show transparency and limit researcher bias.
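One practical way to do this is to generate the full outcome table programmatically, so nothing is silently dropped. The sketch below assumes a two-group study with hypothetical outcome columns and uses an independent-samples t-test purely for illustration; substitute whatever analysis your protocol specifies.

```python
# Tabulate every outcome comparison for an appendix (illustrative).
import pandas as pd
from scipy import stats

df = pd.read_csv("study_data.csv")                   # hypothetical file
outcomes = ["outcome_a", "outcome_b", "outcome_c"]   # hypothetical columns

rows = []
for col in outcomes:
    a = df.loc[df["group"] == "treatment", col].dropna()
    b = df.loc[df["group"] == "control", col].dropna()
    t, p = stats.ttest_ind(a, b)
    rows.append({"outcome": col, "t": round(t, 2), "p": round(p, 4)})

# Every outcome appears in the table, significant or not.
print(pd.DataFrame(rows))
```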

Pitfall: Repeating results in discussion

Do not simply restate in the discussion what you already have in the results section. Utilize this section of the paper to link other references to your work and reflect on other empirical investigations' similarities or differences. Explain why your research provides an impactful contribution to the topic.  

Pitfall: Making conclusions that do not align with your work

Authors sometimes claim in their conclusions that the work impacts a topic for reason X, when X is too broad a claim that the work does not really support or prove. Researchers should align their conclusions with their own results and highlight the significance of their findings.

Pitfall: Thinking the title is not a big deal

A strong title will help with the impact and readership of your paper. Consider keeping a short title that provides the main takeaway. Papers with concise titles that present the study conclusion have a bigger impact and receive more citations. [11]

Pitfall: Completing the abstract last minute

Similar to the title, do not underestimate an abstract. Journal and conference reviewers (and the general audience) may only read your abstract. The abstract must have the key results and contributions of the study and be well-written.

Phase IV Pitfalls: Submitting to a Journal

After the paper has been written, it is time to choose the journal. This phase also has numerous pitfalls, outlined below.

Pitfall: Choosing the wrong journal

Choosing the journal for your work can be overwhelming due to the number of options. Always look at the aims and scope of prospective journals. Look through the author guidelines to ensure that your manuscript adheres to them; this will save time. Review your reference list for any journal that appears more than once, and consider submitting to that journal. You do not want to submit your paper, wait two weeks, and then get a desk rejection because the editors state the paper is not aligned with the journal's aims and scope.

Additionally, researchers can aim too high and spend months (and numerous hours in journal submission portals) trying to publish a manuscript in a journal with a very high impact factor. Though admirable, if the research design and results lack “gold standard” rigor, authors should consider a journal that is more likely to accept the work. Find a balance between the quality of your paper and the quality of the journal. Seek feedback from the other authors and/or senior colleagues who can provide an honest assessment.

Pitfall: Poor cover letter on journal submission

Do not submit work with a flawed cover letter (errors or lack of clarity in how your work contributes to the body of literature). Spend time writing a detailed cover letter once, have it edited by someone else, and reuse it for all future projects, highlighting the differences (e.g., the purpose of the work, what the results showed) for each project. Use the cover letter to highlight the significance of the study while adhering to the disclosure guidelines (e.g., conflicts of interest, author contributions, data releases), which will help the editorial board determine the suitability of the paper for the journal and streamline the review process. [12]

Pitfall: Assuming that after the paper has been submitted to a journal, the work is done             

The paper has been submitted! You think you are finished…but, unfortunately, the publishing game may still be far from over. Researchers often do not recognize the amount of time going into the submission/rejection/revisions phases. Revisions can sometimes be total overhauls, more work than writing a whole new paper. Be prepared to continue working.

Phase V Pitfalls: The Rejections, Revisions, and Acceptance Phase

Finally, perhaps the most unpredictable phase, the rejections, revisions, and acceptance phase, has unique pitfalls and other obstacles.

Pitfall: Mourning rejections too long/ “sitting on” a rejected paper             

Did you get a desk rejection (i.e., the manuscript was not even sent for blind review)? That is unfortunate but common. You do not have time to sulk. Get that paper submitted somewhere else. The older the data, the less desirable your paper becomes. If the paper went out for full review and was rejected, that may be even tougher than a desk rejection because more time has elapsed. The good news is that (hopefully) you received feedback to incorporate into a revision. Do not spend too much time grieving rejections.

Pitfall: Not laying to rest rejected papers when it is indeed their time to go

Did you write a paper a couple of years ago, and you’ve now submitted it to 20 different journals? The data is getting old. The topic was not focused. The sample size was small. Perhaps the project is not worth pursuing any longer. Do not give in to the sunk cost fallacy. If, however, you are proud of the work and stand by the paper, do not give up. If you believe after the numerous rejections that the topic/project is flawed, you can use this failure as a personal learning/growth opportunity. Do not repeat controllable mistakes on future projects.

Pitfall: Not addressing all of the reviewer feedback

Did you get a revise and resubmit? Great news! The reviewers and editors will likely ask you to respond to each comment when you resubmit. Address all of the reviewer feedback. Take your time reading through the feedback, digest it, and re-read it. Carefully respond and decide how to revise your manuscript based on the feedback. Share the reviews and the duties of revision with coauthors. In your response to reviewers, stay professional and address each statement, even if you disagree with what is stated. If you do not respond to each statement, the reviewers often highlight the concern(s) again.

Pitfall: Thinking you know what the reviewers are going to say

Research reviewers are like a box of chocolates. You never know what you are going to get. You may be worried about a section of your paper/research approach, and the reviewers do not mention it at all in their review; instead, they criticize a section of your manuscript that you are most proud of.

In some reviews, you may get feedback like the following:

Reviewer #1

Please change lines 104-108 as I believe they are irrelevant to your study.

Reviewer #2

Please build on lines 104-108, as I believe they are the foundation of your study.

Sometimes, after multiple revisions, there are new concerns presented by the reviewers. This can be disheartening. Should some regulations restrict reviewers from bringing up new ideas/concerns during revision #7? Perhaps. Does any current rule prevent them from doing this? No.

During the review process, we must have faith that the reviewers are knowledgeable and provide fair, insightful, and constructive feedback. While the review process can be arbitrary or frustrating in some cases, peer review remains the gold standard in scientific publication. Stay positive and persistent. Stay professional in responses to the reviewers. Remember that the review process can be very beneficial, as it often leads to feedback that truly elevates your work and makes the product (and you) look better. [13]

Pitfall: Not rewarding yourself for a published paper

You did it! Celebrate your accomplishment. Reflect on the merit of your effort before you move on to other work or re-enter the cycle of IRBs, data coding, journal submissions, etc. Remember and appreciate how remarkable it is that you just contributed knowledge to the world.

Clinical Significance

Many pitfalls can occur throughout the research process. Researchers should understand these pitfalls and utilize strategies to avoid them to produce high-quality, sought-after research results that are useful for basic science and clinical practice.


Disclosure: Jacob Shreffler declares no relevant financial relationships with ineligible companies.

Disclosure: Martin Huecker declares no relevant financial relationships with ineligible companies.

This book is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits others to distribute the work, provided that the article is not altered or used commercially. You are not required to obtain permission to distribute this article, provided that you credit the author and journal.

Cite this page: Shreffler J, Huecker MR. Common Pitfalls in the Research Process. [Updated 2023 Mar 6]. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2024 Jan-.


Understanding Generalizability and Transferability

In this chapter, we discuss generalizability, transferability, and the interrelationship between the two. We also explain how these two aspects of research operate in different methodologies, demonstrating how researchers may apply these concepts throughout the research process.

Generalizability Overview

Generalizability is applied by researchers in an academic setting. It can be defined as the extension of research findings and conclusions from a study conducted on a sample population to the population at large. While the dependability of this extension is not absolute, it is statistically probable. Because sound generalizability requires data on large populations, quantitative research -- experimental for instance -- provides the best foundation for producing broad generalizability. The larger the sample population, the more one can generalize the results. For example, a comprehensive study of the role computers play in the writing process might reveal that it is statistically probable that students who do most of their composing on a computer will move chunks of text around more than students who do not compose on a computer.

Transferability Overview

Transferability is applied by the readers of research. Although generalizability usually applies only to certain types of quantitative methods, transferability can apply in varying degrees to most types of research. Unlike generalizability, transferability does not involve broad claims, but invites readers of research to make connections between elements of a study and their own experience. For instance, teachers at the high school level might selectively apply to their own classrooms results from a study demonstrating that heuristic writing exercises help students at the college level.

Interrelationships

Generalizability and transferability are important elements of any research methodology, but they are not mutually exclusive: generalizability, to varying degrees, rests on the transferability of research findings. It is important for researchers to understand the implications of these twin aspects of research before designing a study. Researchers who intend to make a generalizable claim must carefully examine the variables involved in the study. Among these are the sample of the population used and the mechanisms behind formulating a causal model. Furthermore, if researchers desire to make the results of their study transferable to another context, they must keep a detailed account of the environment surrounding their research, and include a rich description of that environment in their final report. Armed with the knowledge that the sample population was large and varied, as well as with detailed information about the study itself, readers of research can more confidently generalize and transfer the findings to other situations.

Generalizability

Generalizability is not only common to research, but to everyday life as well. In this section, we establish a practical working definition of generalizability as it is applied within and outside of academic research. We also define and consider three different types of generalizability and some of their probable applications. Finally, we discuss some of the possible shortcomings and limitations of generalizability that researchers must be aware of when constructing a study they hope will yield potentially generalizable results.

In many ways, generalizability amounts to nothing more than making predictions based on a recurring experience. If something occurs frequently, we expect that it will continue to do so in the future. Researchers use the same type of reasoning when generalizing about the findings of their studies. Once researchers have collected sufficient data to support a hypothesis, a premise regarding the behavior of that data can be formulated, making it generalizable to similar circumstances. Because of its foundation in probability, however, such a generalization cannot be regarded as conclusive or exhaustive.

While generalizability can occur in informal, nonacademic settings, it is usually applied only to certain research methods in academic studies. Quantitative methods allow some generalizability. Experimental research, for example, often produces generalizable results. However, such experimentation must be rigorous in order for generalizable results to be found.

An example of generalizability in everyday life involves driving. Operating an automobile in traffic requires that drivers make assumptions about the likely outcome of certain actions. When approaching an intersection where one driver is preparing to turn left, the driver going straight through the intersection assumes that the left-turning driver will yield the right of way before turning. The driver passing through the intersection applies this assumption cautiously, recognizing the possibility that the other driver might turn prematurely.

American drivers also generalize that everyone will drive on the right hand side of the road. Yet if we try to generalize this assumption to other settings, such as England, we will be making a potentially disastrous mistake. Thus, it is obvious that generalizing is necessary for forming coherent interpretations in many different situations, but we do not expect our generalizations to operate the same way in every circumstance. With enough evidence we can make predictions about human behavior, yet we must simultaneously recognize that our assumptions are based on statistical probability.

Consider this example of generalizable research in the field of English studies. A study on undergraduate instructor evaluations of composition instructors might reveal that there is a strong correlation between the grade students are expecting to earn in a course and whether they give their instructor high marks. The study might discover that 95% of students who expect to receive a "C" or lower in their class give their instructor a rating of "average" or below. Therefore, there would be a high probability that future students expecting a "C" or lower would not give their instructor high marks. However, the results would not necessarily be conclusive. Some students might defy the trend. In addition, a number of different variables could also influence students' evaluations of an instructor, including instructor experience, class size, and relative interest in a particular subject. These variables -- and others -- would have to be addressed in order for the study to yield potentially valid results. However, even if virtually all variables were isolated, results of the study would not be 100% conclusive. At best, researchers can make educated predictions of future events or behaviors, not guarantee the prediction in every case. Thus, before generalizing, findings must be tested through rigorous experimentation, which enables researchers to confirm or reject the premises governing their data set.
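The reasoning in this example is simply a conditional proportion computed from observed data. Below is a minimal sketch of how such a figure would be derived; the counts are invented purely for illustration.

```python
# Conditional proportion behind the evaluation example (invented data).
import pandas as pd

df = pd.DataFrame({
    "expected_grade": ["A", "B", "C", "C", "D", "A", "C", "B", "D", "C"],
    "rating": ["high", "high", "low", "low", "low",
               "high", "low", "low", "high", "low"],
})

# P(rating is "average or below" | student expects a C or lower)
low_expect = df[df["expected_grade"].isin(["C", "D"])]
share = (low_expect["rating"] == "low").mean()
print(f"P(low rating | expecting C or lower) = {share:.0%}")
```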

Considerations

There are three types of generalizability that interact to produce probabilistic models. All of them involve generalizing a treatment or measurement to a population outside of the original study. Researchers who wish to generalize their claims should try to apply all three forms to their research, or the strength of their claims will be weakened (Runkel & McGrath, 1972).

In one type of generalizability, researchers determine whether a specific treatment will produce the same results in different circumstances. To do this, they must decide if an aspect within the original environment, a factor beyond the treatment, generated the particular result. This will establish how flexibly the treatment adapts to new situations. Higher adaptability means that the treatment is generalizable to a greater variety of situations. For example, imagine that a new set of heuristic prewriting questions designed to encourage freshman college students to consider audience more fully works so well that the students write thoroughly developed rhetorical analyses of their target audiences. To responsibly generalize that this heuristic is effective, a researcher would need to test the same prewriting exercise in a variety of educational settings at the college level, using different teachers, students, and environments. If the same positive results are produced, the treatment is generalizable.

A second form of generalizability focuses on measurements rather than treatments. For a result to be considered generalizable outside of the test group, it must produce the same results with different forms of measurement. In terms of the heuristic example above, the findings will be more generalizable if the same results are obtained when assessed "with questions having a slightly different wording, or when we use a six-point scale instead of a nine-point scale" (Runkel & McGrath, 1972, p.46).

A third type of generalizability concerns the subjects of the test situation. Although the results of an experiment may be internally valid, that is, applicable to the group tested, in many situations the results cannot be generalized beyond that particular group. Researchers who hope to generalize their results to a larger population should ensure that their test group is relatively large and randomly chosen. However, researchers should consider the fact that test populations of over 10,000 subjects do not significantly increase generalizability (Firestone, 1993).
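Firestone's point about diminishing returns follows from basic sampling arithmetic: the margin of error shrinks with the square root of the sample size, so each additional order of magnitude buys less precision. A short illustration in Python (the 0.5 proportion is the worst case for a survey estimate):

```python
# Margin of error of a sample proportion vs. sample size (illustrative).
import math

p = 0.5  # worst-case proportion
for n in [100, 1_000, 10_000, 100_000]:
    margin = 1.96 * math.sqrt(p * (1 - p) / n)  # 95% confidence half-width
    print(f"n = {n:>6}: margin of error ~ {margin:.3f}")

# n =    100: ~0.098; n = 1000: ~0.031; n = 10000: ~0.010; n = 100000: ~0.003
```

Going from 100 to 10,000 subjects shrinks the margin of error tenfold; going from 10,000 to 100,000 shrinks it by less than a percentage point.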

Potential Limitations

No matter how carefully these three forms of generalizability are applied, there is no absolute guarantee that the results obtained in a study will occur in every situation outside the study. In order to determine causal relationships in a test environment, precision is of utmost importance. Yet if researchers wish to generalize their findings, scope and variance must be emphasized over precision. Therefore, it becomes difficult to test for precision and generalizability simultaneously, since a focus on one reduces the reliability of the other. One solution to this problem is to perform a greater number of observations, which has a dual effect: first, it increases the sample population, which heightens generalizability; second, precision can be reasonably maintained because the random errors between observations will average out (Runkel and McGrath, 1972).

Transferability

Transferability describes the process of applying the results of research in one situation to other similar situations. In this section, we establish a practical working definition of transferability as it's applied within and outside of academic research. We also outline important considerations researchers must be aware of in order to make their results potentially transferable, as well as the critical role the reader plays in this process. Finally, we discuss possible shortcomings and limitations of transferability that researchers must be aware of when planning and conducting a study that will yield potentially transferable results.

Transferability is a process performed by readers of research. Readers note the specifics of the research situation and compare them to the specifics of an environment or situation with which they are familiar. If there are enough similarities between the two situations, readers may be able to infer that the results of the research would be the same or similar in their own situation. In other words, they "transfer" the results of a study to another context. To do this effectively, readers need to know as much as possible about the original research situation in order to determine whether it is similar to their own. Therefore, researchers must supply a highly detailed description of their research situation and methods.

Results of any type of research method can be applied to other situations, but transferability is most relevant to qualitative research methods such as ethnography and case studies. Reports based on these research methods are detailed and specific. However, because they often consider only one subject or one group, researchers who conduct such studies seldom generalize the results to other populations. The detailed nature of the results, however, makes them ideal for transferability.

Transferability is easy to understand when you consider that we are constantly applying this concept to aspects of our daily lives. If, for example, you are an inexperienced composition instructor and you read a study in which a veteran writing instructor discovered that extensive prewriting exercises helped students in her classes come up with much more narrowly defined paper topics, you could ask yourself how much the instructor's classroom resembled your own. If there were many similarities, you might try to draw conclusions about how increasing the amount of prewriting your students do would impact their ability to arrive at sufficiently narrow paper topics. In doing so, you would be attempting to transfer the composition researcher's techniques to your own classroom.

An example of transferable research in the field of English studies is Berkenkotter, Huckin, and Ackerman's (1988) study of a graduate student in a rhetoric Ph.D. program. In this case study, the researchers describe in detail a graduate student's entrance into the language community of his academic program, and particularly his struggle learning the writing conventions of this community. They make conclusions as to why certain things might have affected the graduate student, "Nate," in certain ways, but they are unable to generalize their findings to all graduate students in rhetoric Ph.D. programs. It is simply one study of one person in one program. However, from the level of detail the researchers provide, readers can take certain aspects of Nate's experience and apply them to other contexts and situations. This is transferability. First-year graduate students who read the Berkenkotter, Huckin, and Ackerman study may recognize similarities in their own situation while professors may recognize difficulties their students are having and understand these difficulties a bit better. The researchers do not claim that their results apply to other situations. Instead, they report their findings and make suggestions about possible causes for Nate's difficulties and eventual success. Readers then look at their own situation and decide if these causes may or may not be relevant.

When designing a study researchers have to consider their goals: Do they want to provide limited information about a broad group in order to indicate trends or patterns? Or do they want to provide detailed information about one person or small group that might suggest reasons for a particular behavior? The method they choose will determine the extent to which their results can be transferred since transferability is more applicable to certain kinds of research methods than others.

Thick Description: When writing up the results of a study, it is important that the researcher provide specific information about and a detailed description of her subject(s), location, methods, role in the study, etc. This is commonly referred to as "thick description" of methods and findings; it is important because it allows readers to make an informed judgment about whether they can transfer the findings to their own situation. For example, if an educator conducts an ethnography of her writing classroom, and finds that her students' writing improved dramatically after a series of student-teacher writing conferences, she must describe in detail the classroom setting, the students she observed, and her own participation. If the researcher does not provide enough detail, it will be difficult for readers to try the same strategy in their own classrooms. If the researcher fails to mention that she conducted this research in a small, upper-class private school, readers may transfer the results to a large, inner-city public school expecting a similar outcome.

The Reader's Role: The role of readers in transferability is to apply the methods or results of a study to their own situation. In doing so, readers must take into account differences between the situation outlined by the researcher and their own. If readers of a study are aware that the research was conducted in a small, upper-class private school, but decide to test the method in a large inner-city public school, they must make adjustments for the different setting and be prepared for different results.

Likewise, readers may decide that the results of a study are not transferable to their own situation. For example, if a study found that watching more than 30 hours of television a week resulted in a worse GPA for graduate students in physics, graduate students in broadcast journalism may conclude that these results do not apply to them.

Readers may also transfer only certain aspects of the study and not the entire conclusion. For example, in the Berkenkotter, Huckin, and Ackerman study, the researchers suggest a variety of reasons why the graduate student studied experienced difficulties adjusting to his Ph.D. program. Although composition instructors cannot compare "Nate" to first-year college students in their composition class, they could ask some of the same questions about their own class, offering them insight into some of the writing difficulties the first-year undergraduates are experiencing. It is up to readers to decide what findings are important and which may apply to their own situation; if researchers fulfill their responsibility to provide "thick description," this decision is much easier to make.

Understanding research results can help us understand why and how something happens. However, many researchers believe that such understanding is difficult to achieve in relation to human behaviors which they contend are too difficult to understand and often impossible to predict. "Because of the many and varied ways in which individuals differ from each other and because these differences change over time, comprehensive and definitive experiments in the social sciences are not possible...the most we can ever realistically hope to achieve in educational research is not prediction and control but rather only temporary understanding" (Cziko, 1993, p. 10).

Cziko's point is important because transferability allows for "temporary understanding." Instead of applying research results to every situation that may occur in the future, we can apply a similar method to another, similar situation, observe the new results, apply a modified version to another situation, and so on. Transferability takes into account the fact that there are no absolute answers to given situations; rather, every individual must determine their own best practices. Transferring the results of research performed by others can help us develop and modify these practices. However, it is important for readers of research to be aware that results cannot always be transferred; a result that occurs in one situation will not necessarily occur in a similar situation. Therefore, it is critical to take into account differences between situations and modify the research process accordingly.

Although transferability seems to be an obvious, natural, and important method for applying research results and conclusions, it is not perceived as a valid research approach in some academic circles. Perhaps partly in response to critics, in many modern research articles, researchers refer to their results as generalizable or externally valid. Therefore, it seems as though they are not talking about transferability. However, in many cases those same researchers provide direction about what points readers may want to consider, but hesitate to make any broad conclusions or statements. These are characteristics of transferable results.

Generalizability is actually, as we have seen, quite different from transferability. Unfortunately, confusion surrounding these two terms can lead to misinterpretation of research results. Emphasis on the value of transferable results -- as well as a clear understanding among researchers in the field of English of critical differences between the conditions under which research can be generalized, transferred, or, in some cases, both generalized and transferred -- could help qualitative researchers avoid some of the criticisms launched by skeptics who question the value of qualitative research methods.

Generalizability and Transferability: Synthesis

Generalizability allows us to form coherent interpretations in any situation, and to act purposefully and effectively in daily life. Transferability gives us the opportunity to sort through given methods and conclusions to decide what to apply to our own circumstances. In essence, then, both generalizability and transferability allow us to make comparisons between situations. For example, we can generalize that most people in the United States will drive on the right side of the road, but we cannot transfer this conclusion to England or Australia without finding ourselves in a treacherous situation. It is important, therefore, to always consider context when generalizing or transferring results.

Whether a study emphasizes transferability or generalizability is closely related to the goals of the researcher and the needs of the audience. Studies done for a magazine such as Time or a daily newspaper tend towards generalizability, since the publishers want to provide information relevant to a large portion of the population. A research project pointed toward a small group of specialists studying a similar problem may emphasize transferability, since specialists in the field have the ability to transfer aspects of the study results to their own situations without overt generalizations provided by the researcher. Ultimately, the researcher's subject, audience, and goals will determine the method the researcher uses to perform a study, which will then determine the transferability or generalizability of the results.

A Comparison of Generalizability and Transferability

Although generalizability has been a preferred method of research for quite some time, transferability is a relatively new idea. In theory, however, it has always accompanied research issues. It is important to note that generalizability and transferability are not necessarily mutually exclusive; they can overlap.

From an experimental study to a case study, readers transfer the methods, results, and ideas from the research to their own context. Therefore, a generalizable study can also be transferable. For example, a researcher may generalize the results of a survey of 350 people in a university to the university population as a whole; readers of the results may apply, or transfer, the results to their own situation. They will ask themselves, basically, if they fall into the majority or not. However, a transferable study is not always generalizable. For example, in case studies, transferability allows readers the option of applying results to outside contexts, whereas generalizability is basically impossible because one person or a small group of people is not necessarily representative of the larger population.

Controversy, Worth, and Function

Research in the natural sciences has a long tradition of valuing empirical studies; experimental investigation has been considered "the" way to perform research. As social scientists adapted the methods of natural science research to their own needs, they adopted this preference for empirical research. Therefore, studies that are generalizable have long been thought to be more worthwhile; the value of research was often determined by whether a study was generalizable to a population as a whole. However, more and more social scientists are realizing the value of using a variety of methods of inquiry, and the value of transferability is being recognized.

It is important to recognize that generalizability and transferability do not alone determine a study's worth. They perform different functions in research, depending on the topic and goals of the researcher. Where generalizable studies often indicate phenomena that apply to broad categories such as gender or age, transferability can provide some of the how and why behind these results.

However, there are weaknesses that must be considered. Researchers can study a small group that is representative of a larger group and claim that it is likely that their results are applicable to the larger group, but it is impossible for them to test every single person in the larger group. Their conclusions, therefore, are only valid in relation to their own studies. Another problem is that a non-representative group can lead to a faulty generalization. For example, a study of composition students' revision capabilities which compared students' progress made during a semester in a computer classroom with progress exhibited by students in a traditional classroom might show that computers do aid students in the overall composing process. However, if it were discovered later that an unusually high number of students in the traditional classrooms suffered from substance abuse problems outside of the classroom, the population studied would not be considered representative of the student population as a whole. Therefore, it would be problematic to generalize the results of the study to a larger student population.

In the case of transferability, readers need to know as much detail as possible about a research situation in order to accurately transfer the results to their own. However, it is impossible to provide an absolutely complete description of a situation, and missing details may lead a reader to transfer results to a situation that is not entirely similar to the original one.

Applications to Research Methods

The degree to which generalizability and transferability are applicable differs from methodology to methodology as well as from study to study. Researchers need to be aware of these degrees so that results are not undermined by over-generalizations, and readers need to ensure that they do not read researched results in such a way that the results are misapplied or misinterpreted.

Applications of Transferability and Generalizability: Case Study

Research Design: Case studies examine individuals or small groups within a specific context. Research is typically gathered through qualitative means: interviews, observations, etc. Data is usually analyzed either holistically or by coding methods.

Assumptions: In research involving case studies, a researcher typically assumes that the results will be transferable. Generalizing is difficult or impossible because one person or small group cannot represent all similar groups or situations. For example, one group of beginning writing students in a particular classroom cannot represent all beginning student writers. Also, conclusions drawn in case studies are only about the participants being observed. With rare exceptions, case studies are not meant to establish cause/effect relationships between variables. The results of a case study are transferable in that researchers "suggest further questions, hypotheses, and future implications," and present the results as "directions and questions" (Lauer & Asher 32).

Example: In order to illustrate the writing skills of beginning college writers, a researcher completing a case study might single out one or more students in a composition classroom and set about talking to them about how they judge their own writing as well as reading actual papers, setting up criteria for judgment, and reviewing paper grades/teacher interpretation.

Results of a Study: In presenting the results of the previous example, a researcher should define the criteria that were established in order to determine what the researcher meant by "writing skills," provide noteworthy quotes from student interviews, provide other information depending on the kinds of research methods used (e.g., surveys, classroom observation, collected writing samples), and include possibilities for furthering this type of research. Readers are then able to assess for themselves how the researcher's observations might be transferable to other writing classrooms.

Applications of Transferability and Generalizability: Ethnography

Research Design: Ethnographies study groups and/or cultures over a period of time. The goal of this type of research is to comprehend the particular group/culture through observer immersion into the culture or group. Research is completed through various methods, which are similar to those of case studies, but since the researcher is immersed within the group for an extended period of time, more detailed information is usually collected during the research. (Alex Kotlowitz's "There Are No Children Here" is a good example of this.)

Assumptions: As with case studies, findings of ethnographies are also considered to be transferable. The main goals of an ethnography are to "identify, operationally define, and interrelate variables" within a particular context, which ultimately produce detailed accounts or "thick descriptions" (Lauer & Asher 39). Unlike a case study, the researcher here discovers many more details. Results of ethnographies should "suggest variables for further investigation" and not generalize beyond the participants of a study (Lauer & Asher 43). Also, since analysts completing this type of research tend to rely on multiple methods to collect information (a practice also referred to as triangulation), their results typically help create a detailed description of human behavior within a particular environment.

Example: The Iowa Writing Program has a widespread reputation for producing excellent writers. In order to begin to understand their training, an ethnographer might observe students throughout their degree program. During this time, the ethnographer could examine the curriculum, follow the writing processes of individual writers, and become acquainted with the writers and their work. By the end of a two-year study, the researcher would have a much deeper understanding of the unique and effective features of the program.

Results of a Study: Obviously, the Iowa Writing Program is unique, so generalizing any results to another writing program would be problematic. However, an ethnography would provide readers with insights into the program. Readers could ask questions such as: What qualities make it strong, and what is unique about the writers who are trained within the program? At this point, readers could attempt to "transfer" applicable knowledge and observations to other writing environments.

Applications of Transferability and Generalizability: Experimental Research

Research Design: A researcher working within this methodology creates an environment in which to observe and interpret the results of a research question. A key element in experimental research is that participants in a study are randomly assigned to groups. In an attempt to create a causal model (i.e., to discover the causal origin of a particular phenomenon), groups are treated differently and measurements are conducted to determine if different treatments appear to lead to different effects.

Assumptions: Experimental research is usually thought to be generalizable. This methodology explores cause/effect relationships through comparisons among groups (Lauer & Asher 152). Since participants are randomly assigned to groups, and since most experiments involve enough individuals to reasonably approximate the populations from which individual participants are drawn, generalization is justified because "over a large number of allocations, all the groups of subjects will be expected to be identical on all variables" (155).

Example: Six composition classrooms are randomly chosen (as are the students and instructors), in which three instructors incorporate the use of electronic mail as a class activity and three do not. When students in the first three classes begin discussing their papers through e-mail and, as a result, make better revisions to their papers than students in the other three classes, a researcher is likely to conclude that incorporating e-mail within a writing classroom improves the quality of students' writing.

Results of a Study: Although experimental research is based on cause/effect relationships, "certainty" can never be obtained; rather, results are "probabilistic" (Lauer & Asher 161). Depending on how the researcher has presented the results, they are generalizable in that the students were selected randomly. Since the quality of writing improved with the use of e-mail within all three classrooms, it is probable that e-mail is the cause of the improvement. Readers of this study would transfer the results when they sorted out the details: Are these students representative of a group of students with which the reader is familiar? What types of previous writing experiences have these students had? What kind of writing was expected from these students? The researcher must have provided these details in order for the results to be transferable.
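The random-assignment step that licenses this kind of generalization is straightforward to implement. Below is a minimal sketch in Python, with hypothetical classroom IDs standing in for the six classes in the example above.

```python
# Random allocation of six classrooms to two conditions (illustrative).
import random

random.seed(42)  # fixed seed so the allocation is reproducible

classrooms = [f"class_{i}" for i in range(1, 7)]  # hypothetical IDs
random.shuffle(classrooms)

email_group = classrooms[:3]    # e-mail discussion incorporated
control_group = classrooms[3:]  # traditional instruction
print("E-mail group:  ", email_group)
print("Control group: ", control_group)
```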

Applications of Transferability and Generalizability: Survey

Research Design

The goal of a survey is to gain specific information about either a specific group or a representative sample of a particular group. Survey respondents are asked to respond to one or more of the following kinds of items: open-ended questions, true-false questions, agree-disagree (or Likert) questions, rankings, ratings, and so on. Results are typically used to understand the attitudes, beliefs, or knowledge of a particular group.

Assumptions

Assuming that care has been taken in the development of the survey items and the selection of the survey sample, and that adequate response rates have been achieved, survey results are generalizable. Note, however, that results from surveys should be generalized only to the population from which the survey sample was drawn.
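One standard way to quantify how precisely a sample can represent its population is the margin of error for a proportion. As a brief illustration (this is textbook survey arithmetic, not a formula from the guide itself), at a 95% confidence level:

```latex
\mathrm{MOE} = z\sqrt{\frac{p(1-p)}{n}}, \qquad
p = 0.5,\; n = 384,\; z = 1.96
\;\Rightarrow\; \mathrm{MOE} \approx 0.05
```

So a well-drawn sample of roughly 384 respondents supports generalization to within about ±5 percentage points, assuming unbiased sampling and adequate response rates.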

Example

For instance, a survey of Colorado State University English graduate students undertaken to determine how well French philosopher/critic Jacques Derrida is understood before and after students take a course in critical literary theory might inform professors that, overall, Derrida's concepts are understood and that CSU's literary theory class, E615, has helped students grasp Derrida's ideas.

Results of a Study

The generalizability of surveys depends on several factors. Whether distributed to a mass of people or a select few, surveys are of a "personal nature and subject to distortion." Survey respondents may or may not understand the questions being asked of them. Depending on whether or not the survey designer is nearby, respondents may or may not have the opportunity to clarify their misunderstandings.

It is also important to keep in mind that errors can occur at the development and processing levels. A researcher may inadequately pose questions (that is, not ask the right questions for the information being sought), disrupt the data collection (surveying certain people and not others), and distort the results during the processing (misreading responses and not being able to question the participant, etc.). One way to avoid these kinds of errors is for researchers to examine other studies of a similar nature and compare their results with results that have been obtained in previous studies. This way, any large discrepancies will be exposed. Depending on how large those discrepancies are and what the context of the survey is, the results may or may not be generalizable. For example, if an improved understanding of Derrida is apparent after students complete E615, it can be theorized that E615 effectively teaches students the concepts of Derrida. Issues of transferability might be visible in the actual survey questions themselves; that is, they could provide critical background information readers might need to know in order to transfer the results to another context.

The Qualitative versus Quantitative Debate

In Miles and Huberman's 1994 book Qualitative Data Analysis, quantitative researcher Fred Kerlinger is quoted as saying, "There's no such thing as qualitative data. Everything is either 1 or 0" (p. 40). To this, another researcher, D. T. Campbell, responds that "all research ultimately has a qualitative grounding" (p. 40). This back-and-forth banter between qualitative and quantitative researchers is "essentially unproductive," according to Miles and Huberman. They and many other researchers agree that these two research methods need each other more often than not. However, because qualitative data typically involve words and quantitative data involve numbers, some researchers feel that one is better (or more scientific) than the other. Another major difference is that qualitative research is generally inductive while quantitative research is deductive: qualitative research does not require a hypothesis to begin, whereas quantitative research typically does.

Another major difference between qualitative and quantitative research is the underlying assumptions about the role of the researcher. In quantitative research, the researcher is ideally an objective observer who neither participates in nor influences what is being studied. In qualitative research, however, it is thought that the researcher can learn the most about a situation by participating in and/or being immersed in it. These basic underlying assumptions of both methodologies guide and sequence the types of data collection methods employed.

Although there are clear differences between qualitative and quantitative approaches, some researchers maintain that the choice between using qualitative or quantitative approaches actually has less to do with methodologies than it does with positioning oneself within a particular discipline or research tradition. The difficulty of choosing a method is compounded by the fact that research is often affiliated with universities and other institutions. The findings of research projects often guide important decisions about specific practices and policies. The choice of which approach to use may reflect the interests of those conducting or benefitting from the research and the purposes for which the findings will be applied. Decisions about which kind of research method to use may also be based on the researcher's own experience and preference, the population being researched, the proposed audience for findings, time, money, and other resources available (Hathaway, 1995).

Some researchers believe that qualitative and quantitative methodologies cannot be combined because the assumptions underlying each tradition are so vastly different. Other researchers think they can be used in combination only by alternating between methods: qualitative research is appropriate to answer certain kinds of questions in certain conditions and quantitative is right for others. And some researchers think that both qualitative and quantitative methods can be used simultaneously to answer a research question.

To a certain extent, researchers on all sides of the debate are correct: each approach has its drawbacks. Quantitative research often "forces" responses or people into categories that might not "fit" in order to make meaning. Qualitative research, on the other hand, sometimes focuses too closely on individual results and fails to make connections to larger situations or possible causes of the results. Rather than discounting either approach for its drawbacks, though, researchers should find the most effective ways to incorporate elements of both to ensure that their studies are as accurate and thorough as possible.

It is important for researchers to realize that qualitative and quantitative methods can be used in conjunction with each other. In a study of computer-assisted writing classrooms, Snyder (1995) employed both qualitative and quantitative approaches. The study was constructed according to guidelines for quantitative studies: the computer classroom was the "treatment" group and the traditional pen-and-paper classroom was the "control" group. Both classes contained subjects with the same characteristics from the population sampled. Both classes followed the same lesson plan and were taught by the same teacher in the same semester. The only variable that differed between them was the use of computers. Although Snyder set this study up as an "experiment," she used many qualitative approaches to supplement her findings. She observed both classrooms on a regular basis as a participant-observer and conducted several interviews with the teacher both during and after the semester. However, there were several problems in using this approach: the strict adherence to the same syllabus and lesson plans for both classes and the restricted access of the control group to the computers may have put some students at a disadvantage. Snyder also notes that in retrospect she should have used case studies of the students to further develop her findings. Although her study had certain flaws, Snyder insists that researchers can simultaneously employ qualitative and quantitative methods if studies are planned carefully and carried out conscientiously.

Annotated Bibliography

Babbie, Earl R. (1979). The practice of social research. Belmont: Wadsworth Publishing Company, Inc.

A comprehensive review of social scientific research, including techniques for research. The logic behind social scientific research is discussed.

Berkenkotter, C., Huckin, T.N., & Ackerman, J. (1988). Conventions, conversations, and the writer: Case study of a student in a rhetoric Ph.D. program. Research in the Teaching of English 22 (1), 9-44.

Describes a case study of a beginning student in a Ph.D. program. Looks at the process of his entry into an academic discourse community.

Black, Susan. (1996). Redefining the teacher's role. Executive Educator, 18 (8), 23-26.

Discusses the value of well-trained teacher-researchers performing research in their classrooms. Notes that teacher-research focuses on the particular; it does not look for broad, generalizable principles.

Blank, Steven C. (1984). Practical business research methods. Westport: AVI Publishing Company, Inc.

A comprehensive book of how to set up a research project, collect data, and reach and report conclusions.

Bridges, David. (1993). Transferable Skills: A Philosophical Perspective. Studies in Higher Education 18 (1), 43-51.

Transferability of skills in learning is discussed, focusing on the notions of cross-disciplinary, generic, core, and transferable skills and their role in the college curriculum.

Brookhart, Susan M. & Rusnak, Timothy G. (1993). A pedagogy of enrichment, not poverty: Successful lessons of exemplary urban teachers. Journal of Teacher Education, 44 (1), 17-27.

Reports the results of a study that explored the characteristics of effective urban teachers in Pittsburgh. Suggests that the results may be transferable to urban educators in other contexts.

Bryman, Alan. (1988). Quantity and quality in social research. Boston: Unwin Hyman Ltd.

Butcher, Jude. (1994, July). Cohort and case study components in teacher education research. Paper presented at the annual conference of the Australian Teacher Education Association, Brisbane, Queensland, Australia.

Argues that studies of teacher development will be more generalizable if a broad set of methods are used to collect data, if the data collected is both extensive and intensive, and if the methods used take into account the differences in people and situations being studied.

Carter, Duncan. (1993). Critical thinking for writers: Transferable skills or discipline-specific strategies? Composition Studies/Freshman English News, 21 (1), 86-93.

Questions the context-dependency of critical thinking, and whether critical thinking skills are transferable to writing tasks.

Carter, Kathy. (1993). The place of story in the study of teaching and teacher education. Educational Researcher, 22 (1), 5-12.

Discusses the advantages of story-telling in teaching and teacher education, but cautions instructors who are unfamiliar with story-telling in current pedagogical structures to be careful in implementing this method in their teaching.

Clonts, Jean G. (1992, January). The concept of reliability as it pertains to data from qualitative studies. Paper presented at the annual meeting of the Southwest Educational Research Association, Houston, TX.

Presents a review of literature on reliability in qualitative studies and defines reliability as the extent to which studies can be replicated by using the same methods and getting the same results. Strategies to enhance reliability through study design, data collection, and data analysis are suggested. Generalizability as an estimate of reliability is also explored.

Connelly, Michael F. & Clandinin, D. Jean. (1990). Stories of experience and narrative inquiry. Educational Researcher, 19 (5), 2-14.

Describes narrative as a site of inquiry and a qualitative research methodology in which experiences of observer and observed interact. This form of research necessitates the development of new criteria, which may include apparency, verisimilitude, and transferability (7).

Crocker, Linda & Algina, James. (1986). Introduction to classical & modern test theory. New York: Holt, Rinehart and Winston.

Discusses test theory and its application to psychometrics. Chapters range from a general overview of major issues to statistical methods and applications.

Cronbach, Lee J. et al. (1967). The dependability of behavioral measurements: multifaceted studies of generalizability. Stanford: Stanford UP.

A technical research report that includes statistical methodology in order to contrast multifaceted generalizability with classical reliability.

Cziko, Gary A. (1992). Purposeful behavior as the control of perception: implications for educational research. Educational Researcher, 21 (9), 10-18.

El-Hassan, Karma. (1995). Students' Rating of Instruction: Generalizability of Findings. Studies in Educational Research, 21 (4), 411-29.

Issues of dimensionality, validity, reliability, and generalizability of students' ratings of instruction are discussed in relation to a study in which 610 college students evaluated their instructors on the Teacher Effectiveness Scale.

Feingold, Alan. (1994). Gender differences in variability in intellectual abilities: a cross-cultural perspective. Sex Roles: A Journal of Research 20 (1-2), 81-93.

Feingold conducts a cross-cultural quantitative review of contemporary findings of gender differences in variability in verbal, mathematical, and spatial abilities to assess the generalizability of U.S. findings that males are more variable than females in mathematical and spatial abilities, and the sexes are equally variable in verbal ability.

Firestone, William A. (1993). Alternative arguments for generalizing from data as applied to qualitative research. Educational Researcher, 22 (4), 16-22.

Focuses on generalization in three areas of qualitative research: sample to population extrapolation, analytic generalization, and case-to-case transfer (16). Explains underlying principles, related theories, and criteria for each approach.

Fyans, Leslie J. (Ed.). (1983). Generalizability theory: Inferences and practical applications. In New Directions for Testing and Measurement: Vol. 18. San Francisco: Jossey-Bass.

A collection of articles on generalizability theory. The goal of the book is to present different aspects and applications of generalizability theory in a way that allows the reader to apply the theory.

Hammersley, Martyn. (Ed.). (1993). Social research: Philosophy, politics and practice. Newbury Park, CA: Sage Publications.

A collection of articles that provide an overview of positivism; includes an article on increasing the generalizability of qualitative research by Janet Ward Schofield.

Hathaway, R. (1995). Assumptions underlying quantitative and qualitative research: Implications for institutional research. Research in higher education, 36 (5), 535-562.

Hathaway says that the choice between using qualitative or quantitative approaches is less about methodology and more about aligning oneself with particular theoretical and academic traditions. He concluded that the two approaches address questions in very different ways, each one having its own advantages and drawbacks.

Hipps, Jerome A. (1993). Trustworthiness and authenticity: Alternate ways to judge authentic assessments. Paper presented at the annual meeting of the American Educational Research Association, Atlanta, GA.

Contrasts the foundational assumptions of the constructivist approach to traditional research and the positivist approach to authentic assessment in relation to generalizability and other research issues.

Howe, Kenneth & Eisenhart, Margaret. (1990). Standards for qualitative (and quantitative) research: A prolegomenon. Educational Researcher, 19 (4), 2-9.

Huang, Chi-yu, et al. (1995, April). A generalizability theory approach to examining teaching evaluation instruments completed by students. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA.

Presents the results of a study that used generalizability theory to investigate the reasons for variability in a teacher and course evaluation mechanism.

Hungerford, Harold R. et al. (1992). Investigating and Evaluating Environmental Issues and Actions: Skill Development Modules.

A guide designed to teach students how to investigate and evaluate environmental issues and actions. The guide is presented in six modules including information collecting and surveys, questionnaires, and opinionnaires.

Jackson, Philip W. (1990). The functions of educational research. Educational Researcher, 19 (7), 3-9.

Johnson, Randell G. (1993, April). A validity generalization study of the multiple assessment and program services test. Paper presented at the annual meeting of the American Educational Research Association, Atlanta, GA.

Presents results of a study of validity reports of the Multiple Assessment and Program Services Test, using quantitative analysis to determine the generalizability of the results.

Jones, Elizabeth A. & Ratcliff, Gary. (1993). Critical thinking skills for college students. (National Center on Postsecondary Teaching, Learning, and Assessment). University Park, PA.

Reviews research literature exploring the nature of critical thinking; discusses the extent to which critical thinking is generalizable across disciplines.

Karpinski, Jakub. (1990). Causality in Sociological Research. Boston: Kluwer Academic Publishers.

Discusses causality and causal analysis in terms of sociological research. Provides equations and explanations.

Kirsch, Irwin S. & Jungeblut, Ann. (1995). Using large-scale assessment results to identify and evaluate generalizable indicators of literacy. (National Center on Adult Literacy, Publication No. TR94-19). Philadelphia, PA.

Reports analysis of data collected during an extensive literacy survey in order to help understand the different variables involved in literacy proficiency. Finds that literacy skills can be predicted across large, heterogeneous populations, but not as effectively across homogeneous populations.

Lauer, Janice M. & Asher, J. William. (1988). Composition research: Empirical designs. New York: Oxford University Press.

Explains the selection of subjects, formulation of hypotheses or questions, data collection, data analysis, and variable identification through discussion of each design.

LeCompte, Margaret & Goetz, Judith Preissle. (1982). Problems of reliability and validity in ethnographic research. Review of Educational Research, 52 (1), 31-60.

Concentrates on educational research and ethnography and shows how to better take reliability and validity into account when doing ethnographic research.

Marcoulides, George & Simkin, Mark G. (1991). Evaluating student papers: the case for peer review. Journal of Education for Business, 67 (2), 80-83.

A preprinted evaluation form and generalizability theory are used to judge the reliability of students' grading of one another's papers.

Maxwell, Joseph A. (1992). Understanding and validity in qualitative research. Harvard Educational Review, 62 (3), 279-300.

Explores the five types of validity used in qualitative research, including generalizable validity, and examines possible threats to research validity.

McCarthy, Christine L. (1996, Spring). What is "critical thinking"? Is it generalizable? Educational Theory, 46, 217-239.

Reviews, compares and contrasts a selection of essays from Stephen P. Norris' book The Generalizability of Critical Thinking: Multiple Perspectives on an Education Ideal in order to explore the diversity of the topic of critical thinking.

Miles, Matthew B. & Huberman, A. Michael. (1994). Qualitative data analysis. Thousand Oaks: Sage Publications.

A comprehensive review of data analysis. Subjects range from collecting data to producing an actual report.

Minium, Edward W., King, M. Bruce, & Bear, Gordon. (1993). Statistical reasoning in psychology and education. New York: John Wiley & Sons, Inc.

A textbook designed to teach students about statistical data and theory.

Moss, Pamela A. (1992). Shifting conceptions of validity in educational measurement: Implications for performance assessment. Review of Educational Research, 62 (3), 229-258.

Nachmias, David & Nachmias, Chava. (1981). Research methods in the social sciences. New York: St. Martin's Press.

Discusses the foundations of empirical research, data collection, data processing and analysis, inferential methods, and the ethics of social science research.

Nagy, Philip & Jarchow, Elaine McNally. (1981). Estimating variance components of essay ratings in a complex design. Speech/Conference Paper.

This paper discusses variables influencing written composition quality and how they can be best controlled to improve the reliability assessment of writing ability.

Nagy, William E., Herman, Patricia A., & Anderson, Richard C. (1985). Learning word meanings from context: How broadly generalizable? (University of Illinois at Urbana-Champaign. Center for the Study of Reading, Technical Report No. 347). Cambridge, MA: Bolt, Beranek and Newman.

Reports the results of a study that investigated how students learn word meanings while reading from context. Claims that the study was designed to be generalized.

Naizer, Gilbert. (1992, January). Basic concepts in generalizability theory: A more powerful approach to evaluating reliability. Presented at the annual meeting of the Southwest Educational Research Association, Houston, TX.

Discusses how a measurement approach called generalizability theory (G-theory) is an important alternative to the more classical measurement theory that yields less useful coefficients. G-theory is about the dependability of behavioral measurements that allows the simultaneous estimation of multiple sources of error variance.

Newman, Isadore & Macdonald, Suzanne. (1993, May). Interpreting qualitative data: A methodological inquiry. Paper presented at the annual meeting of the Ohio Academy of Science, Youngstown, OH.

Issues of consistency, triangulation, and generalizability are discussed in relation to a qualitative study involving graduate student participants. The authors refute Polkinghorne's views of the generalizability of qualitative research, arguing that quantitative research is more suitable for generalizability.

Norris, Stephen P. (Ed.). (1992). The generalizability of critical thinking: multiple perspectives on an education ideal. New York: Teachers College Press.

A set of essays from a variety of disciplines presenting different perspectives on the topic of the generalizability of critical thinking. The authors refer and respond to each other.

Peshkin, Alan. (1993). The goodness of qualitative research. Educational Researcher, 22 (2), 23-29.

Discusses how effective qualitative research can be in obtaining desired results and concludes that it is an important tool scholars can use in their explorations. The four categories of qualitative research--description, interpretation, verification, and evaluation--are examined.

Rafilson, Fred. (1991, July). The case for validity generalization.

Describes generalization as a quantitative process. Briefly discusses theory, method, examples, and applications of validity generalization, emphasizing unseen local methodological problems.

Rhodebeck, Laurie A. (1996). The structure of men's and women's feminist orientations: feminist identity and feminist opinion. Gender & Society, 10 (4), 386-404.

This study considers two problems: the extent to which feminist opinions are distinct from feminist identity and the generalizability of these separate constructs across gender and time.

Runkel, Philip J. & McGrath, E. Joseph. (1972). Research on human behavior: A systematic guide to method. New York: Holt, Rinehart and Winston, Inc.

Discusses how researchers can utilize their experiences of human behavior and apply them to research in a systematic and explicit fashion.

Salomon, Gavriel. (1991). Transcending the qualitative-quantitative debate: The analytic and systemic approaches to educational research. Educational Researcher, 20 (6), 10-18.

Examines the complex issues/variables involved in studies. Two types of approaches are explored: an Analytic Approach, which assumes internal and external issues, and a Systemic Approach, in which each component affects the whole. Also discusses how a study can never fully measure how much x affects y because there are so many inter-relations. Knowledge is applied differently within each approach.

Schrag, Francis. (1992). In defense of positivist research paradigms. Educational Researcher, 21 (5), 5-8.

Critics of positivism Elliot Eisner, Frederick Erickson, Henry Giroux, and Thomas Popkewitz are logically committed to propositions that can be tested only by means of positivist research paradigms. A definition of positivism is gathered through example. Overall, it is concluded that educational research need not aspire to be practical.

Sekaran, Uma. (1984). Research methods for managers: A skill-building approach. New York: John Wiley and Sons.

Discusses managerial approaches to conducting research in organizations. Provides understandable definitions and explanations of such methods as sampling and data analysis and interpretation.

Shadish, William R. (1995). The logic of generalization: five principles common to experiments and ethnographies. American Journal of Community Psychology 23 (3), 419-29.

Both experiments and ethnographies are highly localized, so they are often criticized for lack of generalizability. This article describes a logic of generalization that may help solve such problems.

Shavelson, Richard J. & Webb, Noreen M. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage Publications.

Snyder, I. (1995). Multiple perspectives in literacy research: Integrating the quantitative and qualitative. Language and Education, 9 (1), 45-59.

This article explains a study in which the author employed quantitative and qualitative methods simultaneously to compare computer composition classrooms and traditional classrooms. Although there were some problems with integrating both approaches, Snyder says they can be used together if researchers plan carefully and use their methods thoughtfully.

Stallings, William M. (1995). Confessions of a quantitative educational researcher trying to teach qualitative research. Educational Researcher, 24 (3), 31-32.

Discusses the trials and tribulations of teaching a qualitative research course to graduate students. The author describes the successes and failings he encounters and asks colleagues for suggestions of readings for his syllabus.

Wagner, Ellen D. (1993, January). Evaluating distance learning projects: An approach for cross-project comparisons. Paper presented at the annual meeting of the Association for Educational Communications and Technology, New Orleans, LA.

Describes a methodology developed to evaluate distance learning projects in a way that takes into account specific institutional issues while producing generalizable, valid and reliable results that allow for discussion among different institutions.

Yin, Robert K. (1989). Case Study Research: Design and Methods. London: Sage Publications.

A small section on the application of generalizability with regard to case studies.

Barnes, Jeffrey, Kerri Conrad, Christof Demont-Heinrich, Mary Graziano, Dawn Kowalski, Jamie Neufeld, Jen Zamora, & Mike Palmquist. (2005). Generalizability and Transferability. Writing@CSU. Colorado State University. https://writing.colostate.edu/guides/guide.cfm?guideid=65


Research Findings Guide: Examples, Types, and Structuring Tips

November 7, 2024

Dr. Marvin L. Smith

Research findings are the core insights derived from a study, summarizing key results and answering the research question. They reveal patterns, relationships, or trends, whether through qualitative insights or quantitative data.

Understanding how to write findings in research is crucial—it provides clarity, supports claims, and often determines the study’s impact. 

This article explores types of research findings, examples, and methods to present them effectively.

Whether you’re looking to learn about research findings, explore examples of different types of research findings, or need guidance on structuring findings in a paper, this guide has you covered.

What Are Research Findings?

Research findings are the key results or discoveries from a study. 

They directly address the research question, revealing insights that support or challenge the hypothesis. These findings can be qualitative, like observations or themes, or quantitative, like statistics or patterns. 

Clear and accurate findings ensure readers understand the study’s outcome.

Importance of Research Findings

Research findings are the cornerstone of any study, offering critical evidence to support the researcher's conclusions. They serve as the basis for establishing facts, verifying hypotheses, and validating the study's objectives.

Findings not only demonstrate that a study has met its intended goals but also underscore its relevance and reliability within a field.

In academic and professional circles, strong research findings enhance the credibility of a paper. They demonstrate that the study is grounded in rigorous data analysis, increasing the likelihood of acceptance by peers and recognition in the wider community. 

When findings are presented clearly and backed by sound evidence, they provide a solid foundation for future research, inspiring new questions and guiding subsequent studies.

Additionally, well-structured findings are invaluable for decision-making across sectors. 

In healthcare, they inform treatment protocols and health policies; in business, they shape product development and strategic planning; in education, they enhance teaching methods and learning outcomes.

Without concrete findings, research would lack direction and impact, making these insights essential for applying knowledge to real-world problems and advancing knowledge in meaningful ways.

Types of Research Findings

Research findings can be categorized based on both the data’s nature and its origin, giving readers insight into the study’s methods and the type of evidence presented. 

This classification—into qualitative vs. quantitative findings and primary vs. secondary findings—helps researchers structure their findings more effectively and ensures readers can follow the study’s approach.

Qualitative vs. Quantitative Findings

Qualitative findings focus on understanding experiences, motivations, and perceptions by capturing themes, patterns, and meanings through methods like interviews, focus groups, and observations. They address the “how” and “why” behind phenomena.

For instance, in a study exploring customer satisfaction, qualitative findings might reveal that customers feel valued when employees remember their names—an insight drawn from direct interview responses. 

These findings provide rich, contextual insights that add depth and human perspectives.

Quantitative findings, on the other hand, are based on numerical data derived from methods like surveys, experiments, and statistical analysis. These findings answer "what," "how much," or "how many," offering a measurable view of trends or relationships.

In the same customer satisfaction study, quantitative findings could show that 78% of surveyed customers rate their satisfaction as “high.” 

This data-driven approach offers clear, objective metrics that validate or challenge hypotheses and allow comparisons across variables.

Using both qualitative and quantitative findings often provides a balanced perspective, combining numerical rigor with contextual understanding—a method known as mixed-methods research.

Primary vs. Secondary Findings

Primary findings emerge directly from the researcher’s own data collection. These are original insights obtained through firsthand research, such as an experiment, survey, or field study. 

For example, a study measuring the effects of a new medication on blood pressure would yield primary findings about its effectiveness based on the data collected during clinical trials. 

These findings introduce new knowledge to the field, making them highly valuable and directly tied to the study’s objectives.

Secondary findings are drawn from data or insights that others have previously collected. They often support or add context to primary findings without introducing new information. 

For instance, in a study on the effectiveness of teaching methods, secondary findings might include statistics from government reports on educational outcomes. 

These findings help frame the research within a broader context, showing how it aligns with or diverges from existing studies. By combining primary and secondary findings, researchers can enhance the credibility of their work and provide a fuller understanding of the topic.

Each type of research finding serves a unique purpose. 

Qualitative and quantitative findings provide different perspectives on data, while primary and secondary findings strengthen the depth and breadth of research, making it more impactful and informative.

Interpreting Research Findings

Interpreting research findings involves reviewing data to uncover meaningful insights. This process not only highlights key results but also strengthens the study’s credibility by ensuring clarity and accuracy in presenting findings.

Analyzing Data and Recognizing Patterns

Data analysis helps identify trends, correlations, or differences within the dataset. By recognizing these patterns, researchers draw conclusions that directly address the research question. Effective analysis reveals underlying insights and shows how findings connect to the study’s objectives.
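As a minimal sketch of what recognizing a pattern can look like in quantitative data (the paired values below are invented for illustration), a correlation coefficient summarizes how strongly two variables move together:

```python
from statistics import correlation  # available in Python 3.10+

# Hypothetical paired observations: hours studied vs. exam score.
hours = [2, 4, 6, 8, 10]
scores = [55, 62, 70, 74, 83]

r = correlation(hours, scores)   # Pearson's r
print(f"Pearson r = {r:.2f}")    # values near +1 indicate a strong positive trend
```

A coefficient alone never explains why a pattern exists, so a result like this still needs to be interpreted against the research question and the study's design.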

Ensuring Validity and Accuracy

Ensuring validity and accuracy is essential in interpreting findings. Validity confirms that the findings genuinely reflect the data and align with the research question, while accuracy ensures consistent, error-free analysis. Together, they reinforce the study’s reliability, making its conclusions trustworthy and impactful.

Presenting Research Findings

Presenting research findings effectively is crucial for helping readers understand and engage with the study’s outcomes. A well-structured presentation and the use of visuals ensure clarity, while accessible language makes findings understandable to a wider audience.

Structuring a Clear Presentation

Organize findings in a logical order that directly addresses the research question, starting with the most significant results. Use headings, subheadings, and bullet points to break down information, making it easier for readers to follow. Concise and clear language keeps the focus on key insights without overwhelming details.

Using Visuals for Emphasis

Visuals, like charts, graphs, and tables, highlight key data points and make complex information easier to grasp.

For example, a bar chart can show survey results by comparing response percentages across different groups, while a line graph can track changes over time, such as monthly sales trends or patient recovery rates. 

Tables are also effective for presenting detailed numerical data, allowing readers to compare figures side by side.

These visual aids help readers quickly identify patterns and comparisons, enhancing the impact of findings and overall comprehension. A well-placed chart or table can make a difference by translating raw data into a clear, engaging visual summary.
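For readers who build such charts programmatically, here is a minimal sketch using Python's matplotlib; the group labels and percentages are placeholders rather than real survey data:

```python
import matplotlib.pyplot as plt

# Hypothetical survey results: % of respondents rating satisfaction "high".
groups = ["Group A", "Group B", "Group C"]
percent_high = [78, 64, 71]

plt.bar(groups, percent_high)
plt.ylabel('Respondents rating satisfaction "high" (%)')
plt.title("Customer satisfaction by group")
plt.ylim(0, 100)                    # percentages, so fix the axis at 0-100
plt.savefig("survey_results.png")   # or plt.show() for interactive viewing
```

Anchoring the axis at zero, as above, avoids visually exaggerating small differences between groups.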

Communicating Findings to Non-Experts

To reach non-experts, simplify technical terms and avoid jargon. Use clear, everyday language and provide brief explanations when needed. Presenting findings in an accessible way ensures broader understanding and maximizes the research’s reach and influence.

Challenges in Reporting Research Findings

Reporting research findings can be challenging, as it requires accuracy and objectivity to avoid misleading readers. Identifying and addressing these challenges is essential to maintain credibility and transparency.

Misinterpretation and Bias

Misinterpretation happens when findings are presented in a way that leads readers to incorrect conclusions. To avoid this, use precise language and clarify key points. Bias, whether intentional or unintentional, can distort findings by emphasizing certain outcomes. Being aware of potential biases and reporting objectively ensures a fair representation of the data.

Addressing Limitations

Every study has limitations—factors that may affect the results or the generalizability of findings. Clearly acknowledging these limitations shows honesty and helps readers understand the scope of the research. Addressing limitations also guides future studies by highlighting areas for improvement or further investigation.

Applications of Research Findings

Research findings have broad applications across various fields, guiding decisions, influencing policies, and informing future research. 

In healthcare, findings can lead to new treatments, improve patient care, or shape public health guidelines. 

In business, research insights drive product development, marketing strategies, and customer experience enhancements. 

In education, findings inform teaching methods and curriculum design, ultimately improving learning outcomes.

Moreover, research findings often serve as a foundation for further studies, allowing other researchers to build on existing knowledge. Whether applied to solve real-world problems or deepen understanding within a field, these findings contribute significantly to progress and innovation.


Chapter 15: Interpreting results and drawing conclusions

Holger J Schünemann, Gunn E Vist, Julian PT Higgins, Nancy Santesso, Jonathan J Deeks, Paul Glasziou, Elie A Akl, Gordon H Guyatt; on behalf of the Cochrane GRADEing Methods Group

Key Points:

  • This chapter provides guidance on interpreting the results of synthesis in order to communicate the conclusions of the review effectively.
  • Methods are presented for computing, presenting and interpreting relative and absolute effects for dichotomous outcome data, including the number needed to treat (NNT).
  • For continuous outcome measures, review authors can present summary results for studies using natural units of measurement or as minimal important differences when all studies use the same scale. When studies measure the same construct but with different scales, review authors will need to find a way to interpret the standardized mean difference, or to use an alternative effect measure for the meta-analysis such as the ratio of means.
  • Review authors should not describe results as ‘statistically significant’, ‘not statistically significant’ or ‘non-significant’ or unduly rely on thresholds for P values, but report the confidence interval together with the exact P value.
  • Review authors should not make recommendations about healthcare decisions, but they can – after describing the certainty of evidence and the balance of benefits and harms – highlight different actions that might be consistent with particular patterns of values and preferences and other factors that determine a decision such as cost.

Cite this chapter as: Schünemann HJ, Vist GE, Higgins JPT, Santesso N, Deeks JJ, Glasziou P, Akl EA, Guyatt GH. Chapter 15: Interpreting results and drawing conclusions [last updated August 2023]. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.5. Cochrane, 2024. Available from www.training.cochrane.org/handbook .

15.1 Introduction

The purpose of Cochrane Reviews is to facilitate healthcare decisions by patients and the general public, clinicians, guideline developers, administrators and policy makers. They also inform future research. A clear statement of findings, a considered discussion and a clear presentation of the authors’ conclusions are, therefore, important parts of the review. In particular, the following issues can help people make better informed decisions and increase the usability of Cochrane Reviews:

  • information on all important outcomes, including adverse outcomes;
  • the certainty of the evidence for each of these outcomes, as it applies to specific populations and specific interventions; and
  • clarification of the manner in which particular values and preferences may bear on the desirable and undesirable consequences of the intervention.

A ‘Summary of findings’ table, described in Chapter 14 , Section 14.1 , provides key pieces of information about health benefits and harms in a quick and accessible format. It is highly desirable that review authors include a ‘Summary of findings’ table in Cochrane Reviews alongside a sufficient description of the studies and meta-analyses to support its contents. This description includes the rating of the certainty of evidence, also called the quality of the evidence or confidence in the estimates of the effects, which is expected in all Cochrane Reviews.

‘Summary of findings’ tables are usually supported by full evidence profiles which include the detailed ratings of the evidence (Guyatt et al 2011a, Guyatt et al 2013a, Guyatt et al 2013b, Santesso et al 2016). The Discussion section of the text of the review provides space to reflect and consider the implications of these aspects of the review’s findings. Cochrane Reviews include five standard subheadings to ensure the Discussion section places the review in an appropriate context: ‘Summary of main results (benefits and harms)’; ‘Potential biases in the review process’; ‘Overall completeness and applicability of evidence’; ‘Certainty of the evidence’; and ‘Agreements and disagreements with other studies or reviews’. Following the Discussion, the Authors’ conclusions section is divided into two standard subsections: ‘Implications for practice’ and ‘Implications for research’. The assessment of the certainty of evidence facilitates a structured description of the implications for practice and research.

Because Cochrane Reviews have an international audience, the Discussion and Authors’ conclusions should, so far as possible, assume a broad international perspective and provide guidance for how the results could be applied in different settings, rather than being restricted to specific national or local circumstances. Cultural differences and economic differences may both play an important role in determining the best course of action based on the results of a Cochrane Review. Furthermore, individuals within societies have widely varying values and preferences regarding health states, and use of societal resources to achieve particular health states. For all these reasons, and because information that goes beyond that included in a Cochrane Review is required to make fully informed decisions, different people will often make different decisions based on the same evidence presented in a review.

Thus, review authors should avoid specific recommendations that inevitably depend on assumptions about available resources, values and preferences, and other factors such as equity considerations, feasibility and acceptability of an intervention. The purpose of the review should be to present information and aid interpretation rather than to offer recommendations. The discussion and conclusions should help people understand the implications of the evidence in relation to practical decisions and apply the results to their specific situation. Review authors can aid this understanding of the implications by laying out different scenarios that describe certain value structures.

In this chapter, we address first one of the key aspects of interpreting findings that is also fundamental in completing a ‘Summary of findings’ table: the certainty of evidence related to each of the outcomes. We then provide a more detailed consideration of issues around applicability and around interpretation of numerical results, and provide suggestions for presenting authors’ conclusions.

15.2 Issues of indirectness and applicability

15.2.1 The role of the review author

“A leap of faith is always required when applying any study findings to the population at large” or to a specific person. “In making that jump, one must always strike a balance between making justifiable broad generalizations and being too conservative in one’s conclusions” (Friedman et al 1985). In addition to issues about risk of bias and other domains determining the certainty of evidence, this leap of faith is related to how well the identified body of evidence matches the posed PICO ( Population, Intervention, Comparator(s) and Outcome ) question. As to the population, no individual can be entirely matched to the population included in research studies. At the time of decision, there will always be differences between the study population and the person or population to whom the evidence is applied; sometimes these differences are slight, sometimes large.

The terms applicability, generalizability, external validity and transferability are related, sometimes used interchangeably and have in common that they lack a clear and consistent definition in the classic epidemiological literature (Schünemann et al 2013). However, all of the terms describe one overarching theme: whether or not available research evidence can be directly used to answer the health and healthcare question at hand, ideally supported by a judgement about the degree of confidence in this use (Schünemann et al 2013). GRADE’s certainty domains include a judgement about ‘indirectness’ to describe all of these aspects including the concept of direct versus indirect comparisons of different interventions (Atkins et al 2004, Guyatt et al 2008, Guyatt et al 2011b).

To address adequately the extent to which a review is relevant for the purpose to which it is being put, there are certain things the review author must do, and certain things the user of the review must do to assess the degree of indirectness. Cochrane and the GRADE Working Group suggest using a very structured framework to address indirectness. We discuss here and in Chapter 14 what the review author can do to help the user. Cochrane Review authors must be extremely clear on the population, intervention and outcomes that they intend to address. Chapter 14, Section 14.1.2 , also emphasizes a crucial step: the specification of all patient-important outcomes relevant to the intervention strategies under comparison.

In considering whether the effect of an intervention applies equally to all participants, and whether different variations on the intervention have similar effects, review authors need to make a priori hypotheses about possible effect modifiers, and then examine those hypotheses (see Chapter 10, Section 10.10 and Section 10.11 ). If they find apparent subgroup effects, they must ultimately decide whether or not these effects are credible (Sun et al 2012). Differences between subgroups, particularly those that correspond to differences between studies, should be interpreted cautiously. Some chance variation between subgroups is inevitable so, unless there is good reason to believe that there is an interaction, review authors should not assume that the subgroup effect exists. If, despite due caution, review authors judge subgroup effects in terms of relative effect estimates as credible (i.e. the effects differ credibly), they should conduct separate meta-analyses for the relevant subgroups, and produce separate ‘Summary of findings’ tables for those subgroups.

The user of the review will be challenged with ‘individualization’ of the findings, whether they seek to apply the findings to an individual patient or a policy decision in a specific context. For example, even if relative effects are similar across subgroups, absolute effects will differ according to baseline risk. Review authors can help provide this information by identifying identifiable groups of people with varying baseline risks in the ‘Summary of findings’ tables, as discussed in Chapter 14, Section 14.1.3 . Users can then identify their specific case or population as belonging to a particular risk group, if relevant, and assess their likely magnitude of benefit or harm accordingly. A description of the identifying prognostic or baseline risk factors in a brief scenario (e.g. age or gender) will help users of a review further.
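A brief worked illustration of this point, using hypothetical numbers rather than figures from any review: suppose the relative risk (RR) is 0.75 in both subgroups. Then

```latex
\mathrm{ARR} = \text{baseline risk} \times (1 - \mathrm{RR}):\qquad
20\% \times 0.25 = 5\%, \qquad 4\% \times 0.25 = 1\%
```

so the same relative effect produces a five-fold difference in absolute risk reduction between a high-risk and a low-risk group, which is why 'Summary of findings' tables present absolute effects for different baseline risks.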

Another decision users must make is whether their individual case or population of interest is so different from those included in the studies that they cannot use the results of the systematic review and meta-analysis at all. Rather than rigidly applying the inclusion and exclusion criteria of studies, it is better to ask whether or not there are compelling reasons why the evidence should not be applied to a particular patient. Review authors can sometimes help decision makers by identifying important variation where divergence might limit the applicability of results (Rothwell 2005, Schünemann et al 2006, Guyatt et al 2011b, Schünemann et al 2013), including biologic and cultural variation, and variation in adherence to an intervention.

In addressing these issues, review authors cannot be aware of, or address, the myriad of differences in circumstances around the world. They can, however, address differences of known importance to many people and, importantly, they should avoid assuming that other people’s circumstances are the same as their own in discussing the results and drawing conclusions.

15.2.2 Biological variation

Issues of biological variation that may affect the applicability of a result to a reader or population include divergence in pathophysiology (e.g. biological differences between women and men that may affect responsiveness to an intervention) and divergence in a causative agent (e.g. for infectious diseases such as malaria, which may be caused by several different parasites). The discussion of the results in the review should make clear whether the included studies addressed all or only some of these groups, and whether any important subgroup effects were found.

15.2.3 Variation in context

Some interventions, particularly non-pharmacological interventions, may work in some contexts but not in others; the situation has been described as program by context interaction (Hawe et al 2004). Contextual factors might pertain to the host organization in which an intervention is offered, such as the expertise, experience and morale of the staff expected to carry out the intervention, the competing priorities for the clinician’s or staff’s attention, the local resources such as service and facilities made available to the program and the status or importance given to the program by the host organization. Broader context issues might include aspects of the system within which the host organization operates, such as the fee or payment structure for healthcare providers and the local insurance system. Some interventions, in particular complex interventions (see Chapter 17 ), can be only partially implemented in some contexts, and this requires judgements about indirectness of the intervention and its components for readers in that context (Schünemann 2013).

Contextual factors may also pertain to the characteristics of the target group or population, such as cultural and linguistic diversity, socio-economic position, rural/urban setting. These factors may mean that a particular style of care or relationship evolves between service providers and consumers that may or may not match the values and technology of the program.

For many years these aspects have been acknowledged when decision makers have argued that results of evidence reviews from other countries do not apply in their own country or setting. Whilst some programmes/interventions have been successfully transferred from one context to another, others have not (Resnicow et al 1993, Lumley et al 2004, Coleman et al 2015). Review authors should be cautious when making generalizations from one context to another. They should report on the presence (or otherwise) of context-related information in intervention studies, where this information is available.

15.2.4 Variation in adherence

Variation in the adherence of the recipients and providers of care can limit the certainty in the applicability of results. Predictable differences in adherence can be due to divergence in how recipients of care perceive the intervention (e.g. the importance of side effects), economic conditions or attitudes that make some forms of care inaccessible in some settings, such as in low-income countries (Dans et al 2007). It should not be assumed that high levels of adherence in closely monitored randomized trials will translate into similar levels of adherence in normal practice.

15.2.5 Variation in values and preferences

Decisions about healthcare management strategies and options involve trading off health benefits and harms. The right choice may differ for people with different values and preferences (i.e. the importance people place on the outcomes and interventions), and it is important that decision makers ensure that decisions are consistent with a patient or population’s values and preferences. The importance placed on outcomes, together with other factors, will influence whether the recipients of care will or will not accept an option that is offered (Alonso-Coello et al 2016) and, thus, can be one factor influencing adherence. In Section 15.6 , we describe how the review author can help this process and the limits of supporting decision making based on intervention reviews.

15.3 Interpreting results of statistical analyses

15.3.1 Confidence intervals

Results for both individual studies and meta-analyses are reported with a point estimate together with an associated confidence interval. For example, ‘The odds ratio was 0.75 with a 95% confidence interval of 0.70 to 0.80’. The point estimate (0.75) is the best estimate of the magnitude and direction of the experimental intervention’s effect compared with the comparator intervention. The confidence interval describes the uncertainty inherent in any estimate, and describes a range of values within which we can be reasonably sure that the true effect actually lies. If the confidence interval is relatively narrow (e.g. 0.70 to 0.80), the effect size is known precisely. If the interval is wider (e.g. 0.60 to 0.93) the uncertainty is greater, although there may still be enough precision to make decisions about the utility of the intervention. Intervals that are very wide (e.g. 0.50 to 1.10) indicate that we have little knowledge about the effect and this imprecision affects our certainty in the evidence, and that further information would be needed before we could draw a more certain conclusion.

A 95% confidence interval is often interpreted as indicating a range within which we can be 95% certain that the true effect lies. This statement is a loose interpretation, but is useful as a rough guide. The strictly correct interpretation of a confidence interval is based on the hypothetical notion of considering the results that would be obtained if the study were repeated many times. If a study were repeated infinitely often, and on each occasion a 95% confidence interval calculated, then 95% of these intervals would contain the true effect (see Section 15.3.3 for further explanation).
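The repeated-sampling interpretation can be made concrete with a small simulation. The sketch below is illustrative only (it is not from the Handbook): it draws many samples from a known distribution and checks how often a normal-approximation 95% interval for the mean captures the true value:

```python
import math
import random
import statistics

random.seed(1)
true_mean, sigma, n, trials = 10.0, 2.0, 50, 2000
covered = 0

for _ in range(trials):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # estimated standard error
    if m - 1.96 * se <= true_mean <= m + 1.96 * se:
        covered += 1

print(f"Coverage: {covered / trials:.1%}")  # close to 95% over repeated sampling
```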

The width of the confidence interval for an individual study depends to a large extent on the sample size. Larger studies tend to give more precise estimates of effects (and hence have narrower confidence intervals) than smaller studies. For continuous outcomes, precision depends also on the variability in the outcome measurements (i.e. how widely individual results vary between people in the study, measured as the standard deviation); for dichotomous outcomes it depends on the risk of the event (more frequent events allow more precision, and narrower confidence intervals), and for time-to-event outcomes it also depends on the number of events observed. All these quantities are used in computation of the standard errors of effect estimates from which the confidence interval is derived.

The width of a confidence interval for a meta-analysis depends on the precision of the individual study estimates and on the number of studies combined. In addition, for random-effects models, precision will decrease with increasing heterogeneity and confidence intervals will widen correspondingly (see Chapter 10, Section 10.10.4 ). As more studies are added to a meta-analysis the width of the confidence interval usually decreases. However, if the additional studies increase the heterogeneity in the meta-analysis and a random-effects model is used, it is possible that the confidence interval width will increase.

Confidence intervals and point estimates have different interpretations in fixed-effect and random-effects models. While the fixed-effect estimate and its confidence interval address the question ‘what is the best (single) estimate of the effect?’, the random-effects estimate assumes there to be a distribution of effects, and the estimate and its confidence interval address the question ‘what is the best estimate of the average effect?’ A confidence interval may be reported for any level of confidence (although they are most commonly reported for 95%, and sometimes 90% or 99%). For example, the odds ratio of 0.80 could be reported with an 80% confidence interval of 0.73 to 0.88; a 90% interval of 0.72 to 0.89; and a 95% interval of 0.70 to 0.92. As the confidence level increases, the confidence interval widens.
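The effect of the confidence level on interval width can be shown with a short calculation. In this sketch (not part of the original chapter), the point estimate of 0.80 matches the example above, and the standard error on the log scale (0.07) is a hypothetical value chosen so that the computed intervals approximately reproduce those in the text.

```python
# A minimal sketch of how the confidence level changes interval width for a
# ratio measure, computed on the log scale and back-transformed.
import math
from statistics import NormalDist

def ratio_ci(estimate, se_log, level):
    """Confidence interval for a ratio measure at the given level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # e.g. 1.96 for level = 0.95
    log_est = math.log(estimate)
    return math.exp(log_est - z * se_log), math.exp(log_est + z * se_log)

for level in (0.80, 0.90, 0.95):
    lo, hi = ratio_ci(0.80, 0.07, level)
    print(f"{round(level * 100)}% CI: {lo:.2f} to {hi:.2f}")  # widens as level rises
```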

There is logical correspondence between the confidence interval and the P value (see Section 15.3.3 ). The 95% confidence interval for an effect will exclude the null value (such as an odds ratio of 1.0 or a risk difference of 0) if and only if the test of significance yields a P value of less than 0.05. If the P value is exactly 0.05, then either the upper or lower limit of the 95% confidence interval will be at the null value. Similarly, the 99% confidence interval will exclude the null if and only if the test of significance yields a P value of less than 0.01.
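This correspondence can be checked directly. The following sketch computes, from a hypothetical odds ratio and standard error on the log scale, both the two-sided P value and the 95% confidence interval, so the 'excludes the null if and only if P < 0.05' relation can be verified numerically.

```python
# A small numerical check of the CI / P value correspondence for an odds
# ratio. The estimate and standard error are hypothetical.
import math
from statistics import NormalDist

def p_value_and_ci(or_estimate, se_log):
    z = math.log(or_estimate) / se_log
    p = 2 * (1 - NormalDist().cdf(abs(z)))           # two-sided P value
    lo = math.exp(math.log(or_estimate) - 1.96 * se_log)
    hi = math.exp(math.log(or_estimate) + 1.96 * se_log)
    return p, (lo, hi)

p, (lo, hi) = p_value_and_ci(0.80, 0.07)
print(f"P = {p:.4f}; 95% CI {lo:.2f} to {hi:.2f}")
print("CI excludes 1.0:", hi < 1.0 or lo > 1.0, "| P < 0.05:", p < 0.05)
```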

Together, the point estimate and confidence interval provide information to assess the effects of the intervention on the outcome. For example, suppose that we are evaluating an intervention that reduces the risk of an event and we decide that it would be useful only if it reduced the risk of an event from 30% by at least 5 percentage points to 25% (these values will depend on the specific clinical scenario and outcomes, including the anticipated harms). If the meta-analysis yielded an effect estimate of a reduction of 10 percentage points with a tight 95% confidence interval, say, from 7% to 13%, we would be able to conclude that the intervention was useful since both the point estimate and the entire range of the interval exceed our criterion of a reduction of 5% for net health benefit. However, if the meta-analysis reported the same risk reduction of 10% but with a wider interval, say, from 2% to 18%, although we would still conclude that our best estimate of the intervention effect is that it provides net benefit, we could not be so confident as we still entertain the possibility that the effect could be between 2% and 5%. If the confidence interval was wider still, and included the null value of a difference of 0%, we would still consider the possibility that the intervention has no effect on the outcome whatsoever, and would need to be even more sceptical in our conclusions.

Review authors may use the same general approach to conclude that an intervention is not useful. Continuing with the above example where the criterion for an important difference that should be achieved to provide more benefit than harm is a 5% risk difference, an effect estimate of 2% with a 95% confidence interval of 1% to 4% suggests that the intervention does not provide net health benefit.
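The decision logic in these examples reduces to comparing the confidence limits against the threshold for net benefit. The sketch below (a hypothetical helper, not part of the original chapter) expresses the confidence limits as risk reductions in percentage points and uses the 5 percentage-point criterion from the examples above.

```python
# A sketch of the decision logic above: the intervention is judged useful
# only if the whole 95% CI for the risk reduction exceeds the minimum
# worthwhile reduction (here 5 percentage points, as in the text).
def interpret(ci_lower, ci_upper, threshold=5.0):
    """CI limits are risk reductions in percentage points (positive = benefit)."""
    if ci_lower >= threshold:
        return "useful: even the smallest plausible effect exceeds the threshold"
    if ci_upper < threshold:
        return "not useful: even the largest plausible effect is below the threshold"
    if ci_lower <= 0:
        return "uncertain: the CI includes no effect (and possibly harm)"
    return "uncertain: benefit is likely but may be smaller than the threshold"

print(interpret(7, 13))   # tight CI from the text -> useful
print(interpret(2, 18))   # wider CI -> uncertain
print(interpret(1, 4))    # the 'not useful' example above
```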

15.3.2 P values and statistical significance

A P value is the standard result of a statistical test, and is the probability of obtaining the observed effect (or larger) under a ‘null hypothesis’. In the context of Cochrane Reviews there are two commonly used statistical tests. The first is a test of overall effect (a Z-test), and its null hypothesis is that there is no overall effect of the experimental intervention compared with the comparator on the outcome of interest. The second is the Chi² test for heterogeneity, and its null hypothesis is that there are no differences in the intervention effects across studies.

A P value that is very small indicates that the observed effect is very unlikely to have arisen purely by chance, and therefore provides evidence against the null hypothesis. It has been common practice to interpret a P value by examining whether it is smaller than particular threshold values. In particular, P values less than 0.05 are often reported as ‘statistically significant’, and interpreted as being small enough to justify rejection of the null hypothesis. However, the 0.05 threshold is an arbitrary one that became commonly used in medical and psychological research largely because P values were determined by comparing the test statistic against tabulations of specific percentage points of statistical distributions. If review authors decide to present a P value with the results of a meta-analysis, they should report a precise P value (as calculated by most statistical software), together with the 95% confidence interval. Review authors should not describe results as ‘statistically significant’, ‘not statistically significant’ or ‘non-significant’ or unduly rely on thresholds for P values, but report the confidence interval together with the exact P value (see MECIR Box 15.3.a).

We discuss interpretation of the test for heterogeneity in Chapter 10, Section 10.10.2 ; the remainder of this section refers mainly to tests for an overall effect. For tests of an overall effect, the computation of P involves both the effect estimate and precision of the effect estimate (driven largely by sample size). As precision increases, the range of plausible effects that could occur by chance is reduced. Correspondingly, the statistical significance of an effect of a particular magnitude will usually be greater (the P value will be smaller) in a larger study than in a smaller study.

P values are commonly misinterpreted in two ways. First, a moderate or large P value (e.g. greater than 0.05) may be misinterpreted as evidence that the intervention has no effect on the outcome. There is an important difference between this statement and the correct interpretation that there is a high probability that the observed effect on the outcome is due to chance alone. To avoid such a misinterpretation, review authors should always examine the effect estimate and its 95% confidence interval.

The second misinterpretation is to assume that a result with a small P value for the summary effect estimate implies that an experimental intervention has an important benefit. Such a misinterpretation is more likely to occur in large studies and meta-analyses that accumulate data over dozens of studies and thousands of participants. The P value addresses the question of whether the experimental intervention effect is precisely nil; it does not examine whether the effect is of a magnitude of importance to potential recipients of the intervention. In a large study, a small P value may represent the detection of a trivial effect that may not lead to net health benefit when compared with the potential harms (i.e. harmful effects on other important outcomes). Again, inspection of the point estimate and confidence interval helps correct interpretations (see Section 15.3.1 ).

MECIR Box 15.3.a Relevant expectations for conduct of intervention reviews

15.3.3 Relation between confidence intervals, statistical significance and certainty of evidence

The confidence interval (and imprecision) is only one domain that influences overall uncertainty about effect estimates. Uncertainty resulting from imprecision (i.e. statistical uncertainty) may be no less important than uncertainty from indirectness, or any other GRADE domain, in the context of decision making (Schünemann 2016). Thus, the extent to which interpretations of the confidence interval described in Sections 15.3.1 and 15.3.2 correspond to conclusions about overall certainty of the evidence for the outcome of interest depends on these other domains. If there are no concerns about other domains that determine the certainty of the evidence (i.e. risk of bias, inconsistency, indirectness or publication bias), then the interpretation in Sections 15.3.1 and 15.3.2 about the relation of the confidence interval to the true effect may be carried forward to the overall certainty. However, if there are concerns about the other domains that affect the certainty of the evidence, the interpretation about the true effect needs to be seen in the context of further uncertainty resulting from those concerns.

For example, nine randomized controlled trials in almost 6000 cancer patients indicated that the administration of heparin reduces the risk of venous thromboembolism (VTE), with a relative risk reduction of 43% (95% CI 19% to 60%) (Akl et al 2011a). For patients with a plausible baseline risk of approximately 4.6% per year, this relative effect suggests that heparin leads to an absolute risk reduction of 20 fewer VTEs (95% CI 9 fewer to 27 fewer) per 1000 people per year (Akl et al 2011a). Now consider that the review authors or those applying the evidence in a guideline have lowered the certainty in the evidence as a result of indirectness. While the confidence intervals would remain unchanged, the certainty in that confidence interval and in the point estimate as reflecting the truth for the question of interest will be lowered. In fact, the certainty range will have unknown width so there will be unknown likelihood of a result within that range because of this indirectness. The lower the certainty in the evidence, the less we know about the width of the certainty range, although methods for quantifying risk of bias and understanding potential direction of bias may offer insight when lowered certainty is due to risk of bias. Nevertheless, decision makers must consider this uncertainty, and must do so in relation to the effect measure that is being evaluated (e.g. a relative or absolute measure). We will describe the impact on interpretations for dichotomous outcomes in Section 15.4.

15.4 Interpreting results from dichotomous outcomes (including numbers needed to treat)

15.4.1 Relative and absolute risk reductions

Clinicians may be more inclined to prescribe an intervention that reduces the relative risk of death by 25% than one that reduces the risk of death by 1 percentage point, although both presentations of the evidence may relate to the same benefit (i.e. a reduction in risk from 4% to 3%). The former refers to the relative reduction in risk and the latter to the absolute reduction in risk. As described in Chapter 6, Section 6.4.1 , there are several measures for comparing dichotomous outcomes in two groups. Meta-analyses are usually undertaken using risk ratios (RR), odds ratios (OR) or risk differences (RD), but there are several alternative ways of expressing results.

Relative risk reduction (RRR) is a convenient way of re-expressing a risk ratio as a percentage reduction:

RRR = 100% × (1 − RR)

For example, a risk ratio of 0.75 translates to a relative risk reduction of 25%, as in the example above.

The risk difference is often referred to as the absolute risk reduction (ARR) or absolute risk increase (ARI), and may be presented as a percentage (e.g. 1%), as a decimal (e.g. 0.01), or as a count (e.g. 10 out of 1000). We consider different choices for presenting absolute effects in Section 15.4.3. We then describe computations for obtaining these numbers from the results of individual studies and of meta-analyses in Section 15.4.4.

15.4.2 Number needed to treat (NNT)

The number needed to treat (NNT) is a common alternative way of presenting information on the effect of an intervention. The NNT is defined as the expected number of people who need to receive the experimental rather than the comparator intervention for one additional person to either incur or avoid an event (depending on the direction of the result) in a given time frame. Thus, for example, an NNT of 10 can be interpreted as ‘it is expected that one additional (or one fewer) person will incur an event for every 10 participants receiving the experimental intervention rather than the comparator over a given time frame’. It is important to be clear that:

  • since the NNT is derived from the risk difference, it is still a comparative measure of effect (experimental versus a specific comparator) and not a general property of a single intervention; and
  • the NNT gives an ‘expected value’. For example, NNT = 10 does not imply that one additional event will occur in each and every group of 10 people.

NNTs can be computed for both beneficial and detrimental events, and for interventions that cause both improvements and deteriorations in outcomes. In all instances NNTs are expressed as positive whole numbers. Some authors use the term ‘number needed to harm’ (NNH) when an intervention leads to an adverse outcome, or a decrease in a positive outcome, rather than improvement. However, this phrase can be misleading (most notably, it can easily be read to imply the number of people who will experience a harmful outcome if given the intervention), and it is strongly recommended that ‘number needed to harm’ and ‘NNH’ are avoided. The preferred alternative is to use phrases such as ‘number needed to treat for an additional beneficial outcome’ (NNTB) and ‘number needed to treat for an additional harmful outcome’ (NNTH) to indicate direction of effect.

As NNTs refer to events, their interpretation needs to be worded carefully when the binary outcome is a dichotomization of a scale-based outcome. For example, if the outcome is pain measured on a ‘none, mild, moderate or severe’ scale it may have been dichotomized as ‘none or mild’ versus ‘moderate or severe’. It would be inappropriate for an NNT from these data to be referred to as an ‘NNT for pain’. It is an ‘NNT for moderate or severe pain’.

We consider different choices for presenting absolute effects in Section 15.4.3 . We then describe computations for obtaining these numbers from the results of individual studies and of meta-analyses in Section 15.4.4 .

15.4.3 Expressing risk differences

Users of reviews are liable to be influenced by the choice of statistical presentations of the evidence. Hoffrage and colleagues suggest that physicians’ inferences about statistical outcomes are more appropriate when they deal with ‘natural frequencies’ – whole numbers of people, both treated and untreated (e.g. treatment results in a drop from 20 out of 1000 to 10 out of 1000 women having breast cancer) – than when effects are presented as percentages (e.g. 1% absolute reduction in breast cancer risk) (Hoffrage et al 2000). Probabilities may be more difficult to understand than frequencies, particularly when events are rare. While standardization may be important in improving the presentation of research evidence (and participation in healthcare decisions), current evidence suggests that the presentation of natural frequencies for expressing differences in absolute risk is best understood by consumers of healthcare information (Akl et al 2011b). This evidence provides the rationale for presenting absolute risks in ‘Summary of findings’ tables as numbers of people with events per 1000 people receiving the intervention (see Chapter 14 ).

RRs and RRRs remain crucial because relative effects tend to be substantially more stable across risk groups than absolute effects (see Chapter 10, Section 10.4.3 ). Review authors can use their own data to study this consistency (Cates 1999, Smeeth et al 1999). Risk differences from studies are least likely to be consistent across baseline event rates; thus, they are rarely appropriate for computing numbers needed to treat in systematic reviews. If a relative effect measure (OR or RR) is chosen for meta-analysis, then a comparator group risk needs to be specified as part of the calculation of an RD or NNT. In addition, if there are several different groups of participants with different levels of risk, it is crucial to express absolute benefit for each clinically identifiable risk group, clarifying the time period to which this applies. Studies in patients with differing severity of disease, or studies with different lengths of follow-up will almost certainly have different comparator group risks. In these cases, different comparator group risks lead to different RDs and NNTs (except when the intervention has no effect). A recommended approach is to re-express an odds ratio or a risk ratio as a variety of RD or NNTs across a range of assumed comparator risks (ACRs) (McQuay and Moore 1997, Smeeth et al 1999). Review authors should bear these considerations in mind not only when constructing their ‘Summary of findings’ table, but also in the text of their review.

For example, a review of oral anticoagulants to prevent stroke presented information to users by describing absolute benefits for various baseline risks (Aguilar and Hart 2005, Aguilar et al 2007). They presented their principal findings as “The inherent risk of stroke should be considered in the decision to use oral anticoagulants in atrial fibrillation patients, selecting those who stand to benefit most for this therapy” (Aguilar and Hart 2005). Among high-risk atrial fibrillation patients with prior stroke or transient ischaemic attack who have stroke rates of about 12% (120 per 1000) per year, warfarin prevents about 70 strokes yearly per 1000 patients, whereas for low-risk atrial fibrillation patients (with a stroke rate of about 2% per year or 20 per 1000), warfarin prevents only 12 strokes. This presentation helps users to understand the important impact that typical baseline risks have on the absolute benefit that they can expect.

15.4.4 Computations

Direct computation of risk difference (RD) or a number needed to treat (NNT) depends on the summary statistic (odds ratio, risk ratio or risk differences) available from the study or meta-analysis. When expressing results of meta-analyses, review authors should use, in the computations, whatever statistic they determined to be the most appropriate summary for meta-analysis (see Chapter 10, Section 10.4.3 ). Here we present calculations to obtain RD as a reduction in the number of participants per 1000. For example, a risk difference of –0.133 corresponds to 133 fewer participants with the event per 1000.

RDs and NNTs should not be computed from the aggregated total numbers of participants and events across the trials. This approach ignores the randomization within studies, and may produce seriously misleading results if there is unbalanced randomization in any of the studies. Using the pooled result of a meta-analysis is more appropriate. When computing NNTs, the values obtained are by convention always rounded up to the next whole number.

15.4.4.1 Computing NNT from a risk difference (RD)

A NNT may be computed from a risk difference as

NNT = 1 / |RD|

where the vertical bars (‘absolute value of’) in the denominator indicate that any minus sign should be ignored. It is convention to round the NNT up to the nearest whole number. For example, if the risk difference is –0.12 the NNT is 9; if the risk difference is –0.22 the NNT is 5. Cochrane Review authors should qualify the NNT as referring to benefit (improvement) or harm by denoting the NNT as NNTB or NNTH. Note that this approach, although feasible, should be used only for the results of a meta-analysis of risk differences. In most cases meta-analyses will be undertaken using a relative measure of effect (RR or OR), and those statistics should be used to calculate the NNT (see Section 15.4.4.2 and 15.4.4.3 ).
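As a worked illustration, the following sketch implements this formula, including the rounding-up convention and the NNTB/NNTH labelling. The assumption that a negative risk difference represents benefit is an illustrative choice and would depend on the outcome.

```python
# A minimal helper for NNT = 1 / |RD|, rounding up by convention.
# Assumes (for illustration) that a negative RD means fewer events, i.e. benefit.
import math

def nnt_from_rd(rd):
    if rd == 0:
        raise ValueError("RD of zero gives an infinite NNT")
    nnt = math.ceil(1 / abs(rd))        # round up to the next whole number
    label = "NNTB" if rd < 0 else "NNTH"
    return label, nnt

print(nnt_from_rd(-0.12))  # ('NNTB', 9), as in the text
print(nnt_from_rd(-0.22))  # ('NNTB', 5)
```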

15.4.4.2 Computing risk differences or NNT from a risk ratio

To aid interpretation of the results of a meta-analysis of risk ratios, review authors may compute an absolute risk reduction or NNT. In order to do this, an assumed comparator risk (ACR) (otherwise known as a baseline risk, or risk that the outcome of interest would occur with the comparator intervention) is required. It will usually be appropriate to do this for a range of different ACRs. The computation proceeds as follows:

fewer per 1000 = 1000 × ACR × (1 − RR)

As an example, suppose the risk ratio is RR = 0.92, and an ACR = 0.3 (300 per 1000) is assumed. Then the effect on risk is 24 fewer per 1000:

fewer per 1000 = 1000 × 0.3 × (1 − 0.92) = 24

The NNT is 42:

NNT = 1 / (ACR × (1 − RR)) = 1 / (0.3 × (1 − 0.92)) = 41.7, which rounds up to 42
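The computation generalizes readily to a range of assumed comparator risks, as recommended in Section 15.4.3. This sketch (not part of the original chapter) reproduces the worked example (ACR = 0.3) and adds two further hypothetical ACRs.

```python
# A sketch of RD and NNT from a risk ratio across assumed comparator risks.
# RR = 0.92 and ACR = 0.3 reproduce the worked example above.
import math

def rd_and_nnt_from_rr(rr, acr):
    rd = acr * (1 - rr)            # absolute risk reduction
    per_1000 = 1000 * rd           # fewer events per 1000 people
    nnt = math.ceil(1 / rd)        # rounded up by convention
    return per_1000, nnt

for acr in (0.1, 0.3, 0.5):
    per_1000, nnt = rd_and_nnt_from_rr(0.92, acr)
    print(f"ACR {acr:.0%}: {per_1000:.0f} fewer per 1000, NNT {nnt}")
# ACR 30% reproduces the text: 24 fewer per 1000, NNT 42
```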

15.4.4.3 Computing risk differences or NNT from an odds ratio

Review authors may wish to compute a risk difference or NNT from the results of a meta-analysis of odds ratios. In order to do this, an ACR is required. It will usually be appropriate to do this for a range of different ACRs. The computation proceeds as follows:

fewer per 1000 = 1000 × (ACR − (OR × ACR) / (1 − ACR + OR × ACR))

As an example, suppose the odds ratio is OR = 0.73, and a comparator risk of ACR = 0.3 is assumed. Then the effect on risk is 62 fewer per 1000:

fewer per 1000 = 1000 × (0.3 − (0.73 × 0.3) / (1 − 0.3 + 0.73 × 0.3)) = 1000 × (0.300 − 0.238) ≈ 62

The NNT is 17:

NNT = 1 / (0.300 − 0.238) = 16.2, which rounds up to 17
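The following sketch reproduces this worked example; the helper name and structure are illustrative rather than a prescribed implementation.

```python
# A sketch of RD and NNT from an odds ratio. OR = 0.73 and ACR = 0.3
# reproduce the worked example (62 fewer per 1000; NNT 17).
import math

def rd_and_nnt_from_or(or_, acr):
    risk_int = (or_ * acr) / (1 - acr + or_ * acr)  # risk with intervention
    rd = acr - risk_int                              # absolute risk reduction
    return 1000 * rd, math.ceil(1 / rd)

per_1000, nnt = rd_and_nnt_from_or(0.73, 0.3)
print(f"{per_1000:.0f} fewer per 1000, NNT {nnt}")   # 62 fewer per 1000, NNT 17
```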

15.4.4.4 Computing risk ratio from an odds ratio

Because risk ratios are easier to interpret than odds ratios, but odds ratios have favourable mathematical properties, a review author may decide to undertake a meta-analysis based on odds ratios, but to express the result as a summary risk ratio (or relative risk reduction). This requires an ACR. Then

RR = OR / (1 − ACR × (1 − OR))

It will often be reasonable to perform this transformation using the median comparator group risk from the studies in the meta-analysis.
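A minimal implementation of this conversion is shown below; the OR and ACR values are carried over from the example in Section 15.4.4.3 for illustration.

```python
# A one-line implementation of the OR-to-RR conversion above, with the
# assumed comparator risk (ACR) taken, for example, as the median
# comparator group risk from the included studies.
def rr_from_or(or_, acr):
    return or_ / (1 - acr * (1 - or_))

print(round(rr_from_or(0.73, 0.3), 2))  # OR 0.73 at ACR 0.3 gives RR 0.79
```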

15.4.4.5 Computing confidence limits

Confidence limits for RDs and NNTs may be calculated by applying the above formulae to the upper and lower confidence limits for the summary statistic (RD, RR or OR) (Altman 1998). Note that this confidence interval does not incorporate uncertainty around the ACR.

If the 95% confidence interval of OR or RR includes the value 1, one of the confidence limits will indicate benefit and the other harm. Thus, appropriate use of the words ‘fewer’ and ‘more’ is required for each limit when presenting results in terms of events. For NNTs, the two confidence limits should be labelled as NNTB and NNTH to indicate the direction of effect in each case. The confidence interval for the NNT will include a ‘discontinuity’, because increasingly smaller risk differences that approach zero will lead to NNTs approaching infinity. Thus, the confidence interval will include both an infinitely large NNTB and an infinitely large NNTH.
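The discontinuity can be made explicit in a small helper that applies the NNT formula to each confidence limit of the risk difference and labels the result as NNTB or NNTH according to its sign. The confidence limits below are hypothetical.

```python
# A sketch of Altman's approach: apply the NNT formula to each confidence
# limit of the RD. When the RD interval crosses zero, the NNT interval is
# discontinuous and passes through infinity.
import math

def nnt_limit(rd_limit):
    if rd_limit == 0:
        return "infinity"
    label = "NNTB" if rd_limit < 0 else "NNTH"   # negative RD = benefit (assumed)
    return f"{label} {math.ceil(1 / abs(rd_limit))}"

rd_lower, rd_upper = -0.10, 0.02   # a hypothetical 95% CI for the RD that includes zero
print(nnt_limit(rd_lower), "to", nnt_limit(rd_upper))
# -> NNTB 10 to NNTH 50, with the interval passing through infinity
```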

15.5 Interpreting results from continuous outcomes (including standardized mean differences)

15.5.1 Meta-analyses with continuous outcomes

Review authors should describe in the study protocol how they plan to interpret results for continuous outcomes. When outcomes are continuous, review authors have a number of options to present summary results. These options differ if studies report the same measure that is familiar to the target audiences, studies report the same or very similar measures that are less familiar to the target audiences, or studies report different measures.

15.5.2 Meta-analyses with continuous outcomes using the same measure

If all studies have used the same familiar units, for instance, results are expressed as durations of events, such as symptoms for conditions including diarrhoea, sore throat, otitis media, influenza or duration of hospitalization, a meta-analysis may generate a summary estimate in those units, as a difference in mean response (see, for instance, the row summarizing results for duration of diarrhoea in Chapter 14, Figure 14.1.b and the row summarizing oedema in Chapter 14, Figure 14.1.a). For such outcomes, the ‘Summary of findings’ table should include a difference of means between the two interventions. However, the units of such outcomes may be difficult to interpret, particularly when they relate to rating scales (again, see the oedema row of Chapter 14, Figure 14.1.a). In such cases, ‘Summary of findings’ tables should include the minimum and maximum of the scale of measurement, and the direction. Knowledge of the smallest change in instrument score that patients perceive is important – the minimal important difference (MID) – and can greatly facilitate the interpretation of results (Guyatt et al 1998, Schünemann and Guyatt 2005). Knowing the MID allows review authors and users to place results in context. Review authors should state the MID – if known – in the Comments column of their ‘Summary of findings’ table. For example, the Chronic Respiratory Questionnaire has possible scores in health-related quality of life ranging from 1 to 7 and 0.5 represents a well-established MID (Jaeschke et al 1989, Schünemann et al 2005).

15.5.3 Meta-analyses with continuous outcomes using different measures

When studies have used different instruments to measure the same construct, a standardized mean difference (SMD) may be used in meta-analysis for combining continuous data. Without guidance, clinicians and patients may have little idea how to interpret results presented as SMDs. Review authors should therefore consider issues of interpretability when planning their analysis at the protocol stage and should consider whether there will be suitable ways to re-express the SMD or whether alternative effect measures, such as a ratio of means, or possibly as minimal important difference units (Guyatt et al 2013b) should be used. Table 15.5.a and the following sections describe these options.

Table 15.5.a Approaches and their implications to presenting results of continuous variables when primary studies have used different instruments to measure the same construct. Adapted from Guyatt et al (2013b)

15.5.3.1 Presenting and interpreting SMDs using generic effect size estimates

The SMD expresses the intervention effect in standard units rather than the original units of measurement. The SMD is the difference in mean effects between the experimental and comparator groups divided by the pooled standard deviation of participants’ outcomes, or external SDs when studies are very small (see Chapter 6, Section 6.5.1.2 ). The value of a SMD thus depends on both the size of the effect (the difference between means) and the standard deviation of the outcomes (the inherent variability among participants or based on an external SD).

If review authors use the SMD, they might choose to present the results directly as SMDs (row 1a, Table 15.5.a and Table 15.5.b ). However, absolute values of the intervention and comparison groups are typically not useful because studies have used different measurement instruments with different units. Guiding rules for interpreting SMDs (or ‘Cohen’s effect sizes’) exist, and have arisen mainly from researchers in the social sciences (Cohen 1988). One example is as follows: 0.2 represents a small effect, 0.5 a moderate effect and 0.8 a large effect (Cohen 1988). Variations exist (e.g. <0.40=small, 0.40 to 0.70=moderate, >0.70=large). Review authors might consider including such a guiding rule in interpreting the SMD in the text of the review, and in summary versions such as the Comments column of a ‘Summary of findings’ table. However, some methodologists believe that such interpretations are problematic because patient importance of a finding is context-dependent and not amenable to generic statements.

15.5.3.2 Re-expressing SMDs using a familiar instrument

The second possibility for interpreting the SMD is to express it in the units of one or more of the specific measurement instruments used by the included studies (row 1b, Table 15.5.a and Table 15.5.b ). The approach is to calculate an absolute difference in means by multiplying the SMD by an estimate of the SD associated with the most familiar instrument. To obtain this SD, a reasonable option is to calculate a weighted average across all intervention groups of all studies that used the selected instrument (preferably a pre-intervention or post-intervention SD as discussed in Chapter 10, Section 10.5.2 ). To better reflect among-person variation in practice, or to use an instrument not represented in the meta-analysis, it may be preferable to use a standard deviation from a representative observational study. The summary effect is thus re-expressed in the original units of that particular instrument and the clinical relevance and impact of the intervention effect can be interpreted using that familiar instrument.
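As a simple numerical illustration (with hypothetical values), the re-expression is a single multiplication of the SMD by a representative standard deviation for the familiar instrument:

```python
# A minimal sketch of re-expressing an SMD in the units of a familiar
# instrument. The SMD and SD values here are hypothetical.
def smd_to_mean_difference(smd, familiar_sd):
    return smd * familiar_sd

# e.g. an SMD of -0.5 with a familiar 0-100 scale whose SD is about 16 points
print(smd_to_mean_difference(-0.5, 16))  # -> a mean difference of -8 points
```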

The same approach of re-expressing the results for a familiar instrument can also be used for other standardized effect measures such as when standardizing by MIDs (Guyatt et al 2013b): see Section 15.5.3.5 .

Table 15.5.b Application of approaches when studies have used different measures: effects of dexamethasone for pain after laparoscopic cholecystectomy (Karanicolas et al 2008). Reproduced with permission of Wolters Kluwer

1. Certainty rated according to GRADE from very low to high certainty.
2. Substantial unexplained heterogeneity in study results.
3. Imprecision due to wide confidence intervals.
4. The 20% comes from the proportion in the control group requiring rescue analgesia.
5. Crude (arithmetic) means of the post-operative pain mean responses across all five trials when transformed to a 100-point scale.

15.5.3.3 Re-expressing SMDs through dichotomization and transformation to relative and absolute measures

A third approach (row 1c, Table 15.5.a and Table 15.5.b ) relies on converting the continuous measure into a dichotomy and thus allows calculation of relative and absolute effects on a binary scale. A transformation of a SMD to a (log) odds ratio is available, based on the assumption that an underlying continuous variable has a logistic distribution with equal standard deviation in the two intervention groups, as discussed in Chapter 10, Section 10.6  (Furukawa 1999, Guyatt et al 2013b). The assumption is unlikely to hold exactly and the results must be regarded as an approximation. The log odds ratio is estimated as

ln(OR) = (π / √3) × SMD ≈ 1.81 × SMD

The resulting odds ratio can then be presented as normal, and in a ‘Summary of findings’ table, combined with an assumed comparator group risk to be expressed as an absolute risk difference. The comparator group risk in this case would refer to the proportion of people who have achieved a specific value of the continuous outcome. In randomized trials this can be interpreted as the proportion who have improved by some (specified) amount (responders), for instance by 5 points on a 0 to 100 scale. Table 15.5.c shows some illustrative results from this method. The risk differences can then be converted to NNTs or to people per thousand using methods described in Section 15.4.4.
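The transformation chain can be sketched as follows; the SMD of 0.5 and the comparator ‘proportion improved’ of 30% are hypothetical values, and the OR-to-risk conversion reuses the formula from Section 15.4.4.3.

```python
# A sketch of the transformation above: convert an SMD to an odds ratio via
# ln(OR) = (pi / sqrt(3)) * SMD (approximately 1.81 * SMD), then combine it
# with an assumed comparator 'proportion improved' to obtain absolute effects.
# The SMD of 0.5 and comparator proportion of 30% are hypothetical.
import math

def smd_to_or(smd):
    return math.exp(math.pi / math.sqrt(3) * smd)

or_ = smd_to_or(0.5)
acr = 0.30                                       # proportion improved with comparator
p_int = (or_ * acr) / (1 - acr + or_ * acr)      # implied proportion with intervention
print(f"OR = {or_:.2f}; improved: {acr:.0%} vs {p_int:.0%}")  # OR = 2.48; 30% vs 51%
```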

Table 15.5.c Risk difference derived for specific SMDs for various given ‘proportions improved’ in the comparator group (Furukawa 1999, Guyatt et al 2013b). Reproduced with permission of Elsevier 

15.5.3.4 Ratio of means

A more frequently used approach is based on calculation of a ratio of means between the intervention and comparator groups (Friedrich et al 2008) as discussed in Chapter 6, Section 6.5.1.3 . Interpretational advantages of this approach include the ability to pool studies with outcomes expressed in different units directly, to avoid the vulnerability of heterogeneous populations that limits approaches that rely on SD units, and for ease of clinical interpretation (row 2, Table 15.5.a and Table 15.5.b ). This method is currently designed for post-intervention scores only. However, it is possible to calculate a ratio of change scores if both intervention and comparator groups change in the same direction in each relevant study, and this ratio may sometimes be informative.

Limitations to this approach include its limited applicability to change scores (since it is unlikely that both intervention and comparator group changes are in the same direction in all studies) and the possibility of misleading results if the comparator group mean is very small, in which case even a modest difference from the intervention group will yield a large and therefore misleading ratio of means. It also requires that separate ratios of means be calculated for each included study, and then entered into a generic inverse variance meta-analysis (see Chapter 10, Section 10.3 ).

The ratio of means approach illustrated in Table 15.5.b suggests a relative reduction in pain of only 13%, meaning that those receiving steroids have a pain severity 87% of those in the comparator group, an effect that might be considered modest.
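For authors preparing such an analysis, each study contributes a log ratio of means and its standard error to the generic inverse variance method. The sketch below uses the delta-method approximation for the standard error described by Friedrich et al (2008); all summary data are hypothetical, and the means are chosen so the ratio echoes the 13% relative reduction mentioned above.

```python
# A sketch of preparing one study for a ratio-of-means meta-analysis:
# ln(RoM) and its approximate standard error, ready for entry into a
# generic inverse variance meta-analysis. All summary data are hypothetical.
import math

def log_rom_and_se(mean_e, sd_e, n_e, mean_c, sd_c, n_c):
    log_rom = math.log(mean_e / mean_c)
    se = math.sqrt(sd_e**2 / (n_e * mean_e**2) + sd_c**2 / (n_c * mean_c**2))
    return log_rom, se

log_rom, se = log_rom_and_se(mean_e=43.5, sd_e=18.0, n_e=60,
                             mean_c=50.0, sd_c=20.0, n_c=60)
rom = math.exp(log_rom)
print(f"RoM = {rom:.2f} (a {1 - rom:.0%} relative reduction), SE(log) = {se:.3f}")
```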

15.5.3.5 Presenting continuous results as minimally important difference units

To express results in MID units, review authors have two options. First, they can be combined across studies in the same way as the SMD, but instead of dividing the mean difference of each study by its SD, review authors divide by the MID associated with that outcome (Johnston et al 2010, Guyatt et al 2013b). Instead of SD units, the pooled results represent MID units (row 3, Table 15.5.a and Table 15.5.b ), and may be more easily interpretable. This approach avoids the problem of varying SDs across studies that may distort estimates of effect in approaches that rely on the SMD. The approach, however, relies on having well-established MIDs. The approach is also risky in that a difference less than the MID may be interpreted as trivial when a substantial proportion of patients may have achieved an important benefit.

The other approach makes a simple conversion (not shown in Table 15.5.b), before undertaking the meta-analysis, of the means and SDs from each study to means and SDs on the scale of a particular familiar instrument whose MID is known. For example, one can rescale the mean and SD of other chronic respiratory disease instruments (e.g. rescaling a 0 to 100 score of an instrument) to the 1 to 7 scale of the Chronic Respiratory Disease Questionnaire (CRQ) (by assuming 0 equals 1 and 100 equals 7 on the CRQ). Given the MID of the CRQ of 0.5, a mean difference in change of 0.71 after rescaling of all studies suggests a substantial effect of the intervention (Guyatt et al 2013b). This approach, presenting in units of the most familiar instrument, may be the most desirable when the target audiences have extensive experience with that instrument, particularly if the MID is well established.
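Both options can be illustrated with short calculations; all numbers below are hypothetical except the CRQ scale limits and its MID of 0.5, which are taken from the text.

```python
# Sketches of the two MID-based options above.
# Option 1: divide a study's mean difference by the outcome's MID to get
# MID units. Option 2: linearly rescale a 0-100 instrument onto the CRQ's
# 1-7 scale (assuming 0 maps to 1 and 100 maps to 7) and compare with the
# CRQ MID of 0.5.
def to_mid_units(mean_difference, mid):
    return mean_difference / mid

def rescale_0_100_to_crq(score):
    return 1 + score * (7 - 1) / 100        # 0 -> 1, 100 -> 7

print(to_mid_units(mean_difference=8, mid=10))     # 0.8 MID units
diff_crq = rescale_0_100_to_crq(62) - rescale_0_100_to_crq(50)
print(f"{diff_crq:.2f} CRQ units vs MID 0.5")      # 0.72, above the MID
```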

15.6 Drawing conclusions

15.6.1 Conclusions sections of a Cochrane Review

Authors’ conclusions in a Cochrane Review are divided into implications for practice and implications for research. While Cochrane Reviews about interventions can provide meaningful information and guidance for practice, decisions about the desirable and undesirable consequences of healthcare options require evidence and judgements for criteria that most Cochrane Reviews do not provide (Alonso-Coello et al 2016). In describing the implications for practice and the development of recommendations, however, review authors may consider the certainty of the evidence, the balance of benefits and harms, and assumed values and preferences.

15.6.2 Implications for practice

Drawing conclusions about the practical usefulness of an intervention entails making trade-offs, either implicitly or explicitly, between the estimated benefits, harms and the values and preferences. Making such trade-offs, and thus making specific recommendations for an action in a specific context, goes beyond a Cochrane Review and requires additional evidence and informed judgements that most Cochrane Reviews do not provide (Alonso-Coello et al 2016). Such judgements are typically the domain of clinical practice guideline developers for which Cochrane Reviews will provide crucial information (Graham et al 2011, Schünemann et al 2014, Zhang et al 2018a). Thus, authors of Cochrane Reviews should not make recommendations.

If review authors feel compelled to lay out actions that clinicians and patients could take, they should – after describing the certainty of evidence and the balance of benefits and harms – highlight different actions that might be consistent with particular patterns of values and preferences. Other factors that might influence a decision should also be highlighted, including any known factors that would be expected to modify the effects of the intervention, the baseline risk or status of the patient, costs and who bears those costs, and the availability of resources. Review authors should ensure they consider all patient-important outcomes, including those for which limited data may be available. In the context of public health reviews the focus may be on population-important outcomes as the target may be an entire (non-diseased) population and include outcomes that are not measured in the population receiving an intervention (e.g. a reduction of transmission of infections from those receiving an intervention). This process implies a high level of explicitness in judgements about values or preferences attached to different outcomes and the certainty of the related evidence (Zhang et al 2018b, Zhang et al 2018c); this and a full cost-effectiveness analysis is beyond the scope of most Cochrane Reviews (although they might well be used for such analyses; see Chapter 20 ).

A review on the use of anticoagulation in cancer patients to increase survival (Akl et al 2011a) provides an example for laying out clinical implications for situations where there are important trade-offs between desirable and undesirable effects of the intervention: “The decision for a patient with cancer to start heparin therapy for survival benefit should balance the benefits and downsides and integrate the patient’s values and preferences. Patients with a high preference for a potential survival prolongation, limited aversion to potential bleeding, and who do not consider heparin (both UFH or LMWH) therapy a burden may opt to use heparin, while those with aversion to bleeding may not.”

15.6.3 Implications for research

The second category for authors’ conclusions in a Cochrane Review is implications for research. To help people make well-informed decisions about future healthcare research, the ‘Implications for research’ section should comment on the need for further research, and the nature of the further research that would be most desirable. It is helpful to consider the population, intervention, comparison and outcomes that could be addressed, or addressed more effectively in the future, in the context of the certainty of the evidence in the current review (Brown et al 2006):

  • P (Population): diagnosis, disease stage, comorbidity, risk factor, sex, age, ethnic group, specific inclusion or exclusion criteria, clinical setting;
  • I (Intervention): type, frequency, dose, duration, prognostic factor;
  • C (Comparison): placebo, routine care, alternative treatment/management;
  • O (Outcome): which clinical or patient-related outcomes will the researcher need to measure, improve, influence or accomplish? Which methods of measurement should be used?

While Cochrane Review authors will find the PICO domains helpful, the domains of the GRADE certainty framework further support understanding and describing what additional research will improve the certainty in the available evidence. Note that as the certainty of the evidence is likely to vary by outcome, these implications will be specific to certain outcomes in the review. Table 15.6.a shows how review authors may be aided in their interpretation of the body of evidence and drawing conclusions about future research and practice.

Table 15.6.a Implications for research and practice suggested by individual GRADE domains

The review of compression stockings for prevention of deep vein thrombosis (DVT) in airline passengers described in Chapter 14 provides an example where there is some convincing evidence of a benefit of the intervention: “This review shows that the question of the effects on symptomless DVT of wearing versus not wearing compression stockings in the types of people studied in these trials should now be regarded as answered. Further research may be justified to investigate the relative effects of different strengths of stockings or of stockings compared to other preventative strategies. Further randomised trials to address the remaining uncertainty about the effects of wearing versus not wearing compression stockings on outcomes such as death, pulmonary embolism and symptomatic DVT would need to be large.” (Clarke et al 2016).

A review of therapeutic touch for anxiety disorder provides an example of the implications for research when no eligible studies had been found: “This review highlights the need for randomized controlled trials to evaluate the effectiveness of therapeutic touch in reducing anxiety symptoms in people diagnosed with anxiety disorders. Future trials need to be rigorous in design and delivery, with subsequent reporting to include high quality descriptions of all aspects of methodology to enable appraisal and interpretation of results.” (Robinson et al 2007).

15.6.4 Reaching conclusions

A common mistake is to confuse ‘no evidence of an effect’ with ‘evidence of no effect’. When the confidence intervals are too wide (e.g. including no effect), it is wrong to claim that the experimental intervention has ‘no effect’ or is ‘no different’ from the comparator intervention. Review authors may also incorrectly ‘positively’ frame results for some effects but not others. For example, when the effect estimate is positive for a beneficial outcome but confidence intervals are wide, review authors may describe the effect as promising. However, when the effect estimate is negative for an outcome that is considered harmful but the confidence intervals include no effect, review authors report no effect. Another mistake is to frame the conclusion in wishful terms. For example, review authors might write, “there were too few people in the analysis to detect a reduction in mortality” when the included studies showed a reduction or even increase in mortality that was not ‘statistically significant’. One way of avoiding errors such as these is to consider the results blinded; that is, consider how the results would be presented and framed in the conclusions if the direction of the results was reversed. If the confidence interval for the estimate of the difference in the effects of the interventions overlaps with no effect, the analysis is compatible with both a true beneficial effect and a true harmful effect. If one of the possibilities is mentioned in the conclusion, the other possibility should be mentioned as well. Table 15.6.b suggests narrative statements for drawing conclusions based on the effect estimate from the meta-analysis and the certainty of the evidence.

Table 15.6.b Suggested narrative statements for phrasing conclusions

Another common mistake is to reach conclusions that go beyond the evidence. Often this is done implicitly, without referring to the additional information or judgements that are used in reaching conclusions about the implications of a review for practice. Even when additional information and explicit judgements support conclusions about the implications of a review for practice, review authors rarely conduct systematic reviews of the additional information. Furthermore, implications for practice are often dependent on specific circumstances and values that must be taken into consideration. As we have noted, review authors should always be cautious when drawing conclusions about implications for practice and they should not make recommendations.

15.7 Chapter information

Authors: Holger J Schünemann, Gunn E Vist, Julian PT Higgins, Nancy Santesso, Jonathan J Deeks, Paul Glasziou, Elie Akl, Gordon H Guyatt; on behalf of the Cochrane GRADEing Methods Group

Acknowledgements: Andrew Oxman, Jonathan Sterne, Michael Borenstein and Rob Scholten contributed text to earlier versions of this chapter.

Funding: This work was in part supported by funding from the Michael G DeGroote Cochrane Canada Centre and the Ontario Ministry of Health. JJD receives support from the National Institute for Health Research (NIHR) Birmingham Biomedical Research Centre at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. JPTH receives support from the NIHR Biomedical Research Centre at University Hospitals Bristol NHS Foundation Trust and the University of Bristol. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

15.8 References

Aguilar MI, Hart R. Oral anticoagulants for preventing stroke in patients with non-valvular atrial fibrillation and no previous history of stroke or transient ischemic attacks. Cochrane Database of Systematic Reviews 2005; 3 : CD001927.

Aguilar MI, Hart R, Pearce LA. Oral anticoagulants versus antiplatelet therapy for preventing stroke in patients with non-valvular atrial fibrillation and no history of stroke or transient ischemic attacks. Cochrane Database of Systematic Reviews 2007; 3 : CD006186.

Akl EA, Gunukula S, Barba M, Yosuico VE, van Doormaal FF, Kuipers S, Middeldorp S, Dickinson HO, Bryant A, Schünemann H. Parenteral anticoagulation in patients with cancer who have no therapeutic or prophylactic indication for anticoagulation. Cochrane Database of Systematic Reviews 2011a; 1 : CD006652.

Akl EA, Oxman AD, Herrin J, Vist GE, Terrenato I, Sperati F, Costiniuk C, Blank D, Schünemann H. Using alternative statistical formats for presenting risks and risk reductions. Cochrane Database of Systematic Reviews 2011b; 3 : CD006776.

Alonso-Coello P, Schünemann HJ, Moberg J, Brignardello-Petersen R, Akl EA, Davoli M, Treweek S, Mustafa RA, Rada G, Rosenbaum S, Morelli A, Guyatt GH, Oxman AD, Group GW. GRADE Evidence to Decision (EtD) frameworks: a systematic and transparent approach to making well informed healthcare choices. 1: Introduction. BMJ 2016; 353 : i2016.

Altman DG. Confidence intervals for the number needed to treat. BMJ 1998; 317 : 1309-1312.

Atkins D, Best D, Briss PA, Eccles M, Falck-Ytter Y, Flottorp S, Guyatt GH, Harbour RT, Haugh MC, Henry D, Hill S, Jaeschke R, Leng G, Liberati A, Magrini N, Mason J, Middleton P, Mrukowicz J, O'Connell D, Oxman AD, Phillips B, Schünemann HJ, Edejer TT, Varonen H, Vist GE, Williams JW, Jr., Zaza S. Grading quality of evidence and strength of recommendations. BMJ 2004; 328 : 1490.

Brown P, Brunnhuber K, Chalkidou K, Chalmers I, Clarke M, Fenton M, Forbes C, Glanville J, Hicks NJ, Moody J, Twaddle S, Timimi H, Young P. How to formulate research recommendations. BMJ 2006; 333 : 804-806.

Cates C. Confidence intervals for the number needed to treat: Pooling numbers needed to treat may not be reliable. BMJ 1999; 318 : 1764-1765.

Clarke MJ, Broderick C, Hopewell S, Juszczak E, Eisinga A. Compression stockings for preventing deep vein thrombosis in airline passengers. Cochrane Database of Systematic Reviews 2016; 9 : CD004002.

Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale (NJ): Lawrence Erlbaum Associates, Inc.; 1988.

Coleman T, Chamberlain C, Davey MA, Cooper SE, Leonardi-Bee J. Pharmacological interventions for promoting smoking cessation during pregnancy. Cochrane Database of Systematic Reviews 2015; 12 : CD010078.

Dans AM, Dans L, Oxman AD, Robinson V, Acuin J, Tugwell P, Dennis R, Kang D. Assessing equity in clinical practice guidelines. Journal of Clinical Epidemiology 2007; 60 : 540-546.

Friedman LM, Furberg CD, DeMets DL. Fundamentals of Clinical Trials. 2nd ed. Littleton (MA): John Wright PSG, Inc.; 1985.

Friedrich JO, Adhikari NK, Beyene J. The ratio of means method as an alternative to mean differences for analyzing continuous outcome variables in meta-analysis: a simulation study. BMC Medical Research Methodology 2008; 8 : 32.

Furukawa T. From effect size into number needed to treat. Lancet 1999; 353 : 1680.

Graham R, Mancher M, Wolman DM, Greenfield S, Steinberg E. Committee on Standards for Developing Trustworthy Clinical Practice Guidelines, Board on Health Care Services: Clinical Practice Guidelines We Can Trust. Washington, DC: National Academies Press; 2011.

Guyatt G, Oxman AD, Akl EA, Kunz R, Vist G, Brozek J, Norris S, Falck-Ytter Y, Glasziou P, DeBeer H, Jaeschke R, Rind D, Meerpohl J, Dahm P, Schünemann HJ. GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. Journal of Clinical Epidemiology 2011a; 64 : 383-394.

Guyatt GH, Juniper EF, Walter SD, Griffith LE, Goldstein RS. Interpreting treatment effects in randomised trials. BMJ 1998; 316 : 690-693.

Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, Schünemann HJ. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008; 336 : 924-926.

Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, Alonso-Coello P, Falck-Ytter Y, Jaeschke R, Vist G, Akl EA, Post PN, Norris S, Meerpohl J, Shukla VK, Nasser M, Schünemann HJ. GRADE guidelines: 8. Rating the quality of evidence--indirectness. Journal of Clinical Epidemiology 2011b; 64 : 1303-1310.

Guyatt GH, Oxman AD, Santesso N, Helfand M, Vist G, Kunz R, Brozek J, Norris S, Meerpohl J, Djulbegovic B, Alonso-Coello P, Post PN, Busse JW, Glasziou P, Christensen R, Schünemann HJ. GRADE guidelines: 12. Preparing summary of findings tables-binary outcomes. Journal of Clinical Epidemiology 2013a; 66 : 158-172.

Guyatt GH, Thorlund K, Oxman AD, Walter SD, Patrick D, Furukawa TA, Johnston BC, Karanicolas P, Akl EA, Vist G, Kunz R, Brozek J, Kupper LL, Martin SL, Meerpohl JJ, Alonso-Coello P, Christensen R, Schünemann HJ. GRADE guidelines: 13. Preparing summary of findings tables and evidence profiles-continuous outcomes. Journal of Clinical Epidemiology 2013b; 66 : 173-183.

Hawe P, Shiell A, Riley T, Gold L. Methods for exploring implementation variation and local context within a cluster randomised community intervention trial. Journal of Epidemiology and Community Health 2004; 58 : 788-793.

Hoffrage U, Lindsey S, Hertwig R, Gigerenzer G. Medicine. Communicating statistical information. Science 2000; 290 : 2261-2262.

Jaeschke R, Singer J, Guyatt GH. Measurement of health status. Ascertaining the minimal clinically important difference. Controlled Clinical Trials 1989; 10 : 407-415.

Johnston B, Thorlund K, Schünemann H, Xie F, Murad M, Montori V, Guyatt G. Improving the interpretation of health-related quality of life evidence in meta-analysis: the application of minimal important difference units. Health and Quality of Life Outcomes 2010; 8 : 116.

Karanicolas PJ, Smith SE, Kanbur B, Davies E, Guyatt GH. The impact of prophylactic dexamethasone on nausea and vomiting after laparoscopic cholecystectomy: a systematic review and meta-analysis. Annals of Surgery 2008; 248 : 751-762.

Lumley J, Oliver SS, Chamberlain C, Oakley L. Interventions for promoting smoking cessation during pregnancy. Cochrane Database of Systematic Reviews 2004; 4 : CD001055.

McQuay HJ, Moore RA. Using numerical results from systematic reviews in clinical practice. Annals of Internal Medicine 1997; 126 : 712-720.

Resnicow K, Cross D, Wynder E. The Know Your Body program: a review of evaluation studies. Bulletin of the New York Academy of Medicine 1993; 70 : 188-207.

Robinson J, Biley FC, Dolk H. Therapeutic touch for anxiety disorders. Cochrane Database of Systematic Reviews 2007; 3 : CD006240.

Rothwell PM. External validity of randomised controlled trials: "to whom do the results of this trial apply?". Lancet 2005; 365 : 82-93.

Santesso N, Carrasco-Labra A, Langendam M, Brignardello-Petersen R, Mustafa RA, Heus P, Lasserson T, Opiyo N, Kunnamo I, Sinclair D, Garner P, Treweek S, Tovey D, Akl EA, Tugwell P, Brozek JL, Guyatt G, Schünemann HJ. Improving GRADE evidence tables part 3: detailed guidance for explanatory footnotes supports creating and understanding GRADE certainty in the evidence judgments. Journal of Clinical Epidemiology 2016; 74 : 28-39.

Schünemann HJ, Puhan M, Goldstein R, Jaeschke R, Guyatt GH. Measurement properties and interpretability of the Chronic respiratory disease questionnaire (CRQ). COPD: Journal of Chronic Obstructive Pulmonary Disease 2005; 2 : 81-89.

Schünemann HJ, Guyatt GH. Commentary--goodbye M(C)ID! Hello MID, where do you come from? Health Services Research 2005; 40 : 593-597.

Schünemann HJ, Fretheim A, Oxman AD. Improving the use of research evidence in guideline development: 13. Applicability, transferability and adaptation. Health Research Policy and Systems 2006; 4 : 25.

Schünemann HJ. Methodological idiosyncracies, frameworks and challenges of non-pharmaceutical and non-technical treatment interventions. Zeitschrift für Evidenz, Fortbildung und Qualität im Gesundheitswesen 2013; 107 : 214-220.

Schünemann HJ, Tugwell P, Reeves BC, Akl EA, Santesso N, Spencer FA, Shea B, Wells G, Helfand M. Non-randomized studies as a source of complementary, sequential or replacement evidence for randomized controlled trials in systematic reviews on the effects of interventions. Research Synthesis Methods 2013; 4 : 49-62.

Schünemann HJ, Wiercioch W, Etxeandia I, Falavigna M, Santesso N, Mustafa R, Ventresca M, Brignardello-Petersen R, Laisaar KT, Kowalski S, Baldeh T, Zhang Y, Raid U, Neumann I, Norris SL, Thornton J, Harbour R, Treweek S, Guyatt G, Alonso-Coello P, Reinap M, Brozek J, Oxman A, Akl EA. Guidelines 2.0: systematic development of a comprehensive checklist for a successful guideline enterprise. CMAJ: Canadian Medical Association Journal 2014; 186 : E123-142.

Schünemann HJ. Interpreting GRADE's levels of certainty or quality of the evidence: GRADE for statisticians, considering review information size or less emphasis on imprecision? Journal of Clinical Epidemiology 2016; 75 : 6-15.

Smeeth L, Haines A, Ebrahim S. Numbers needed to treat derived from meta-analyses--sometimes informative, usually misleading. BMJ 1999; 318 : 1548-1551.

Sun X, Briel M, Busse JW, You JJ, Akl EA, Mejza F, Bala MM, Bassler D, Mertz D, Diaz-Granados N, Vandvik PO, Malaga G, Srinathan SK, Dahm P, Johnston BC, Alonso-Coello P, Hassouneh B, Walter SD, Heels-Ansdell D, Bhatnagar N, Altman DG, Guyatt GH. Credibility of claims of subgroup effects in randomised controlled trials: systematic review. BMJ 2012; 344 : e1553.

Zhang Y, Akl EA, Schünemann HJ. Using systematic reviews in guideline development: the GRADE approach. Research Synthesis Methods 2018a: doi: 10.1002/jrsm.1313.

Zhang Y, Alonso-Coello P, Guyatt GH, Yepes-Nunez JJ, Akl EA, Hazlewood G, Pardo-Hernandez H, Etxeandia-Ikobaltzeta I, Qaseem A, Williams JW, Jr., Tugwell P, Flottorp S, Chang Y, Zhang Y, Mustafa RA, Rojas MX, Schünemann HJ. GRADE Guidelines: 19. Assessing the certainty of evidence in the importance of outcomes or values and preferences-Risk of bias and indirectness. Journal of Clinical Epidemiology 2018b: doi: 10.1016/j.jclinepi.2018.1001.1013.

Zhang Y, Alonso Coello P, Guyatt G, Yepes-Nunez JJ, Akl EA, Hazlewood G, Pardo-Hernandez H, Etxeandia-Ikobaltzeta I, Qaseem A, Williams JW, Jr., Tugwell P, Flottorp S, Chang Y, Zhang Y, Mustafa RA, Rojas MX, Xie F, Schünemann HJ. GRADE Guidelines: 20. Assessing the certainty of evidence in the importance of outcomes or values and preferences - Inconsistency, Imprecision, and other Domains. Journal of Clinical Epidemiology 2018c: doi: 10.1016/j.jclinepi.2018.1005.1011.



Journal of Family Medicine and Primary Care

Validity, reliability, and generalizability in qualitative research

Lawrence Leung


Address for correspondence: Prof. Lawrence Leung, Centre of Studies in Primary Care, Queen's University, 220 Bagot Street, Kingston, ON K7L 5E9, Canada. E-mail: [email protected]

This is an open-access article distributed under the terms of the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In general practice, qualitative research contributes as significantly as quantitative research, in particular regarding the psycho-social aspects of patient care, health services provision, policy setting, and health administration. In contrast to quantitative research, qualitative research as a whole has been constantly critiqued, if not disparaged, for the lack of consensus on how to assess its quality and robustness. This article illustrates with five published studies how qualitative research can impact and reshape the discipline of primary care, spiraling out from clinic-based health screening and community-based disease monitoring to the evaluation of out-of-hours triage services, a provincial model of psychiatric care pathways, and finally national legislation of core measures for children's healthcare insurance. Fundamental concepts of validity, reliability, and generalizability as applicable to qualitative research are then addressed, with an update on current views and controversies.

Keywords: Controversies, generalizability, primary care research, qualitative research, reliability, validity

Nature of Qualitative Research versus Quantitative Research

The essence of qualitative research is to make sense of and recognize patterns among words in order to build up a meaningful picture without compromising its richness and dimensionality. Like quantitative research, qualitative research aims to answer questions of “how, where, when, who and why” with a perspective to build a theory or refute an existing one. Unlike quantitative research, which deals primarily with numerical data and their statistical interpretation under a reductionist, logical, and strictly objective paradigm, qualitative research handles nonnumerical information and its phenomenological interpretation, which is inextricably tied to human senses and subjectivity. While human emotions and perspectives from both subjects and researchers are considered undesirable biases that confound results in quantitative research, the same elements are considered essential, inevitable, and even treasurable in qualitative research, as they invariably add extra dimensions and color to enrich the corpus of findings. However, the issues of subjectivity and contextual ramifications have fueled incessant controversy over yardsticks for the quality and trustworthiness of qualitative research results in healthcare.

Impact of Qualitative Research upon Primary Care

In many ways, qualitative research contributes significantly, if not more so than quantitative research, to the field of primary care at various levels. Five qualitative studies are chosen to illustrate how various methodologies of qualitative research have helped advance primary healthcare, from novel monitoring of chronic obstructive pulmonary disease (COPD) via mobile-health technology,[1] informed decision-making for colorectal cancer screening,[2] and triaging of out-of-hours GP services,[3] to evaluating care pathways for community psychiatry[4] and finally prioritizing healthcare initiatives for legislation purposes at the national level.[5] With the recent advances in information technology and mobile connected devices, self-monitoring and management of chronic diseases via tele-health technology may seem beneficial to both the patient and the healthcare provider. Recruiting COPD patients who were given tele-health devices that monitored lung function, Williams et al.[1] conducted phone interviews, analyzed the transcripts via a grounded theory approach, and identified themes that enabled them to conclude that such a mobile-health setup helped engage patients, with better adherence to treatment and overall improvement in mood. Such positive findings were in contrast to previous studies, which opined that elderly patients were often challenged by operating computer tablets[6] or conversing with tele-health software.[7] To explore the content of recommendations for colorectal cancer screening given by family physicians, Wackerbarth et al.[2] conducted semi-structured interviews with subsequent content analysis and found that most physicians delivered information to enrich patient knowledge with little regard for patients’ true understanding, ideas, and preferences in the matter. These findings suggested room for family physicians to better engage their patients when recommending preventative care. Faced with various models of out-of-hours triage services for GP consultations, Egbunike et al.[3] conducted thematic analysis of semi-structured telephone interviews with patients and doctors in urban, rural, and mixed settings. They found that the efficiency of triage services remained a prime concern for both users and providers, alongside issues of access to doctors and unfulfilled or mismatched user expectations, which could arouse dissatisfaction and carry legal implications. In the UK, a care pathways model for community psychiatry had been introduced, but its benefits were unclear. Khandaker et al.[4] hence conducted a qualitative study using semi-structured interviews with medical staff and other stakeholders; adopting a grounded-theory approach, major themes emerged that included improved equality of access, more focused logistics, increased work throughput, and better accountability for community psychiatry provided under the care pathways model. Finally, at the US national level, Mangione-Smith et al.[5] employed a modified Delphi method to gather consensus from a panel of nominators who were recognized experts and stakeholders in their disciplines, and identified a core set of quality measures for children's healthcare under the Medicaid and Children's Health Insurance Program. These core measures were made transparent for public opinion and later passed on for full legislation, illustrating the impact of qualitative research upon social welfare and policy improvement.

Overall Criteria for Quality in Qualitative Research

Given the diverse genera and forms of qualitative research, there is no consensus on how to assess any piece of qualitative research work. Various approaches have been suggested, the two leading schools of thought being that of Dixon-Woods et al.,[8] which emphasizes methodology, and that of Lincoln et al.,[9] which stresses rigor in the interpretation of results. By identifying commonalities of qualitative research, Dixon-Woods produced a checklist of questions for assessing the clarity and appropriateness of the research question; the description of, and justification for, sampling, data collection, and data analysis; the level of support and evidence for claims; the coherence between data, interpretation, and conclusions; and finally the level of contribution of the paper. These criteria inform the 10 questions of the Critical Appraisal Skills Program checklist for qualitative studies.[10] However, these methodology-weighted criteria may not do justice to qualitative studies that differ in epistemological and philosophical paradigms,[11,12] one classic example being positivist versus interpretivist research.[13] Equally, without a robust methodological layout, the rigorous interpretation of results advocated by Lincoln et al.[9] cannot stand on its own. Meyrick[14] argued from a different angle and proposed fulfillment of the dual core criteria of “transparency” and “systematicity” for good-quality qualitative research: in brief, every step of the research logistics (from theory formation, design of study, sampling, and data acquisition and analysis to results and conclusions) has to be validated as sufficiently transparent and systematic. In this manner, both the research process and the results can be assured of high rigor and robustness.[14] Finally, Kitto et al.[15] epitomized six criteria for assessing the overall quality of qualitative research: (i) clarification and justification, (ii) procedural rigor, (iii) sample representativeness, (iv) interpretative rigor, (v) reflexive and evaluative rigor, and (vi) transferability/generalizability, which also double as evaluative landmarks for manuscript review at the Medical Journal of Australia. As with quantitative research, the quality of qualitative research can be assessed in terms of validity, reliability, and generalizability.

Validity

Validity in qualitative research means “appropriateness” of the tools, processes, and data: whether the research question is valid for the desired outcome, the choice of methodology is appropriate for answering the research question, the design is valid for the methodology, the sampling and data analysis are appropriate, and, finally, the results and conclusions are valid for the sample and context. In assessing the validity of qualitative research, the challenge can start from the ontology and epistemology of the issue being studied; for example, the concept of the “individual” is seen differently by humanistic and positive psychologists due to differing philosophical perspectives:[16] where humanistic psychologists believe the “individual” is a product of existential awareness and social interaction, positive psychologists think the “individual” exists side-by-side with the formation of any human being. Setting off on different pathways, qualitative research regarding an individual's wellbeing will reach conclusions of varying validity. The choice of methodology must enable detection of the findings or phenomena in the appropriate context for it to be valid, with due regard to cultural and contextual variables. For sampling, procedures and methods must be appropriate for the research paradigm and distinguish between systematic,[17] purposeful,[18] and theoretical (adaptive) sampling:[19,20] systematic sampling has no a priori theory, purposeful sampling often serves a certain aim or framework, and theoretical sampling is molded by the ongoing process of data collection and the theory in evolution. For data extraction and analysis, several methods can be adopted to enhance validity, including first-tier triangulation (of researchers) and second-tier triangulation (of resources and theories),[17,21] a well-documented audit trail of materials and processes,[22,23,24] multidimensional analysis as concept- or case-orientated,[25,26] and respondent verification.[21,27]

Reliability

In quantitative research, reliability refers to the exact replicability of processes and results. In qualitative research, with its diverse paradigms, such a definition of reliability is challenging and epistemologically counter-intuitive; hence, the essence of reliability for qualitative research lies in consistency.[24,28] A margin of variability in results is tolerated in qualitative research provided the methodology and epistemological logistics consistently yield data that are ontologically similar but may differ in richness and ambience within similar dimensions. Silverman[29] proposed five approaches to enhancing the reliability of process and results: refutational analysis, constant data comparison, comprehensive data use, inclusion of the deviant case, and use of tables. As data are extracted from the original sources, researchers must verify their accuracy in terms of form and context with constant comparison,[27] either alone or with peers (a form of triangulation).[30] The scope and analysis of the data included should be as comprehensive and inclusive as possible, with reference to quantitative aspects where applicable.[30] Adopting the Popperian dictum of falsifiability as the essence of truth and science, attempts to refute the qualitative data and analyses should be made to assess reliability.[31]
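
Qualitative reliability is rarely reduced to a single statistic, but where two coders independently apply the same coding frame to identical transcripts (a form of researcher triangulation), their agreement is sometimes quantified. The sketch below is a minimal, hypothetical illustration in Python using Cohen's kappa; the theme codes are invented for demonstration and do not come from any study discussed here.

    from sklearn.metrics import cohen_kappa_score

    # Hypothetical theme codes assigned by two coders to the same 10 excerpts
    coder_a = ["access", "cost", "access", "trust", "cost",
               "trust", "access", "cost", "trust", "access"]
    coder_b = ["access", "cost", "trust", "trust", "cost",
               "trust", "access", "access", "trust", "access"]

    # Kappa corrects raw agreement for the agreement expected by chance
    kappa = cohen_kappa_score(coder_a, coder_b)
    print(f"Cohen's kappa = {kappa:.2f}")  # ~0.70, conventionally "substantial"

A kappa of this size would usually prompt the coders to discuss the divergent excerpts and refine the coding frame before a second pass, which is close in spirit to the constant-comparison methods described above.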

Generalizability

Most qualitative research studies, if not all, are meant to study a specific issue or phenomenon in a certain population or ethnic group, in a focused locality and a particular context; hence, generalizability of qualitative research findings is usually not an expected attribute. However, with the rising trend of knowledge synthesis from qualitative research via meta-synthesis, meta-narrative, or meta-ethnography, evaluation of generalizability becomes pertinent. A pragmatic approach to assessing the generalizability of qualitative studies is to adopt the same criteria as for validity: that is, use of systematic sampling, triangulation and constant comparison, proper audit and documentation, and multi-dimensional theory.[17] However, some researchers espouse the approach of analytical generalization,[32] where one judges the extent to which the findings of one study can be generalized to another under a similar theoretical framework, and the proximal similarity model, where the generalizability of one study to another is judged by similarities in time, place, people, and other social contexts.[33] That said, Zimmer[34] questioned the suitability of meta-synthesis in view of the basic tenets of grounded theory,[35] phenomenology,[36] and ethnography.[37] He concluded that any valid meta-synthesis must retain the two further goals of theory development and higher-level abstraction while in search of generalizability, and must be executed as a third-level interpretation using Gadamer's concepts of the hermeneutic circle,[38,39] the dialogic process,[38] and the fusion of horizons.[39] Finally, Toye et al.[40] reported the practicality of using “conceptual clarity” and “interpretative rigor” as intuitive criteria for assessing quality in meta-ethnography, which somewhat echoes Rolfe's controversial aesthetic theory of research reports.[41]

Food for Thought

Despite various measures to enhance or ensure the quality of qualitative studies, some researchers have opined, from a purist ontological and epistemological angle, that qualitative research is not a unified field but an ipso facto diverse one,[8] and hence any attempt to synthesize or appraise different studies under one system is impossible and conceptually wrong. Barbour argued from a philosophical angle that these special measures or “technical fixes” (like purposive sampling, multiple coding, triangulation, and respondent validation) can never confer the rigor as conceived.[11] In extremis, Rolfe et al. opined from the field of nursing research that any set of formal criteria used to judge the quality of qualitative research is futile and without validity, and suggested that any qualitative report should be judged by the form in which it is written (aesthetic) and not by its contents (epistemic).[41] Rolfe's novel view was rebutted by Porter,[42] who argued via logical premises that two of Rolfe's fundamental statements were flawed: (i) “the content of research reports is determined by their forms” may not be a fact, and (ii) research appraisal being “subject to individual judgment based on insight and experience” would mean that those without sufficient experience of performing research are unable to judge adequately, which amounts to an elitist principle. From a realism standpoint, Porter then proposed multiple and open approaches to validity in qualitative research that incorporate parallel perspectives[43,44] and diversification of meanings.[44] Any work of qualitative research, when read, is always a two-way interactive process, such that validity and quality have to be judged by the receiving end too, and not by the researcher end alone.

In summary, the three gold criteria of validity, reliability, and generalizability apply in principle to assessing quality in both quantitative and qualitative research; what differs is the nature and type of processes that ontologically and epistemologically distinguish the two.

Source of Support: Nil.

Conflict of Interest: None declared.

References

  • 1. Williams V, Price J, Hardinge M, Tarassenko L, Farmer A. Using a mobile health application to support self-management in COPD: A qualitative study. Br J Gen Pract. 2014;64:e392–400. doi: 10.3399/bjgp14X680473.
  • 2. Wackerbarth SB, Tarasenko YN, Joyce JM, Haist SA. Physician colorectal cancer screening recommendations: An examination based on informed decision making. Patient Educ Couns. 2007;66:43–50. doi: 10.1016/j.pec.2006.10.003.
  • 3. Egbunike JN, Shaw C, Porter A, Button LA, Kinnersley P, Hood K, et al. Streamline triage and manage user expectations: Lessons from a qualitative study of GP out-of-hours services. Br J Gen Pract. 2010;60:e83–97. doi: 10.3399/bjgp10X483490.
  • 4. Khandaker GM, Gandamaneni PK, Dibben CR, Cherukuru S, Cairns P, Ray MK. Evaluating care pathways for community psychiatry in England: A qualitative study. J Eval Clin Pract. 2013;19:298–303. doi: 10.1111/j.1365-2753.2012.01822.x.
  • 5. Mangione-Smith R, Schiff J, Dougherty D. Identifying children's health care quality measures for Medicaid and CHIP: An evidence-informed, publicly transparent expert process. Acad Pediatr. 2011;11:S11–21. doi: 10.1016/j.acap.2010.11.003.
  • 6. Hess R, Santucci A, McTigue K, Fischer G, Kapoor W. Patient difficulty using tablet computers to screen in primary care. J Gen Intern Med. 2008;23:476–80. doi: 10.1007/s11606-007-0500-1.
  • 7. Sanders C, Rogers A, Bowen R, Bower P, Hirani S, Cartwright M, et al. Exploring barriers to participation and adoption of telehealth and telecare within the Whole System Demonstrator trial: A qualitative study. BMC Health Serv Res. 2012;12:220. doi: 10.1186/1472-6963-12-220.
  • 8. Dixon-Woods M, Shaw RL, Agarwal S, Smith JA. The problem of appraising qualitative research. Qual Saf Health Care. 2004;13:223–5. doi: 10.1136/qshc.2003.008714.
  • 9. Lincoln YS, Lynham SA, Guba EG. Paradigmatic controversies, contradictions, and emerging confluences, revisited. In: The Sage Handbook of Qualitative Research. 4th ed. Sage Publications; 2011. pp. 97–128.
  • 10. CASP. CASP Qualitative Checklist: Critical Appraisal Skills Program. 2013. [Last cited on 2015 Mar 01]. Available from: http://www.media.wix.com/ugd/dded87_29c5b002d99342f788c6ac670e49f274.pdf.
  • 11. Barbour RS. Checklists for improving rigour in qualitative research: A case of the tail wagging the dog? BMJ. 2001;322:1115–7. doi: 10.1136/bmj.322.7294.1115.
  • 12. Popay J, Rogers A, Williams G. Rationale and standards for the systematic review of qualitative literature in health services research. Qual Health Res. 1998;8:341–51. doi: 10.1177/104973239800800305.
  • 13. Sale JE. How to assess rigour…or not in qualitative papers. J Eval Clin Pract. 2008;14:912–3. doi: 10.1111/j.1365-2753.2008.01093.x.
  • 14. Meyrick J. What is good qualitative research? A first step towards a comprehensive approach to judging rigour/quality. J Health Psychol. 2006;11:799–808. doi: 10.1177/1359105306066643.
  • 15. Kitto SC, Chesters J, Grbich C. Quality in qualitative research. Med J Aust. 2008;188:243–6. doi: 10.5694/j.1326-5377.2008.tb01595.x.
  • 16. Waterman AS. The humanistic psychology-positive psychology divide: Contrasts in philosophical foundations. Am Psychol. 2013;68:124–33. doi: 10.1037/a0032168.
  • 17. Finfgeld-Connett D. Generalizability and transferability of meta-synthesis research findings. J Adv Nurs. 2010;66:246–54. doi: 10.1111/j.1365-2648.2009.05250.x.
  • 18. Palinkas LA, Horwitz SM, Green CA, Wisdom JP, Duan N, Hoagwood K. Purposeful sampling for qualitative data collection and analysis in mixed method implementation research. Adm Policy Ment Health. 2013:1–12. doi: 10.1007/s10488-013-0528-y.
  • 19. Coyne IT. Sampling in qualitative research. Purposeful and theoretical sampling; merging or clear boundaries? J Adv Nurs. 1997;26:623–30. doi: 10.1046/j.1365-2648.1997.t01-25-00999.x.
  • 20. Becker PH. Common pitfalls in published grounded theory research. Qual Health Res. 1993;3:254–60.
  • 21. Lincoln YS, Guba EG. Naturalistic Inquiry. Newbury Park, London: Sage Publications; 1985. p. 416.
  • 22. Rodgers BL, Cowles KV. The qualitative research audit trail: A complex collection of documentation. Res Nurs Health. 1993;16:219–26. doi: 10.1002/nur.4770160309.
  • 23. Kahn DL. Reducing bias. In: Cohen MZ, Kahn DL, Steeves RH, editors. Hermeneutic Phenomenological Research: A Practical Guide for Nurse Researchers. Newbury Park, London: Sage Publications; 2000. pp. 85–99.
  • 24. Carcary M. The research audit trail – Enhancing trustworthiness in qualitative inquiry. Electron J Bus Res Methods. 2009;7:11–24. [Last accessed on 2015 Mar 03]. Available from: http://www.ejbrm.com.
  • 25. Jansen H. The logic of qualitative survey research and its position in the field of social research methods. Forum Qual Soc Res. 2010;11:2.
  • 26. Miles MB, Huberman AM. Qualitative Data Analysis: An Expanded Sourcebook. Newbury Park, London: Sage; 1994.
  • 27. George M, Apter AJ. Gaining insight into patients’ beliefs using qualitative research methodologies. Curr Opin Allergy Clin Immunol. 2004;4:185–9. doi: 10.1097/00130832-200406000-00008.
  • 28. Grossoehme DH. Overview of qualitative research. J Health Care Chaplain. 2014;20:109–22. doi: 10.1080/08854726.2014.925660.
  • 29. Silverman D. Doing Qualitative Research. 3rd ed. Newbury Park, London: SAGE Publications Ltd; 2009. p. 472.
  • 30. Patton MQ. Enhancing the quality and credibility of qualitative analysis. Health Serv Res. 1999;34:1189–208.
  • 31. Allmark P. Popper and nursing theory. Nurs Philos. 2003;4:4–16. doi: 10.1046/j.1466-769x.2003.00114.x.
  • 32. Kvale S, Brinkmann S. Interviews: Learning the Craft of Qualitative Research Interviewing. Newbury Park, London: Sage; 2009.
  • 33. Trochim WM. Research Methods: The Concise Knowledge Base. Cincinnati, OH: Atomic Dog Publishing; 2005.
  • 34. Zimmer L. Qualitative meta-synthesis: A question of dialoguing with texts. J Adv Nurs. 2006;53:311–8. doi: 10.1111/j.1365-2648.2006.03721.x.
  • 35. Glaser BG, Strauss AL, Strutzel E. The discovery of grounded theory; strategies for qualitative research. Nurs Res. 1968;17:364.
  • 36. Van Manen M. Researching Lived Experience: Human Science for an Action Sensitive Pedagogy. New York: SUNY Press; 1990.
  • 37. Noblit GW, Hare RD. Meta-Ethnography: Synthesizing Qualitative Studies. Newbury Park, London: Sage; 1988.
  • 38. Gadamer HG. Truth and Method. Weinsheimer J, Marshall DG, editors. New York: Continuum; 1989.
  • 39. Thompson J. Hermeneutic inquiry. In: Advancing Nursing Science Through Research. Vol. 2. Newbury Park, London: Sage Publications; 1990. pp. 223–86.
  • 40. Toye F, Seers K, Allcock N, Briggs M, Carr E, Andrews J, et al. Trying to pin down jelly – Exploring intuitive processes in quality assessment for meta-ethnography. BMC Med Res Methodol. 2013;13:46. doi: 10.1186/1471-2288-13-46.
  • 41. Rolfe G. Validity, trustworthiness and rigour: Quality and the idea of qualitative research. J Adv Nurs. 2006;53:304–10. doi: 10.1111/j.1365-2648.2006.03727.x.
  • 42. Porter S. Validity, trustworthiness and rigour: Reasserting realism in qualitative research. J Adv Nurs. 2007;60:79–86. doi: 10.1111/j.1365-2648.2007.04360.x.
  • 43. Guba EG, Lincoln YS. Fourth Generation Evaluation. Newbury Park, London: Sage Publications; 1989. p. 296.
  • 44. Sparkes AC. Myth 94: Qualitative health researchers will agree about validity. Qual Health Res. 2001;11:538–52. doi: 10.1177/104973230101100409.

Research Findings – Types, Examples, and Writing Guide


Research findings are the core results of a study, providing answers to research questions and supporting or refuting hypotheses. They present essential information about what was observed, measured, or discovered during the research process. Effectively writing research findings allows researchers to convey their results in a clear, organized, and credible manner. This guide explores the types of research findings, provides examples, and offers a writing guide to help you present your findings effectively.


Research findings are the conclusions drawn from data analysis, presenting the outcomes of the study based on collected evidence. They offer insights, patterns, and knowledge about the research topic, helping to bridge theory and real-world application.

Types of Research Findings

1. Descriptive Findings
  • Description: Descriptive findings summarize the data without making interpretations or drawing conclusions. They present statistics or visual data representations like means, medians, percentages, or frequencies.
  • Example: "75% of survey respondents indicated a preference for digital banking over traditional banking methods."

2. Comparative Findings
  • Description: Comparative findings analyze differences or similarities between groups, categories, or conditions. They help researchers understand how variables relate to each other.
  • Example: "Group A, which received the new curriculum, scored 15% higher on the final exam compared to Group B."

3. Correlational Findings
  • Description: Correlational findings show relationships between variables without implying causation. They use statistical measures to determine if variables are related, positively or negatively (the sketch after this list shows how such a statistic can be computed).
  • Example: "There is a positive correlation (r = 0.62) between hours studied and test scores among high school students."

4. Causal Findings
  • Description: Causal findings identify cause-and-effect relationships, often determined through controlled experiments. They provide evidence that one variable influences or causes a change in another.
  • Example: "The new drug significantly reduced symptoms in 80% of patients, suggesting it is an effective treatment for the condition."

5. Inferential Findings
  • Description: Inferential findings use statistical analysis to make inferences or predictions about a population based on sample data. They often involve hypothesis testing, confidence intervals, and p-values.
  • Example: "With a 95% confidence interval, the data suggests that the new intervention reduces recovery time by an average of 10 days."

6. Exploratory Findings
  • Description: Exploratory findings emerge from studies with no prior hypothesis, often revealing patterns or insights that may lead to further research questions. They are common in qualitative research.
  • Example: "Participants frequently mentioned 'community support' as a key factor in overcoming challenges, suggesting a potential area for future study."
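
The following minimal Python sketch shows how statistics of the kinds quoted above, a Pearson correlation and a 95% confidence interval, might be computed in practice. All data and variable names are hypothetical, invented purely for illustration; this is not taken from any study cited here.

    import numpy as np
    from scipy import stats

    # Hypothetical data, invented for illustration only
    hours_studied = np.array([2, 4, 5, 7, 8, 10, 12, 3, 6, 9])
    test_scores = np.array([55, 60, 62, 70, 75, 82, 88, 58, 68, 80])

    # Correlational finding: Pearson's r between hours studied and scores
    r, p = stats.pearsonr(hours_studied, test_scores)
    print(f"Pearson r = {r:.2f} (p = {p:.3f})")

    # Inferential finding: 95% confidence interval for mean recovery time
    recovery_days = np.array([8, 12, 9, 11, 10, 13, 7, 10, 9, 11])
    mean = recovery_days.mean()
    sem = stats.sem(recovery_days)  # standard error of the mean
    ci_low, ci_high = stats.t.interval(0.95, df=len(recovery_days) - 1,
                                       loc=mean, scale=sem)
    print(f"Mean recovery = {mean:.1f} days, 95% CI [{ci_low:.1f}, {ci_high:.1f}]")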

Examples of Research Findings

  • Study: Impact of Online Learning on Student Performance.
  • Finding: "Students who participated in online learning had a 12% higher completion rate compared to those in traditional classes."
  • Study: Patient Experiences with Telehealth Services.
  • Finding: "Most participants felt that telehealth offered greater convenience and flexibility, although 30% reported concerns about the lack of face-to-face interaction."
  • Study: Relationship between Social Media Usage and Anxiety.
  • Finding: "A moderate positive correlation (r = 0.45) was observed between daily social media use and self-reported anxiety levels."
  • Study: Consumer Preferences for Product Packaging.
  • Finding: "60% of respondents preferred eco-friendly packaging over plastic, while only 15% expressed no preference."
  • Study: Effectiveness of Two Job Training Programs.
  • Finding: "Program A led to a 20% higher employment rate among participants than Program B, indicating a significant difference in outcomes."

Writing Guide for Research Findings

Writing research findings requires clarity, accuracy, and organization. Here’s a step-by-step guide for structuring and presenting your findings effectively:

Step 1: Begin with a Clear Overview

State the headline result in one or two sentences so readers grasp the most important outcome before the details.

  • Example: "The study found a positive correlation between daily physical activity and mental health among participants."

Step 2: Organize Findings by Research Question or Hypothesis

Group results under the research question or hypothesis they address, rather than in the order the analyses were run; this keeps the findings section aligned with the study's aims.

  • Example: For a study on student engagement, organize findings by engagement metrics, academic performance, and satisfaction levels.

Step 3: Use Visual Aids to Enhance Understanding

Tables, charts, and graphs often convey patterns more efficiently than prose; refer to each visual in the text and keep titles and axis labels self-explanatory (a minimal plotting sketch follows this step).

  • Example: A bar chart comparing average test scores between experimental and control groups.
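
One possible way to produce such a chart is sketched below in Python with matplotlib; the group names and mean scores are hypothetical values invented for illustration.

    import matplotlib.pyplot as plt

    # Hypothetical average test scores, invented for illustration
    groups = ["Experimental", "Control"]
    mean_scores = [85, 70]

    fig, ax = plt.subplots()
    ax.bar(groups, mean_scores, color=["steelblue", "gray"])
    ax.set_ylabel("Average test score")
    ax.set_title("Average Test Scores by Group")
    fig.savefig("group_scores.png", dpi=300)  # export for inclusion in the manuscript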

Step 4: Report Data in a Clear and Concise Manner

Present the key numbers directly, without burying them in repetition of what the tables already show.

  • Example: "The experimental group's average score was 85, compared to 70 in the control group, indicating a significant improvement."

Step 5: Include Relevant Statistical Details

Report the test statistics, p-values, confidence intervals, or effect sizes that support each claim, so readers can judge the strength of the evidence (a minimal sketch follows this step).

  • Example: "The difference between groups was statistically significant (p < 0.05)."
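
As one way to obtain such a result, the sketch below runs an independent-samples t-test in Python; the raw scores are hypothetical, invented for illustration.

    import numpy as np
    from scipy import stats

    # Hypothetical raw scores for two groups, invented for illustration
    experimental = np.array([82, 88, 79, 91, 85, 87, 84, 90])
    control = np.array([70, 74, 68, 72, 75, 69, 71, 73])

    t_stat, p_value = stats.ttest_ind(experimental, control)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
    if p_value < 0.05:
        print("Difference is statistically significant at alpha = 0.05")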

Step 6: Compare Findings to Existing Literature

Situate each major finding relative to prior studies, noting where the results agree with or diverge from the existing literature.

  • Example: "These results align with previous research by Smith et al. (2020), which found a similar correlation between exercise and mental health."

Step 7: Interpret Key Findings

Explain what the key results mean for the research question, without overstating conclusions beyond what the data support.

  • Example: "The significant improvement in the experimental group suggests that the new curriculum enhances student performance."

Step 8: Acknowledge Limitations and Unexpected Findings

Report limitations and any results that ran counter to expectations honestly; doing so strengthens rather than weakens credibility.

  • Example: "While the study shows positive results, the small sample size limits generalizability."

Step 9: Conclude with a Summary of Findings

Close with a brief synthesis of the main findings and their implications.

  • Example: "Overall, the study indicates that telehealth services improve accessibility and convenience, though further research is needed to address the concerns about personal interaction."

Tips for Writing Research Findings

  • Be Objective: Report findings without inserting personal opinions or biased interpretations.
  • Keep It Concise: Avoid unnecessary detail; focus on the essential results that answer the research questions.
  • Use Consistent Terminology: Use terms consistently to avoid confusing readers, especially if the study includes multiple variables or technical terms.
  • Provide Enough Context: Ensure readers understand the significance of each finding by offering context where needed.
  • Proofread: Ensure all figures, data points, and statistical values are accurate and match the information in your data tables or appendices.

Example of Writing Research Findings

Findings Overview

The study aimed to evaluate the impact of online learning on student engagement and performance. Data were collected from 300 undergraduate students over a semester.

Engagement Metrics

  • Students in online learning sessions participated actively, with 80% reporting higher engagement levels compared to traditional classroom settings.
  • A notable increase in discussion board activity was observed, averaging 10 posts per student per week.

Academic Performance

  • The average final exam score for the online learning group was 82%, compared to 74% in the control group.
  • Statistical analysis revealed a significant difference in performance (t = 2.34, p < 0.05), suggesting online learning positively influenced academic outcomes (see the sketch below).
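
When a report gives only summary statistics, as in this example, a reader can still reconstruct the test from group means, standard deviations, and sample sizes. The sketch below is hypothetical: the example reports only the means and the t statistic, so the standard deviations and group sizes used here are assumed values, and the resulting t will not match the reported 2.34.

    from scipy.stats import ttest_ind_from_stats

    # Means taken from the example above; SDs and group sizes are assumed
    t, p = ttest_ind_from_stats(mean1=82, std1=13, nobs1=150,
                                mean2=74, std2=13, nobs2=150)
    print(f"t = {t:.2f}, p = {p:.4f}")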

Student Satisfaction

  • 78% of online learners expressed satisfaction with the flexibility of online sessions, though 25% mentioned concerns about reduced instructor interaction.
  • A survey of participants indicated that flexibility was the most valued aspect of online learning (rated 4.5 out of 5).

Limitations

While the results suggest benefits of online learning, the limited sample size and short study duration may restrict generalizability. Further research is recommended to confirm these findings across different institutions.

Writing research findings requires a balance of clarity, accuracy, and conciseness. By organizing data around research questions, using visual aids, and offering thoughtful interpretation, researchers can present findings that communicate valuable insights to readers. This structured approach to writing findings not only enhances readability but also strengthens the credibility and impact of the research.


