Preparation - Combining lists and removing duplicates
Before the initial screening, the search results from multiple databases need to be combined. There is a high possibility that the combined list will contain duplicates, and these need to be eliminated so that only one record of each article is kept. Systematic literature review tools such as Covidence and Rayyan can detect duplicates automatically using AI, but they are not free. We can also detect duplicated records in Microsoft Excel, using its sorting and conditional formatting features.
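If you prefer to work in R (which I will also use later for the kappa calculations), the same deduplication step can be scripted. The sketch below assumes the combined export is a CSV file with a title column; the file name and column name are hypothetical and should be adjusted to match your own export.

```r
# A rough sketch, assuming a CSV export with a "title" column; the file name
# and column name are hypothetical and should match your own export.
library(dplyr)
library(stringr)

combined <- read.csv("combined_search_results.csv", stringsAsFactors = FALSE)

deduplicated <- combined %>%
  # Normalise titles so differences in case, punctuation, and spacing
  # do not hide true duplicates
  mutate(title_key = str_squish(str_to_lower(
    str_replace_all(title, "[[:punct:]]", " ")))) %>%
  # Keep only the first record for each normalised title
  distinct(title_key, .keep_all = TRUE) %>%
  select(-title_key)

write.csv(deduplicated, "deduplicated_search_results.csv", row.names = FALSE)
```

Because records of the same article can differ slightly across databases, it is still worth scanning the result manually rather than trusting the automatic matching alone.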
Screening practice
When the list is ready, the team members need to meet to agree on how to screen the abstracts and to practice by screening several papers together, so that everyone develops a shared understanding of the process. The team leader needs to facilitate the discussion to reach this common understanding. I prefer an online meeting through a video conferencing application so that it can be held in a relaxed atmosphere outside working hours; this format also works best when team members come from different institutions. As a suggestion, keep the inclusion and exclusion criteria at hand to guide the abstract screening. While article titles and keywords might help in judging the relevance of an article, the abstract must always be prioritized.
Cycles of screening to achieve reliability
In this process, it is important to objectively demonstrate that all team members have the same understanding of the inclusion and exclusion criteria. To determine whether the team members have adequate agreement, each member needs to screen the same set of at least 30 articles independently. After completing this first cycle of screening, the members' results can be compared using Cohen's kappa (for two raters) or Fleiss' kappa (for more than two raters). See Cole (2024) for further reading about reliability in qualitative research. In this book, I will demonstrate how both kappas are calculated manually and in R, the statistical software that I am familiar with; you can explore how it is done in the application of your choice if you prefer. We will use this data for the calculation.
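As a small preview of what such a calculation looks like in R, the sketch below uses the irr package, one of several R packages that implement these coefficients. Both statistics compare the observed agreement among raters with the agreement expected by chance; Cohen's kappa, for instance, is defined as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the proportion expected by chance. The ratings in the sketch are invented purely for illustration and are not the data referred to above.

```r
# A minimal sketch using the irr package; the ratings below are invented
# for illustration only and are not the data referred to above.
library(irr)

# Two raters screening the same ten abstracts (1 = include, 0 = exclude)
two_raters <- data.frame(
  rater_a = c(1, 1, 0, 0, 1, 0, 1, 1, 0, 1),
  rater_b = c(1, 0, 0, 0, 1, 0, 1, 1, 1, 1)
)
kappa2(two_raters)           # Cohen's kappa, for exactly two raters

# Adding a third rater on the same ten abstracts
three_raters <- cbind(two_raters,
                      rater_c = c(1, 1, 0, 0, 1, 0, 1, 0, 1, 1))
kappam.fleiss(three_raters)  # Fleiss' kappa, for more than two raters
```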
Abstract screening
Once an adequate kappa coefficient has been achieved (Landis and Koch (1977), for example, describe values above 0.60 as substantial agreement and values above 0.80 as almost perfect), the rest of the papers are divided equally among the team members to be coded individually. At this stage, it is advisable to have three rating categories, i.e., Included, Excluded, and Not sure. Articles rated as Not sure can be discussed in the weekly project meeting so that the decision is made together, which adds to the robustness of the process. Remember to always refer to the inclusion and exclusion criteria when making decisions, whether during independent screening or in the team meeting.
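The division of the remaining papers can also be scripted. The sketch below shuffles the remaining records and assigns them to team members in roughly equal shares, writing one screening file per member; the file name, member names, and column names are all hypothetical.

```r
# A rough sketch of dividing the remaining records among team members; the
# file name, member names, and column names are hypothetical.
remaining <- read.csv("records_after_reliability_cycle.csv",
                      stringsAsFactors = FALSE)

set.seed(2024)  # fix the random shuffle so the assignment is reproducible
remaining <- remaining[sample(nrow(remaining)), ]

# Recycle the member names so each person receives an (almost) equal share
members <- c("Member_A", "Member_B", "Member_C")
remaining$assigned_to <- rep(members, length.out = nrow(remaining))

# Empty decision column, to be filled with Included / Excluded / Not sure
remaining$decision <- ""

# Write one screening file per member
for (m in members) {
  write.csv(remaining[remaining$assigned_to == m, ],
            paste0("screening_", m, ".csv"),
            row.names = FALSE)
}
```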
References
Cole, R. (2024). Inter-rater reliability methods in qualitative case study research. Sociological Methods & Research, 53(4), 1944–1975. https://doi.org/10.1177/00491241231156971
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174. https://doi.org/10.2307/2529310