From Theory to Practice: Implementing AI Screening with Rayyan and ASReview

Why AI Screening Matters for Niche Researchers

Systematic literature reviews consume weeks of manual effort, especially when relevant studies are scarce. AI-assisted screening reduces this burden by learning from your decisions and reordering the remaining records so that the most informative ones reach you first. Tools like Rayyan and ASReview support this kind of learning loop, which lets you stop before reading every record and saves time while keeping recall high.

Preparing Your Data

Export your search results from databases (e.g., PubMed, Scopus) as a CSV or RIS file containing title, abstract, and identifiers. Clean the file: remove duplicates, strip HTML tags, and ensure each record has a non‑empty abstract. Both Rayyan and ASReview accept CSV with columns labeled “title” and “abstract”.
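The cleaning step can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a feature of either tool: the sample rows are made up, and matching duplicates on a normalized title is an assumption (matching on DOI or PMID is usually safer when those columns exist).

```python
import csv
import html
import io
import re

TAG_RE = re.compile(r"<[^>]+>")

def clean_text(value):
    """Strip HTML tags, unescape entities, and collapse whitespace."""
    text = TAG_RE.sub(" ", value or "")
    text = html.unescape(text)
    return " ".join(text.split())

def clean_records(rows):
    """Drop records with empty abstracts and records whose title was already seen."""
    seen = set()
    cleaned = []
    for row in rows:
        title = clean_text(row.get("title"))
        abstract = clean_text(row.get("abstract"))
        if not abstract:
            continue  # screening tools need an abstract to learn from
        # Assumption: two records are duplicates if their titles match
        # after lowercasing and removing punctuation.
        key = re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({**row, "title": title, "abstract": abstract})
    return cleaned

# Made-up sample export; in practice, open your exported CSV file instead.
raw_csv = """title,abstract
Deep learning for triage,<p>We study &amp; compare models.</p>
Deep Learning for Triage.,We study and compare models.
A paper without abstract,
"""
rows = list(csv.DictReader(io.StringIO(raw_csv)))
cleaned = clean_records(rows)
print(len(cleaned), "record(s) kept")
```

To use it on a real export, replace the `io.StringIO(raw_csv)` argument with an opened file and write `cleaned` back out with `csv.DictWriter`.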

Setting Up Rayyan for Initial Screening

1. Create a free Rayyan account and start a new project.
2. Import the cleaned CSV.
3. Begin by labeling a small seed set (≈20‑30 records) that you know are relevant or irrelevant.
4. Rayyan then rates the remaining records by predicted relevance using its built-in classifier. That model is not user-configurable, unlike the Naive Bayes and TF-IDF setup you choose in ASReview in the next section.
5. Review the top‑ranked items, label them, and let the model update iteratively.
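To make the idea of ranking by relevance probability concrete, here is a minimal sketch of TF-IDF features feeding a multinomial Naive Bayes classifier, written with the standard library only. It is an illustration of the general technique, not the code that runs inside Rayyan or ASReview; the titles, the smoothing constant `alpha`, and all function names are assumptions made for the example. It needs at least one relevant and one irrelevant seed record.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(doc, idf):
    """TF-IDF weights for one document; terms without an IDF are skipped."""
    return {t: c * idf[t] for t, c in Counter(tokenize(doc)).items() if t in idf}

def train_nb(vectors, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes on TF-IDF weights with additive smoothing."""
    model = {}
    for cls in (0, 1):
        rows = [v for v, y in zip(vectors, labels) if y == cls]
        totals = {}
        for vec in rows:
            for term, weight in vec.items():
                totals[term] = totals.get(term, 0.0) + weight
        denom = sum(totals.values()) + alpha * len(vocab)
        model[cls] = {
            "log_prior": math.log(len(rows) / len(labels)),
            "log_like": {t: math.log((totals.get(t, 0.0) + alpha) / denom) for t in vocab},
        }
    return model

def prob_relevant(vec, model):
    """Posterior probability of the relevant class (label 1)."""
    scores = {
        cls: m["log_prior"] + sum(w * m["log_like"][t] for t, w in vec.items())
        for cls, m in model.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {cls: math.exp(s - top) for cls, s in scores.items()}
    return exp[1] / (exp[0] + exp[1])

# Made-up seed set: 1 = relevant, 0 = irrelevant.
seed = [
    ("Machine learning screening of abstracts for systematic reviews", 1),
    ("Active learning reduces screening workload in systematic reviews", 1),
    ("Soil nitrogen dynamics in alpine meadows", 0),
    ("Crop yield response to irrigation in alpine soil", 0),
]
unlabeled = [
    "Irrigation and nitrogen effects on crop yield",
    "Text classification speeds up abstract screening for systematic reviews",
]

idf = fit_idf([text for text, _ in seed] + unlabeled)
model = train_nb([tfidf(t, idf) for t, _ in seed], [y for _, y in seed], list(idf))
ranked = sorted(((prob_relevant(tfidf(t, idf), model), t) for t in unlabeled), reverse=True)
for p, title in ranked:
    print(f"{p:.2f}  {title}")
```

With real data you would feed in title plus abstract rather than titles alone, and a library implementation would be the more practical choice; the point here is only to show what "rank by relevance probability" means.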

Transitioning to ASReview for Active Learning

Once you have a stable seed set in Rayyan, export the labeled data (including your decisions) as CSV.
1. Open ASReview and import this file.
2. Choose the “Naive Bayes” model with TF-IDF features. Naive Bayes is often a fast, effective starting point, and TF-IDF works well for text.
3. Select “Uncertainty sampling” as the query strategy; ASReview will present records whose relevance probability is closest to 0.5, i.e., the ones it is most unsure about.
4. Label each presented record, and ASReview retrains the model after each decision.
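5. Decide on a stopping rule before you start, for example stopping after a long unbroken run of irrelevant records, and track your progress against it. “Work saved over sampling” (WSS) is a useful way to report efficiency, but it can only be computed once the true label of every record is known, as in a simulation on an already-screened dataset, so it cannot be monitored during live screening.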
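Two ideas from the steps above are easy to pin down in code: picking the record the model is least sure about, and computing WSS. The sketch below is a standalone illustration with made-up numbers, not ASReview's own implementation. Note that `wss` needs the true label of every record, so it applies to a retrospective check or a simulation on a fully labeled dataset. It uses the common definition WSS@r = (records left unscreened when recall r is reached) / N - (1 - r).

```python
import math

def pick_most_uncertain(probabilities):
    """Return the record id whose P(relevant) lies closest to 0.5."""
    return min(probabilities, key=lambda rid: abs(probabilities[rid] - 0.5))

def wss(labels_in_screening_order, recall=0.95):
    """Work saved over sampling at the given recall level.

    labels_in_screening_order: true labels (1 relevant, 0 irrelevant) of ALL
    records, listed in the order the model presented them.
    """
    n = len(labels_in_screening_order)
    total_relevant = sum(labels_in_screening_order)
    if n == 0 or total_relevant == 0:
        raise ValueError("need at least one record and one relevant record")
    # Number of relevant records that must be found to reach the recall target.
    needed = math.ceil(recall * total_relevant - 1e-9)
    found = 0
    for screened, label in enumerate(labels_in_screening_order, start=1):
        found += label
        if found >= needed:
            return (n - screened) / n - (1 - recall)

# Made-up model outputs for three unlabeled records.
probabilities = {"rec-1": 0.92, "rec-2": 0.48, "rec-3": 0.10}
next_record = pick_most_uncertain(probabilities)
print("label next:", next_record)

# Made-up screening order for ten records, three of them relevant.
order = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print("WSS@95:", round(wss(order), 2))
```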

Balancing Imbalanced Data

In niche fields, relevant papers may be far fewer than irrelevant ones. Apply dynamic resampling during training: ASReview's balance strategy adjusts how relevant and irrelevant records are sampled each time the model is retrained, which keeps the learner from being dominated by the abundant irrelevant records. On imbalanced data this typically helps the model surface relevant records earlier.
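As a rough picture of what rebalancing does, the sketch below undersamples the majority class before each retraining round. It is deliberately simpler than ASReview's dynamic resampling and is not its implementation: the function name, the `ratio` parameter, and the sample data are all assumptions for illustration, and only undersampling is shown.

```python
import random

def rebalance(labeled, ratio=1.0, seed=0):
    """Undersample the majority class to about `ratio` records per minority record.

    labeled: list of (record_id, label) pairs, label 1 = relevant, 0 = irrelevant.
    Call this again before every retraining round so the balance tracks
    the labels collected so far.
    """
    rng = random.Random(seed)
    relevant = [r for r in labeled if r[1] == 1]
    irrelevant = [r for r in labeled if r[1] == 0]
    if not relevant or not irrelevant:
        return list(labeled)  # nothing to balance yet
    minority, majority = sorted([relevant, irrelevant], key=len)
    target = min(len(majority), max(1, round(len(minority) * ratio)))
    balanced = minority + rng.sample(majority, target)
    rng.shuffle(balanced)
    return balanced

# Made-up labels: 3 relevant records among 30.
labeled = [(f"rec-{i}", 1 if i < 3 else 0) for i in range(30)]
training_set = rebalance(labeled, ratio=2.0)
print(len(training_set), "records in the balanced training set")
```

Whatever ratio you settle on is worth writing down, since it changes which records the model sees during training.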

Extracting Data from Included Studies

After finalizing the included set, export the PMIDs or DOIs.
1. Use a reference manager (Zotero, Mendeley) to pull full‑text PDFs where available.
2. Apply a simple rule-based script or a tool such as Tabula or pdfplumber to extract tables from the PDFs.
3. For textual data, run a TF‑IDF‑based keyword matcher to locate outcomes, interventions, and population descriptors.
4. Store extracted fields in a structured spreadsheet for meta‑analysis.
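Steps 3 and 4 can be sketched as a small keyword matcher that scores each sentence with IDF-weighted keyword counts and writes the best match per field to a CSV row. Everything specific here is an assumption for illustration: the keyword lists in `FIELDS`, the sample text, and the `study-001` identifier are hypothetical, and real full texts will need larger keyword lists and a manual check of every extracted value.

```python
import csv
import io
import math
import re
from collections import Counter

# Hypothetical keyword lists; adapt them to your review question.
FIELDS = {
    "population": ["participants", "patients", "adults"],
    "intervention": ["intervention", "treatment", "received"],
    "outcome": ["outcome", "reduction", "improved"],
}

def split_sentences(text):
    """Naive sentence splitter on ., ! and ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def best_sentence(sentences, keywords):
    """Return the sentence with the highest IDF-weighted keyword score."""
    n = len(sentences)
    if n == 0:
        return ""
    tokenized = [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]

    def idf(term):
        df = sum(1 for counts in tokenized if term in counts)
        return math.log((1 + n) / (1 + df)) + 1

    scores = [sum(counts[k] * idf(k) for k in keywords) for counts in tokenized]
    top = max(range(n), key=scores.__getitem__)
    return sentences[top] if scores[top] > 0 else ""

def extract_fields(text):
    """Map each field name to its best-matching sentence (or an empty string)."""
    sentences = split_sentences(text)
    return {field: best_sentence(sentences, kws) for field, kws in FIELDS.items()}

# Made-up passage standing in for text extracted from a PDF.
sample = (
    "We enrolled 120 adults with chronic pain. "
    "Those in the treatment arm received weekly physiotherapy. "
    "The primary outcome was a reduction in pain score at 12 weeks."
)
row = extract_fields(sample)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["study_id", *FIELDS])
writer.writeheader()
writer.writerow({"study_id": "study-001", **row})
print(buffer.getvalue())
```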

Practical Tips for Reproducibility

• Record the exact ASReview version, and the Rayyan version or the dates on which you used it.
• Save the initial seed set and all labeling decisions as CSV files.
• Document the TF‑IDF parameters (max features, n‑gram range) and any resampling ratios.
• Share the final labeled dataset and extraction scripts in an open repository (e.g., OSF) so others can replicate your workflow.
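One lightweight way to follow these tips is to keep a small machine-readable manifest next to your labeled data. The sketch below is only a suggestion: the field names, file names, and parameter values are placeholders for illustration, not settings read from Rayyan or ASReview, and `package_version` simply reports whatever is installed in your Python environment.

```python
import json
from importlib import metadata

def package_version(name):
    """Installed version of a Python package, or a marker if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"

# Placeholder values: replace each one with what you actually used.
manifest = {
    "asreview_version": package_version("asreview"),
    "rayyan_access_date": "YYYY-MM-DD",
    "seed_set_file": "seed_set.csv",
    "labels_file": "labeling_decisions.csv",
    "tfidf": {"max_features": None, "ngram_range": [1, 1]},
    "resampling_ratio": None,
    "query_strategy": "uncertainty sampling",
    "classifier": "naive bayes",
}

manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Saving `manifest_json` to a file in the same repository as your labeled dataset gives readers a single place to check what was run.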

For a comprehensive guide with detailed workflows, templates, and additional strategies, see my e-book: AI for Niche Academic Researchers: How to Automate Systematic Literature Review Screening and Data Extraction.
