Case Study: Scientific Reports Review
Deconstructing the transition from "Prompter" to "Steward" in a real-world manuscript review process.
This case study accompanies The Steward's Schema policy guide.
Context
The author of this case study (referred to below as the reviewer) was invited to review a manuscript for Nature Scientific Reports. The manuscript addressed the development of custom agentic AI workflows for biomedical data extraction. Before beginning the review, the reviewer consulted the journal's AI policy for peer reviewers, which contains two operative statements:
Statement 1: "We ask that...peer reviewers do not upload manuscripts into generative AI tools."
Statement 2: "If any part of the evaluation of the claims made in the manuscript was in any way supported by an AI tool, we ask peer reviewers to declare the use of such tools transparently in the peer review report."
What the Reviewer Did
Step 1: Reference Verification
Before reading the manuscript, the reviewer extracted the reference list and submitted it to Perplexity to verify that each citation existed and to obtain a brief summary of each paper's content.
All references were confirmed to be published works; none were personal communications, preprints, or otherwise non-public. This process later revealed that one cited paper, published in 2020, had no relevance to the large language model topic for which the authors cited it. The reviewer flagged this in the review.
Step 2: Reading and Dictation
The reviewer opened the manuscript on one side of the screen and Obsidian on the other. While reading the manuscript from beginning to end, the reviewer dictated reflections, questions, and critiques into Obsidian using voice-to-text. All evaluative content originated from the reviewer's own expertise and judgment.
Step 3: Publicly Available Background Research
The reviewer used AI search tools to investigate publicly available questions relevant to evaluating the manuscript's contribution:
- What were the generally accepted practices for gold-standard evaluation of LLM-based data extraction systems at the time the manuscript was prepared?
- When did specific publicly available tools (e.g., NotebookLM) release features comparable to the custom workflows described in the manuscript?
No manuscript content was entered into the AI tool for these queries. The reviewer was seeking contextual knowledge to inform an independent assessment of the work's novelty and rigor.
Step 4: Dictation Cleanup
The reviewer's dictated notes were submitted to an AI tool with the explicit instruction: structure this dictation into paragraphs, correct grammar and punctuation, and do not add any new ideas or evaluative content. The AI performed editorial formatting only.
Step 5: Disclosure to the Editor
In the confidential comments to the editor, the reviewer disclosed the full process: which AI tools were used, for what purposes, that the manuscript itself was not uploaded, and that the reviewer assumed 100% accountability for the content and conclusions of the review.
Policy Analysis
Under Nature Scientific Reports
| Action | Statement 1 (Don't upload) | Statement 2 (Disclose AI support) |
|---|---|---|
| Reference list to Perplexity | Gray area: citation metadata is public, but the list is part of the manuscript | Disclosed |
| Dictation (no AI involved) | Compliant: no upload | N/A |
| Public background queries | Compliant: no manuscript content entered the AI tool | N/A |
| Dictation cleanup via AI | Compliant: reviewer's own text, not the manuscript | Disclosed |
| Disclosure to editor | N/A | Fully compliant |
The reviewer's process was consistent with the spirit of the policy. The reference-list upload is the only action that touches Statement 1, and it does so on narrow grounds: the content uploaded (citation metadata) is entirely publicly available. The disclosure to the editor addressed Statement 2 in full.
Under NIH Grant Review (NOT-OD-23-149)
The same actions evaluated under NIH's stricter framework yield a different result.
Reference verification (bulk upload): Problematic. Uploading the formatted reference list shares a structured component of the application with a third-party AI system. Individual queries about specific published papers ("What is the Smith et al. 2022 paper about?") would be defensible, because each query contains only publicly available information and reveals nothing specific to the application. The bulk upload does not have that clean separation.
Dictation cleanup: Problematic. The reviewer's dictated notes inevitably contained paraphrases of, and direct references to, confidential application content: specific claims, methods, and findings. Even though the AI performed only editorial formatting, confidential content transited through a third-party system. NIH's framework does not distinguish between AI as analyst and AI as transcription service once confidential content is involved.
Public background queries: Defensible. No application content entered the AI tool. The reviewer queried publicly available knowledge to inform their own judgment. NSF explicitly permits this ("reviewers may share publicly available information with current generation generative AI tools"). NIH's prohibition is scoped to using AI to "analyze or formulate critiques," which does not encompass independent background research on published work, though NIH has not carved out this exception explicitly.
Disclosure: NIH's framework does not include a disclosure pathway for reviewers. The prohibition is absolute. Disclosure does not cure a violation.
The Key Lesson
The same set of thoughtful, well-intentioned actions can be compliant under one policy framework and problematic under another. Researchers who review for journals and serve on federal study sections must understand that they operate under different rules in each context.
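The divergence between the two frameworks can be made concrete with a small lookup table. The sketch below only restates the assessments given in the Policy Analysis section; the dictionary layout, the `flagged` helper, and the choice to treat "compliant" and "defensible" as the only clearly acceptable labels are illustrative assumptions, not part of either policy.

```python
# Assessments copied from the Policy Analysis section above.
# "journal" uses the Statement 1 (upload) column; "nih" uses the NIH reading.
# The structure and helper are hypothetical, for illustration only.
ASSESSMENTS = {
    "reference list to Perplexity": {"journal": "gray area", "nih": "problematic"},
    "public background queries": {"journal": "compliant", "nih": "defensible"},
    "dictation cleanup via AI": {"journal": "compliant", "nih": "problematic"},
}

# Assumption: only these labels count as clearly acceptable.
ACCEPTABLE = {"compliant", "defensible"}


def flagged(framework: str) -> list[str]:
    """Return the actions that are not clearly acceptable under a framework."""
    return [
        action
        for action, verdicts in ASSESSMENTS.items()
        if verdicts[framework] not in ACCEPTABLE
    ]


if __name__ == "__main__":
    for framework in ("journal", "nih"):
        print(framework, "->", flagged(framework))
```

Under these assumptions, the journal framework flags only the reference-list upload, while the NIH framework flags both the reference-list upload and the dictation cleanup.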
Principles for Practice
Self-Test
If you're not willing to disclose your use, you shouldn't use it. This is the most reliable self-test available. Before using any AI tool in a review context, ask: "Would I be comfortable describing exactly what I did to the editor or program officer?" If the answer gives you pause, that pause is informative.
Documentation
If you can't document it, you can't defend it. Save prompts, export chat logs, and maintain version-controlled drafts. If your process is ever questioned, documentation is the difference between an assertion and evidence.
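One lightweight way to keep such a record is an append-only log with one entry per AI interaction. The following is a minimal sketch, not a prescribed format: the `AIUseRecord` fields, the function names, and the JSON Lines file layout are all assumptions chosen for illustration. It uses only the Python standard library.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class AIUseRecord:
    """One AI interaction during a review (hypothetical schema)."""
    tool: str                        # e.g. "Perplexity"
    purpose: str                     # e.g. "reference verification"
    prompt: str                      # the exact text submitted to the tool
    manuscript_content_shared: bool  # did any manuscript text enter the tool?
    timestamp: str = ""              # filled in at logging time if left empty


def log_ai_use(record: AIUseRecord, log_path: Path) -> None:
    """Append one record as a single JSON line; earlier entries are never rewritten."""
    if not record.timestamp:
        record.timestamp = datetime.now(timezone.utc).isoformat()
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record)) + "\n")


def load_log(log_path: Path) -> list[AIUseRecord]:
    """Read the log back, for example when drafting a disclosure to the editor."""
    with log_path.open(encoding="utf-8") as fh:
        return [AIUseRecord(**json.loads(line)) for line in fh if line.strip()]
```

Keeping the log file in the same version-controlled folder as the review drafts means the prompts and the text they influenced share one history.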
Governance
Know which framework governs your review. Journal policies vary in stringency and may include disclosure pathways. Federal grant review policies are strict prohibitions with no disclosure safe harbor. Check the specific policy before you begin, not after.
Conservative Read
When the policy is ambiguous, default to the conservative read. Policies were not written with the granularity that actual practice requires. When you encounter a gray area (and you will), the defensible choice is the more restrictive interpretation.
The Steward's Schema was prepared for the benefit of faculty, staff, and trainees in biomedical research. AI tools, including Google NotebookLM, Google Deep Research, Perplexity Pro, and Claude Opus 4.6, were used to assist in the research, fact-checking, and preparation of this document. All content was reviewed, verified, and edited by the human author. Identifying details of the reviewed manuscript have been omitted to preserve confidentiality.