STORM from Stanford University: Can AI Write Better Wikipedia Articles Than Humans?

Frequently Asked Questions (FAQ)

What are the challenges of writing long-form articles from scratch using LLMs?
Answer: LLMs have shown promise in generating text, but crafting factual, well-structured long-form articles comparable to Wikipedia entries presents unique challenges:
Pre-writing Stage: Gathering and structuring relevant information before writing is crucial. Simple prompting often yields superficial outlines or factual inaccuracies.
Grounded Information: LLMs require access to reliable external sources to avoid factual errors and hallucinations. Efficiently researching and integrating this information is vital.
Outline Depth and Breadth: Creating an outline that captures both the main themes and crucial subtopics is essential for a comprehensive article.

How does STORM address the pre-writing challenges?
Answer: STORM (Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking) uses a multi-stage approach:
Perspective Discovery: STORM analyzes similar Wikipedia articles to identify diverse perspectives on the topic.
Simulated Conversations: It simulates conversations between writers with specific perspectives, prompting them to ask in-depth questions about the topic. The answers to these questions are grounded in trustworthy internet sources found through search.
Outline Creation: Based on the simulated conversations and the LLM’s internal knowledge, STORM generates a detailed outline that guides article writing.

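The three stages above can be sketched in a few lines of Python. This is only an illustrative skeleton, not the STORM codebase: the function names, the `ask`/`answer` callables (stand-ins for an LLM and a search-backed answerer), and the naive outline builder are all assumptions made for this example. In particular, the real system uses an LLM to derive perspectives and to write the outline, whereas this sketch reuses section headings and questions directly.

```python
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str]  # (question, grounded answer)
# Hypothetical signatures: ask(topic, perspective, history) -> question,
# answer(question) -> answer text backed by retrieved sources.
AskFn = Callable[[str, str, List[Turn]], str]
AnswerFn = Callable[[str], str]


def discover_perspectives(related_outlines: Dict[str, List[str]]) -> List[str]:
    """Stage 1 (simplified): treat distinct headings of related articles as perspectives."""
    seen, perspectives = set(), []
    for headings in related_outlines.values():
        for heading in headings:
            key = heading.strip().lower()
            if key and key not in seen:
                seen.add(key)
                perspectives.append(heading.strip())
    return perspectives


def simulate_conversation(topic: str, perspective: str, ask: AskFn,
                          answer: AnswerFn, max_turns: int = 3) -> List[Turn]:
    """Stage 2: alternate questions and grounded answers; an empty question ends the talk."""
    history: List[Turn] = []
    for _ in range(max_turns):
        question = ask(topic, perspective, history)
        if not question:
            break
        history.append((question, answer(question)))
    return history


def build_outline(topic: str, conversations: Dict[str, List[Turn]]) -> List[str]:
    """Stage 3 (naive): one section per perspective, one subsection per question asked."""
    outline = [topic]
    for perspective, history in conversations.items():
        outline.append(f"  {perspective}")
        outline.extend(f"    {question}" for question, _ in history)
    return outline


# Stub callables so the sketch runs without an LLM or a search engine.
def stub_ask(topic: str, perspective: str, history: List[Turn]) -> str:
    if len(history) >= 2:
        return ""  # the simulated writer has no more questions
    if not history:
        return f"What should a reader know about the {perspective.lower()} of {topic}?"
    return f"Can you expand on this point: {history[-1][1]}"


def stub_answer(question: str) -> str:
    return f"[answer grounded in retrieved sources for: {question}]"
```

Swapping `stub_ask` and `stub_answer` for real LLM and retrieval calls leaves the control flow unchanged, which is the point of the sketch: the follow-up question in each turn depends on the answers gathered so far.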
What is the FreshWiki dataset and why was it created?
Answer: FreshWiki is a dataset of high-quality Wikipedia articles created after the training cutoff of the tested LLMs. Its purpose is to:
Mitigate Data Leakage: Using recent articles avoids the risk of LLMs simply regurgitating memorized content from their training data.
Focus on Research and Curation: The dataset emphasizes the system’s ability to research a topic from scratch and curate information, reflecting the human writing process.

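The selection idea behind such a dataset can be illustrated with a small filter: keep only articles created after the model's training cutoff that also meet a quality bar. The `Article` record, the `select_fresh_articles` helper, and the quality labels and dates used here are assumptions for illustration, not the dataset's actual construction code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Tuple


@dataclass
class Article:
    title: str
    created: date   # date the article was first created
    quality: str    # quality class label (hypothetical values in this sketch)


def select_fresh_articles(articles: Iterable[Article], training_cutoff: date,
                          accepted_quality: Tuple[str, ...] = ("B", "GA", "FA")) -> List[Article]:
    """Keep articles created strictly after the cutoff whose quality class is accepted."""
    return [
        article for article in articles
        if article.created > training_cutoff and article.quality in accepted_quality
    ]
```

Filtering on creation date, rather than last-edit date, is what guards against leakage here: an article that already existed before the cutoff may have been seen in training even if it was edited later.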
How does STORM compare to other LLM-based baselines?
Answer: STORM outperforms baselines such as Direct Gen, RAG, and Outline-Driven RAG in automatic and human evaluations:
Improved Outline Quality: STORM generates outlines with significantly higher heading and entity recall, demonstrating a better understanding of the topic.
Enhanced Article Quality: Articles written using STORM’s outlines receive higher scores in terms of organization, coverage, and relevance.

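To make the outline metrics concrete, here is a deliberately simplified sketch of heading recall and entity recall against a human-written reference. It assumes exact matching after normalization and substring matching for entities; the function names are hypothetical, and the paper's own metrics may be computed differently (for example with softer, similarity-based matching).

```python
from typing import Iterable


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(text.lower().split())


def heading_recall(reference_headings: Iterable[str], generated_headings: Iterable[str]) -> float:
    """Fraction of reference headings that also appear in the generated outline."""
    reference = {_normalize(h) for h in reference_headings if h.strip()}
    if not reference:
        return 0.0
    generated = {_normalize(h) for h in generated_headings}
    return len(reference & generated) / len(reference)


def entity_recall(reference_entities: Iterable[str], generated_headings: Iterable[str]) -> float:
    """Fraction of reference entities mentioned anywhere in the generated headings."""
    entities = {_normalize(e) for e in reference_entities if e.strip()}
    if not entities:
        return 0.0
    outline_text = _normalize(" ".join(generated_headings))
    return sum(1 for entity in entities if entity in outline_text) / len(entities)
```

Both functions return a value between 0 and 1, where higher means the generated outline covers more of what the human-written article covers.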
How does STORM use ‘perspectives’ to improve question-asking?
Answer: Just as different stakeholders have distinct priorities, writers with different perspectives focus on different facets of a topic, which leads to more comprehensive research:
Multifaceted Information: By prompting LLMs to embody different perspectives, STORM generates a broader range of in-depth questions than generic prompts do.
In-depth Questions: STORM’s simulated conversations uncover detailed information by prompting follow-up questions based on the retrieved answers.

What are the limitations of STORM and similar LLM-based writing systems?
Answer: Despite advancements, LLM-generated articles often fall short of human-written articles, particularly in neutrality and verifiability:
Bias Transfer: Information retrieved from the internet can introduce bias, especially if dominant sources lack neutrality.
Over-association of Facts: LLMs may incorrectly link unrelated facts based on co-occurrence within the context window, impacting verifiability.
Simplified Output: Current systems primarily generate text, while high-quality Wikipedia articles often include structured data and multimedia elements.

What did human evaluation by experienced Wikipedia editors reveal?
Answer: Evaluation by Wikipedia editors showed:
Positive Feedback: Editors found STORM helpful, particularly for the pre-writing stage of collecting sources and creating outlines.
Areas for Improvement: Feedback highlighted the need to address issues such as bias in sourced information, improper inferential linking, and handling of time-sensitive information.

Significance

STORM’s results suggest that a structured pre-writing stage (perspective discovery, grounded question asking, and outline creation) improves the organization and coverage of LLM-written long-form articles, while neutrality and verifiability remain open problems. For the full details, see the paper and the resources listed below.

Resources & Further Watching

Read the research paper written by Yijia Shao, Yucheng Jiang, Theodore A. Kanell, Peter Xu, Omar Khattab and Monica S. Lam: https://arxiv.org/pdf/2402.14207
STORM website: https://storm.genie.stanford.edu
STORM GitHub repository: https://github.com/stanford-oval/storm

ResearchLounge

https://researchlounge.org/formal-sciences/computer-science/storm-from-stanford-university-can-ai-write-better-wikipedia-articles-than-humans/