
Friday, 19 July 2024

STORM: LLMs Powering AI to Generate Well-Structured Articles


Introduction

Writing organized long-form articles that cover diverse topics and viewpoints has become far more feasible with the advent of Large Language Models (LLMs) such as GPT-2 and GPT-3. These models are trained on huge datasets and generate text that is both coherent and relevant to the context, which has made them invaluable for content creation. Progress with LLMs has been a big step toward better and more efficient writing, but there are still hurdles to clear. Three stand out: keeping facts accurate, staying consistent over large bodies of text, and maintaining overall balance. To address these challenges, a new AI system called STORM was released, combining recent methods in AI and natural language processing.

Who Developed STORM?

STORM is the brainchild of a team of researchers at Stanford University, specifically within the Stanford Open Virtual Assistant Lab (OVAL). The development of STORM was supported by contributions from various experts in the field of AI and natural language processing. The primary objective behind the development of STORM was to create an AI system capable of assisting in the creation of comprehensive, well-cited articles. This aligns with the goal of bridging the gap between technical capabilities and practical applications.

What is STORM?

STORM is an acronym for Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking. It is an open-source AI tool designed to turn a topic into a comprehensive article. STORM automates the knowledge curation process, thereby simplifying the generation of lengthy, well-cited reports.

Key Features of STORM

STORM boasts several unique features that set it apart from writing tools that simply prompt an LLM:

  • Perspective-Guided Question Asking: This feature enables STORM to discover different perspectives by surveying existing articles on similar topics, thereby enhancing the depth and breadth of the questions it asks.
  • Simulated Conversation: STORM simulates a conversation between a Wikipedia writer and a topic expert, grounded in internet sources, to update its understanding of the topic and ask follow-up questions.
  • VectorRM Support: STORM supports grounding on user-provided documents, complementing its existing search-engine retrievers such as YouRM and BingSearch.
  • Customizable API: STORM provides an API to support customization of different language models and retrieval/search integration.

Capabilities/Use Cases of STORM

Since STORM has a number of unique features and functionalities, it can be used in many different applications, including:

  • Topic Curation: It can generate long-form articles completely from scratch, making it great for creating Wikipedia-like entries.
  • Research Assistance: STORM can grab information from various sources and synthesize it, offering significant help in research.
  • Teaching: The model can help set up educational materials by including elaborate explanations and references.

How Does STORM Work? / Workflow

STORM employs Large Language Models (LLMs) to simulate the human writing process, with a focus on effective question asking and information gathering. The core of STORM is its approach to researching a topic. As depicted in the figure below, the process begins by surveying related Wikipedia articles to identify various perspectives on the given topic. This multi-perspective approach enables STORM to generate more comprehensive and insightful questions about the subject matter.

The overview of STORM
source - https://arxiv.org/pdf/2402.14207

The system then simulates conversations between a Wikipedia writer, guided by these perspectives, and a topic expert grounded in trustworthy online sources. This iterative question-asking and answering process aids in gathering diverse and relevant information about the topic. After collecting information through simulated conversations, STORM leverages the LLM’s internal knowledge to generate a draft outline. This outline is then refined using the gathered information from different perspectives.

The resulting outline serves as a foundation for producing a full-length article, with each section being generated based on relevant references collected during the research phase. STORM’s approach not only enhances the breadth and depth of the generated articles but also addresses challenges in synthesizing information from multiple sources and planning coherent long-form content. This unique workflow of STORM makes it a powerful tool in the realm of automated content creation.
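The workflow described above can be sketched in a few lines of Python. This is only an illustrative sketch, not STORM's actual code: the `llm` and `search` callables, the prompt strings, and every function name here are hypothetical stand-ins, and passing all collected sources to every section is a simplification.

```python
from typing import Callable, Dict, List, Tuple

LLM = Callable[[str], str]           # hypothetical: prompt -> completion
Search = Callable[[str], List[str]]  # hypothetical: query -> source snippets
Turn = Tuple[str, str, List[str]]    # (question, answer, sources)


def split_lines(text: str) -> List[str]:
    """Turn a newline-separated LLM reply into a clean list of items."""
    return [ln.strip(" -*\t") for ln in text.splitlines() if ln.strip(" -*\t")]


def research(topic: str, llm: LLM, search: Search,
             max_turns: int = 3) -> Dict[str, List[Turn]]:
    """Discover perspectives, then simulate one conversation per perspective."""
    related = search(f"articles related to {topic}")
    reply = llm(f"PERSPECTIVES for {topic} given:\n" + "\n".join(related))
    conversations: Dict[str, List[Turn]] = {}
    for perspective in split_lines(reply):
        turns: List[Turn] = []
        for _ in range(max_turns):
            # The writer sees the conversation so far and asks a follow-up.
            history = "\n".join(f"Q: {q}\nA: {a}" for q, a, _ in turns)
            question = llm(f"QUESTION about {topic} as {perspective}\n{history}").strip()
            # The expert answers using only retrieved sources.
            sources = search(question)
            answer = llm(f"ANSWER {question} using only:\n" + "\n".join(sources)).strip()
            turns.append((question, answer, sources))
        conversations[perspective] = turns
    return conversations


def write_outline(topic: str, llm: LLM,
                  conversations: Dict[str, List[Turn]]) -> List[str]:
    """Draft an outline from the LLM alone, then refine it with gathered notes."""
    draft = llm(f"OUTLINE for {topic}")
    notes = "\n".join(a for turns in conversations.values() for _, a, _ in turns)
    return split_lines(llm(f"REFINE outline for {topic}:\n{draft}\nusing notes:\n{notes}"))


def write_article(topic: str, llm: LLM, outline: List[str],
                  conversations: Dict[str, List[Turn]]) -> str:
    """Generate each section from the collected references (simplified)."""
    sources = [s for turns in conversations.values() for _, _, srcs in turns for s in srcs]
    sections = []
    for heading in outline:
        body = llm(f"SECTION '{heading}' of {topic} citing:\n" + "\n".join(sources)).strip()
        sections.append(f"{heading}\n{body}")
    return "\n\n".join(sections)
```

To try it, pass any text-in/text-out function as `llm` and any query-to-snippets function as `search`; keeping both as plain callables mirrors the idea that the language model and the retriever are swappable parts.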

Performance Evaluation 

The performance evaluation of STORM involved several comprehensive experiments and analyses. One key evaluation focused on outline quality, which serves as a proxy for assessing the pre-writing stage. As shown in the table below, STORM outperformed baselines like Direct Gen and RAG in terms of heading soft recall and heading entity recall. This demonstrates STORM's superior ability to create comprehensive outlines that cover more topic-specific aspects through its effective question-asking mechanism.


source - https://arxiv.org/pdf/2402.14207
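To make heading soft recall more concrete, here is a minimal sketch based on a soft-cardinality formulation: each heading counts for less when similar headings are already in the set, and recall is the soft size of the overlap divided by the soft size of the reference headings. The token-overlap similarity below is an assumption made so the example runs without extra packages; it stands in for the Sentence-BERT similarity that STORM's evaluation relies on, and the function names are hypothetical.

```python
from typing import Callable, List


def jaccard(a: str, b: str) -> float:
    """Toy similarity in [0, 1]; a stand-in for embedding cosine similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def soft_count(items: List[str], sim: Callable[[str, str], float]) -> float:
    """Soft cardinality: near-duplicate items share a single unit of count."""
    return sum(1.0 / sum(sim(a, b) for b in items) for a in items)


def heading_soft_recall(reference: List[str], generated: List[str],
                        sim: Callable[[str, str], float] = jaccard) -> float:
    """Soft size of (reference AND generated) over soft size of reference."""
    if not reference:
        return 0.0
    c_ref = soft_count(reference, sim)
    c_gen = soft_count(generated, sim)
    c_union = soft_count(reference + generated, sim)
    # Inclusion-exclusion gives the soft size of the intersection.
    return (c_ref + c_gen - c_union) / c_ref
```

With exact-match headings the score behaves like ordinary recall: covering one of two reference headings gives 0.5. The benefit of the soft version shows up once a real embedding similarity gives partial credit for paraphrased headings.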

Another crucial evaluation examined the full-length article quality. The table below presents the results of this assessment, comparing STORM with baselines like Direct Gen, RAG, and oRAG (outline-driven RAG). STORM consistently outperformed these baselines across multiple metrics, including ROUGE scores, entity recall, and rubric grading on aspects like interest level, organization, relevance, and coverage. Notably, STORM achieved significantly higher scores in entity recall and certain rubric-graded aspects, indicating its effectiveness in producing more comprehensive and well-structured articles.

Results of automatic article quality evaluation.
source - https://arxiv.org/pdf/2402.14207
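Two of the automatic metrics named above are easy to illustrate. The sketch below shows a simplified ROUGE-1 recall (unigram overlap with clipped counts, whitespace tokenization, no stemming) and an entity recall computed over entity lists that are assumed to have been extracted beforehand by an NER tool. Both functions are simplified illustrations with hypothetical names, not the evaluation code used for STORM.

```python
from collections import Counter
from typing import Iterable


def rouge1_recall(reference: str, candidate: str) -> float:
    """Fraction of reference unigrams that also appear in the candidate."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each token's credit at the number of times the candidate uses it.
    overlap = sum(min(count, cand[token]) for token, count in ref.items())
    return overlap / total


def entity_recall(reference_entities: Iterable[str],
                  candidate_entities: Iterable[str]) -> float:
    """Fraction of reference entities that the candidate text also mentions."""
    ref = {e.lower() for e in reference_entities}
    cand = {e.lower() for e in candidate_entities}
    return len(ref & cand) / len(ref) if ref else 0.0
```

For example, a generated article that mentions only one of the two entities found in the human-written article gets an entity recall of 0.5.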

Additional experiments were conducted to evaluate citation quality, perform ablation studies, and assess the impact of the outline stage. The citation quality evaluation showed that a high percentage of sentences generated by STORM were supported by their citations. Ablation studies demonstrated the importance of STORM's perspective-guided question asking and multi-turn conversation simulation in producing high-quality outlines and discovering diverse sources. Furthermore, the evaluation confirmed the necessity of the outline stage in STORM's pipeline, as removing it significantly deteriorated performance across all metrics. These comprehensive evaluations underscore STORM's effectiveness in automating the pre-writing stage and generating high-quality Wikipedia-like articles from scratch.

Techniques and Methods Utilized by STORM

STORM utilizes several AI and machine learning techniques in its workflow. Here's a list of the key methods employed:

  1. Large Language Models (LLMs): STORM heavily relies on LLMs for various tasks throughout its pipeline, including question generation, answer synthesis, and article writing.
  2. Zero-shot Prompting: The system uses zero-shot prompting with LLMs to perform tasks without fine-tuning on specific datasets.
  3. Retrieval-Augmented Generation (RAG): STORM incorporates RAG techniques to ground its responses in external information sources.
  4. Named Entity Recognition (NER): The system employs NER, specifically using FLAIR, to extract and analyze entities in generated content and human-written articles.
  5. Sentence Embedding: STORM uses Sentence-BERT for calculating semantic similarity between texts, which is crucial for information retrieval and evaluation.
  6. Multi-agent Simulation: The system simulates conversations between different agents (a Wikipedia writer and a topic expert) to gather information.
  7. Perspective-guided Generation: STORM uses different perspectives to guide its question generation process, enhancing the breadth of information gathered.
  8. Iterative Information Gathering: The system employs an iterative approach to asking questions and gathering information, refining its understanding of the topic over multiple rounds.
  9. Automatic Evaluation Metrics: The evaluation of STORM uses standard NLP metrics such as ROUGE scores and custom metrics such as heading soft recall to assess generated content.
  10. Rule-based Filtering: The system incorporates rule-based filters to exclude untrustworthy sources based on Wikipedia guidelines.

The system's design demonstrates a sophisticated combination of various AI and NLP techniques to tackle the complex task of automated article generation.
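As a small illustration of the rule-based filtering item in the list above, the sketch below drops sources whose host matches a blocklist. The domains listed are made-up placeholders and the function names are hypothetical; the actual rules STORM applies are derived from Wikipedia guidelines and are not reproduced here.

```python
from typing import Iterable, List, Set
from urllib.parse import urlparse

# Hypothetical placeholder domains, for illustration only.
BLOCKED_DOMAINS: Set[str] = {"example-tabloid.com", "example-blog-host.net"}


def is_trusted(url: str, blocked: Set[str] = BLOCKED_DOMAINS) -> bool:
    """Reject URLs without a host, or whose host is a blocked domain or its subdomain."""
    host = (urlparse(url).hostname or "").lower()
    if not host:
        return False
    return not any(host == d or host.endswith("." + d) for d in blocked)


def filter_sources(urls: Iterable[str]) -> List[str]:
    """Keep only the sources that pass the rule-based check, preserving order."""
    return [u for u in urls if is_trusted(u)]
```

A filter like this would run on retrieved results before they reach the simulated expert, so that answers are grounded only in sources that pass the rules.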

How to Access and Use STORM?

STORM is an open-source project and its codebase is available on GitHub. You can clone the repository and follow the instructions provided in the README to set up and use the model locally. The project also has a dedicated website where you can find more information about the project and its latest updates.

If you would like to read more details about this AI model, the sources are all included at the end of this article in the 'source' section.

Limitations and Future directions

STORM faces several limitations that highlight areas for future work. The system's output still falls short of well-revised human-authored articles, particularly in terms of neutrality and verifiability. Despite efforts to incorporate diverse perspectives, the collected information may still be biased towards dominant internet sources. STORM also struggles with more nuanced verifiability issues, sometimes creating unverifiable connections between information pieces. Additionally, the current implementation is limited to free-form text generation, lacking the ability to produce structured data and multi-modal content typical of high-quality Wikipedia articles. 

Future research directions include developing more sophisticated methods for reducing retrieval bias, improving neutrality, enhancing verifiability beyond basic fact-checking, and extending the system's capabilities to handle structured and multi-modal content generation. Addressing these challenges could significantly advance automatic expository writing and narrow the gap between AI-generated and expert-written content.

Conclusion

STORM represents a significant step forward in the use of LLMs for content creation. By automating the knowledge curation process and enhancing the depth and breadth of generated content, STORM bridges the gap between technical capabilities and practical applications. As AI continues to evolve, tools like STORM will play a crucial role in transforming how we create and consume information.

Source
Research Paper: https://arxiv.org/abs/2402.14207
Research document: https://arxiv.org/pdf/2402.14207
GitHub Repo: https://github.com/stanford-oval/storm
Project Details: https://storm-project.stanford.edu/
WebSite: https://storm.genie.stanford.edu/
