Wednesday, 3 January 2024

Exploring TaskWeaver: Microsoft’s Innovative Approach to Agent Frameworks

Introduction

TaskWeaver is a code-first agent framework developed by a team of researchers at Microsoft. It was motivated by the limitations of existing Large Language Model (LLM) based approaches in handling domain-specific data analytics tasks with rich data structures. The main goal is to let developers build agents that can handle multiple subtasks, coordinate with other agents, and adapt to changing environments, using code as the primary interface. TaskWeaver aims to bridge the gap between high-level task specifications and low-level agent implementations, and to provide a flexible, modular platform for agent development and evaluation.

What is TaskWeaver?

TaskWeaver interprets user requests through code snippets and coordinates a variety of plugins, exposed as functions, to execute data analytics tasks. It converts each user request into executable code and treats user-defined plugins as callable functions.

Key Features of TaskWeaver

TaskWeaver offers several unique features:

  • Code-first: TaskWeaver turns user requests into executable code rather than text-only responses, which gives developers direct control over the task logic. This also makes it easier to reuse and modify existing code and to integrate external libraries and APIs.
  • Rich data structure: It allows you to work with rich data structures in Python, such as DataFrames, instead of dealing with strings.
  • Customized algorithms: It allows you to encapsulate your own algorithms into plugins and orchestrate them.
  • Incorporating domain-specific knowledge: It is designed to incorporate domain-specific knowledge easily in order to improve reliability.
  • Stateful execution: It supports stateful execution of the generated code to ensure a consistent and smooth user experience.
  • Code verification: It verifies the generated code before execution, detecting potential issues and suggesting fixes.
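
The code verification feature can be illustrated with a small sketch. This is not TaskWeaver's actual verifier: the allowlist of modules, the blocklist of function names, and the verify_code helper are all assumptions made for illustration. It only shows the general idea of inspecting a generated snippet with Python's standard ast module before running it.

```python
import ast

# Hypothetical policy for this sketch; not TaskWeaver's real configuration.
ALLOWED_MODULES = {"math", "statistics", "pandas", "numpy"}
BLOCKED_CALLS = {"eval", "exec", "open", "__import__"}


def verify_code(source: str) -> list[str]:
    """Return a list of human-readable issues found in `source` (empty if none)."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"syntax error on line {err.lineno}: {err.msg}"]

    issues = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # `import a.b` is judged by its top-level package `a`.
            for alias in node.names:
                root = alias.name.split(".")[0]
                if root not in ALLOWED_MODULES:
                    issues.append(f"line {node.lineno}: import of '{root}' is not allowed")
        elif isinstance(node, ast.ImportFrom):
            # Relative imports have no module name and are rejected as well.
            root = (node.module or "").split(".")[0]
            if root not in ALLOWED_MODULES:
                issues.append(f"line {node.lineno}: import from '{root}' is not allowed")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only direct calls by name are checked in this sketch.
            if node.func.id in BLOCKED_CALLS:
                issues.append(f"line {node.lineno}: call to '{node.func.id}' is not allowed")
    return issues
```

A snippet would only be passed on for execution when verify_code returns an empty list; otherwise the issues could be fed back to the code generator as hints for a retry. A static check like this is easy to bypass (for example through attribute access), so it complements rather than replaces sandboxed execution.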

Capabilities/Use Cases of TaskWeaver

TaskWeaver is aimed at data analytics tasks that are stated in natural language and carried out through generated code. Representative uses include:

  • Anomaly detection on database data: TaskWeaver can pull data from a database and apply a custom anomaly detection algorithm to it, as in the workflow example later in this article.
  • Orchestrating custom plugins: User-defined algorithms can be wrapped as plugins, and TaskWeaver generates code that calls them as ordinary functions and combines their results.
  • Multi-step analysis on rich data structures: Because execution is stateful, results such as DataFrames produced in one step remain available to later steps in the same session.
  • Domain-specific scenarios: Examples and plugin descriptions can supply domain knowledge for tasks that the underlying LLM is unfamiliar with.

How does TaskWeaver Work?

As shown in the figure below, TaskWeaver operates through three key components: the Planner, the Code Generator (CG), and the Code Executor (CE). The CG and the CE together make up the Code Interpreter (CI). The Planner is the system’s entry point and interacts with the user. It breaks down the user’s request into subtasks and manages the execution process with self-reflection. It also transforms the execution result into a response that the user can understand.

The overview of TaskWeaver

source - https://arxiv.org/pdf/2311.17541.pdf

The CG generates code for a given subtask from the Planner, taking into account existing plugins to enable the generated code to incorporate function calls for specific tasks. The examples within the CG guide it, especially for domain-specific tasks unfamiliar to the Large Language Model (LLM). The CE is responsible for executing the generated code and maintaining the execution state throughout the entire session.
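
The idea of maintaining execution state across a session can be sketched as follows. This is a deliberately simplified illustration, not how TaskWeaver implements the CE: the StatefulExecutor class is hypothetical and simply runs every snippet in one shared namespace with Python's built-in exec.

```python
import contextlib
import io


class StatefulExecutor:
    """Runs code snippets one after another in a shared namespace."""

    def __init__(self):
        # Variables defined by earlier snippets live here for later ones.
        self.namespace = {}

    def run(self, code: str) -> str:
        """Execute `code` and return whatever it printed to stdout."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self.namespace)
        return buffer.getvalue()


executor = StatefulExecutor()
executor.run("values = [3, 1, 4, 1, 5]")     # first round defines a variable
output = executor.run("print(sum(values))")  # second round reuses it
```

Because both calls share the same namespace, the second snippet can refer to values without redefining it, which is what lets a later subtask build on the result of an earlier one.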

The workflow of TaskWeaver is illustrated in the figure below with an example of pulling data from a database and applying a custom anomaly detection algorithm to the data. In the initial step, the Planner takes the user query, the CI description, and, if provided, planning examples, and generates a plan. The CI description outlines the CI’s code generation and execution capabilities. To make the Planner more effective at task planning, the description also includes details of the available plugins.

Workflow of TaskWeaver

source - https://arxiv.org/pdf/2311.17541.pdf

The output of the Planner is a step-by-step plan, according to which the Planner phrases the queries and communicates with the CI. The first step consists of pulling data from the database and describing the data schema. The CG prompt provides comprehensive definitions of all the relevant plugins, including the function name, its description, the arguments it accepts, and what it returns. Code generation examples may be incorporated into the prompt to steer the code generation process. The output from the CG is a code snippet that executes the sql_pull_data plugin, retrieves the data into a DataFrame, and provides a description of the data schema.
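
The first step can be sketched in code. Only the plugin name sql_pull_data comes from the example above; its signature, the describe_schema helper, the metrics table, and its columns are assumptions made for illustration. To stay dependency-free, the sketch returns rows as a list of dictionaries instead of a pandas DataFrame and uses an in-memory SQLite database.

```python
import sqlite3


def sql_pull_data(conn: sqlite3.Connection, query: str) -> list[dict]:
    """Stand-in for the sql_pull_data plugin: run `query` and return rows as dicts."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def describe_schema(rows: list[dict]) -> dict[str, str]:
    """Map each column name to the Python type name of its first value."""
    if not rows:
        return {}
    return {name: type(value).__name__ for name, value in rows[0].items()}


# Hypothetical in-memory table so the sketch runs without an external database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (ts TEXT, val REAL)")
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?)",
    [("2024-01-01", 10.0), ("2024-01-02", 10.5), ("2024-01-03", 42.0)],
)

# The kind of code the CG might emit for the first plan step.
rows = sql_pull_data(conn, "SELECT ts, val FROM metrics ORDER BY ts")
schema = describe_schema(rows)
```

The schema description is what the Planner would need in order to decide which columns to pass to later plugins.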

The CE’s execution result is sent back to the Planner to determine the next step in the plan. In practice, the Planner may modify its original plan if the outcome differs from expectations. For the next step, the Planner can either confirm with the user whether the columns found in the data schema correspond to the two input parameters ts_col and val_col of the anomaly_detection plugin, or proceed directly to the third step. Either way, TaskWeaver must first retrieve the data and understand its schema before it can decide on the second step, which is where the self-reflection process comes in.
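
To make the role of ts_col and val_col concrete, here is a minimal stand-in for an anomaly_detection plugin. The source does not say which algorithm the plugin uses, so this sketch swaps in a simple z-score rule; the threshold parameter, the row format (a list of dictionaries), and the sample data are assumptions.

```python
import statistics


def anomaly_detection(rows, ts_col, val_col, threshold=2.0):
    """Return timestamps whose value is more than `threshold` standard
    deviations (population) away from the mean of the value column."""
    values = [row[val_col] for row in rows]
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        # All values are identical, so nothing can be flagged.
        return []
    return [
        row[ts_col]
        for row in rows
        if abs(row[val_col] - mean) / stdev > threshold
    ]


# Hypothetical data: six ordinary readings and one obvious spike.
rows = [
    {"ts": "2024-01-01", "val": 10.0},
    {"ts": "2024-01-02", "val": 10.2},
    {"ts": "2024-01-03", "val": 9.9},
    {"ts": "2024-01-04", "val": 10.1},
    {"ts": "2024-01-05", "val": 9.8},
    {"ts": "2024-01-06", "val": 10.0},
    {"ts": "2024-01-07", "val": 25.0},
]
anomalies = anomaly_detection(rows, ts_col="ts", val_col="val")
```

The two column arguments are exactly what the Planner has to resolve from the schema: passing the wrong column name would raise a KeyError here, which is why confirming the mapping with the user can be worthwhile.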

As shown in this example, TaskWeaver incorporates a two-layer planning process during the handling of user requests. The first layer consists of the Planner generating a high-level plan outlining the steps required to fulfill the request. Subsequently, in each round, the code generator must devise a plan, in terms of chain-of-thought and generated code, to execute the specified step. This is how TaskWeaver works to handle complex tasks and adapt to domain-specific scenarios.
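
The two-layer structure can be sketched as a toy loop. No LLM is involved: the plan and generate_code functions below are hypothetical stand-ins that return canned answers, and the step names are invented. The sketch only shows how a high-level plan (layer one) drives per-step thought and code generation (layer two), with state shared between steps.

```python
import contextlib
import io


def plan(request: str) -> list[str]:
    """Layer 1 stand-in: a fixed high-level plan (a real Planner would query an LLM)."""
    return ["load the numbers", "compute their average"]


def generate_code(step: str) -> dict:
    """Layer 2 stand-in: return a thought and code for one step from a canned table."""
    canned = {
        "load the numbers": {
            "thought": "Store the data in a variable so later steps can reuse it.",
            "code": "numbers = [2, 4, 6, 8]",
        },
        "compute their average": {
            "thought": "The variable 'numbers' exists from the previous step.",
            "code": "print(sum(numbers) / len(numbers))",
        },
    }
    return canned[step]


def handle(request: str) -> list[str]:
    """Run each planned step in a shared namespace and collect its printed output."""
    namespace = {}
    results = []
    for step in plan(request):
        reply = generate_code(step)
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(reply["code"], namespace)
        results.append(buffer.getvalue().strip())
    return results


results = handle("What is the average of my numbers?")
```

In a fuller version, the Planner would inspect each result before choosing the next step and could revise the remaining plan, which this fixed loop does not attempt.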

How to Access and Use this Framework?

TaskWeaver is open source and available from its GitHub repository. It offers an out-of-the-box experience, allowing users to run it immediately after installation. More documentation is available on the TaskWeaver website.

If you are interested in learning more about this framework, all relevant links are provided at the end of this article under the 'Source' section.

Limitations 

While TaskWeaver is a powerful and versatile framework, it does have certain limitations:

  • Handling domain-specific data analytics tasks: TaskWeaver is not itself an LLM; it relies on an underlying LLM, and LLMs face challenges with domain-specific data analytics tasks that involve rich data structures. Where the available plugins and examples do not cover a domain, this could limit the scope of tasks that TaskWeaver can handle effectively.
  • Flexibility: Like other LLM-based systems, TaskWeaver can struggle to meet diverse user requirements. Although it can handle a wide range of tasks, it might not cater to every specific need or requirement of the user.

These limitations highlight areas for potential improvement and further development in TaskWeaver and similar frameworks.

Conclusion

TaskWeaver is a powerful and flexible framework for creating intelligent conversational agents that can handle complex tasks and adapt to domain-specific scenarios. Despite its limitations, it offers a promising approach to building LLM-powered autonomous agents.


Source
research paper - https://arxiv.org/abs/2311.17541
research paper (PDF) - https://arxiv.org/pdf/2311.17541.pdf
GitHub Repo - https://github.com/microsoft/TaskWeaver/
Website - https://microsoft.github.io/TaskWeaver/
