Thursday, 3 August 2023

MetaGPT: A Framework for Multi-Agent Meta Programming

Natural Language for Multi-Agent Meta Programming - symbolic image

Introduction 

Are you interested in a framework that enables multi-agent collaboration and coordination through meta programming? In this blog article, I explain a framework that uses generative pre-trained transformers (GPT) to create and execute meta programs for complex tasks involving multiple agents. MetaGPT was developed by a team from several universities in China and the University of California, Berkeley, in collaboration with Geekan, a company that specializes in developing artificial intelligence models.

MetaGPT was built as a multi-agent collaborative framework for meta programming. It addresses the challenges of multi-agent collaboration and coordination in complex tasks that require high-level reasoning and planning. The authors observed that existing methods for multi-agent systems rely either on hand-crafted rules or on reinforcement learning algorithms, both of which are hard to scale and generalize. They propose meta programming as a new paradigm for flexible and efficient multi-agent collaboration and coordination.

What is MetaGPT? 

MetaGPT is a meta programming model that allows for the creation of multi-agent collaborative frameworks. This means that multiple agents can work together to achieve a common goal, using the power of meta programming to adapt and learn from their environment.

Key Features of MetaGPT

Some of the key features of MetaGPT are:

  • MetaGPT can handle various types of multi-agent collaborative tasks, such as cooperative games, collective decision making, and task allocation.
  • MetaGPT can generate meta programs from natural language descriptions or demonstrations of the tasks, allowing for a more intuitive and user-friendly experience.
  • MetaGPT can execute meta programs using natural language as input and output, making it easier for users to interact with the model.
  • MetaGPT can learn from feedback and improve its meta programs over time, allowing for more accurate and effective results.
  • MetaGPT can generalize to new tasks and domains that were not seen during training, making it a versatile and powerful tool.

Capabilities/Use Cases of MetaGPT

MetaGPT has many potential applications in fields and scenarios that involve multi-agent collaboration and coordination. Some examples are:

  • Gaming: MetaGPT can be used to create and control intelligent agents that cooperate or compete with human players or other agents in games such as board games, card games, and video games. MetaGPT can also be used to generate new games or game rules based on natural language descriptions or demonstrations.
  • Education: MetaGPT can be used to create and execute educational programs that teach or tutor students in subjects such as math, science, and language. MetaGPT can also be used to generate and solve problems or exercises based on natural language descriptions or demonstrations.
  • Business: MetaGPT can be used to create and execute business programs that optimize or automate processes such as scheduling, planning, budgeting, and marketing. MetaGPT can also be used to generate and analyze data or reports based on natural language descriptions or demonstrations.
  • Social: MetaGPT can be used to create and execute social programs that facilitate or enhance interactions such as communication, negotiation, persuasion, and collaboration. MetaGPT can also be used to generate and evaluate opinions or arguments based on natural language descriptions or demonstrations.

How does MetaGPT work?

MetaGPT is a model that uses the power of natural language to create and execute meta programs for multi-agent collaboration. Meta programs are programs that can generate or modify other programs based on some input or context. MetaGPT has two main components: a meta program generator (MPG) and a meta program executor (MPE). The MPG and MPE work together in two layers: the Foundational Components Layer and the Collaboration Layer.
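To make the idea of a meta program concrete, here is a minimal Python sketch of a program that writes and then runs another program. This is only an illustration of the general concept, not MetaGPT code: the function name make_power_function is my own, and in MetaGPT the generated program would come from a language model rather than from a string template.

```python
from typing import Callable, Tuple


def make_power_function(exponent: int) -> Tuple[Callable[[float], float], str]:
    """Generate source code for a new function, execute it, and return the result.

    The 'meta' part: this program writes the text of another program
    based on its input (the exponent).
    """
    source = (
        "def power(x):\n"
        f"    return x ** {exponent}\n"
    )
    namespace: dict = {}
    # Executing the generated source defines `power` inside `namespace`.
    exec(source, namespace)
    return namespace["power"], source


# Two different programs generated from two different inputs.
square, square_source = make_power_function(2)
cube, _ = make_power_function(3)

print(square_source)
print(square(4), cube(2))
```

The generation step (building the source text) and the execution step (running it) are kept separate here, which loosely mirrors the generator and executor split described above.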

The Foundational Components Layer provides the basic elements for each agent to function and communicate. It includes the Environment, which allows agents to share workspaces and messages; the Memory, which stores and retrieves past messages; the Role, which defines the domain-specific skills and workflows of each agent; the Action, which performs modular subtasks; and the Tools, which offer common services and utilities.
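As a rough illustration of how these components could fit together, here is a small Python sketch. All class, field, and method names below are my own assumptions for the sake of the example and are not MetaGPT's actual API. The Tools component is left out for brevity, and the action is a plain function where a real system would call a language model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Message:
    sender: str
    content: str


@dataclass
class Memory:
    """Stores and retrieves past messages."""
    messages: List[Message] = field(default_factory=list)

    def add(self, message: Message) -> None:
        self.messages.append(message)

    def recall(self, keyword: str) -> List[Message]:
        # Return every stored message whose content mentions the keyword.
        return [m for m in self.messages if keyword in m.content]


@dataclass
class Action:
    """A modular subtask: a name plus a function from input text to output text."""
    name: str
    run: Callable[[str], str]


@dataclass
class Role:
    """An agent with a domain-specific workflow (an ordered list of actions)."""
    name: str
    actions: List[Action]
    memory: Memory = field(default_factory=Memory)

    def handle(self, message: Message) -> Message:
        self.memory.add(message)
        output = message.content
        for action in self.actions:
            output = action.run(output)
        return Message(sender=self.name, content=output)


@dataclass
class Environment:
    """Shared workspace: published messages are visible to every agent."""
    history: List[Message] = field(default_factory=list)

    def publish(self, message: Message) -> None:
        self.history.append(message)


env = Environment()
pm = Role("ProductManager", [Action("write_prd", lambda text: f"PRD for: {text}")])

request = Message("User", "a snake game")
env.publish(request)
reply = pm.handle(request)
env.publish(reply)

print(reply.content)
```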

The Collaboration Layer builds on top of the Foundational Components Layer to enable agents to work together on complex problems. It implements key mechanisms for cooperation, such as Knowledge Sharing and Encapsulating Workflows. Knowledge Sharing helps agents exchange information effectively, contributing to a shared knowledge base. Encapsulating Workflows uses Standardized Operating Procedures (SOPs) to break down complex tasks into smaller, manageable components. It assigns these subtasks to suitable agents and supervises their performance through standardized outputs.

MetaGPT also uses other techniques to enhance its performance. For instance, it uses prompts to encode Standardized Operating Procedures (SOPs) that guide structured coordination. It also requires modular outputs that give agents domain expertise similar to human professionals. This way, MetaGPT can use the assembly line work model to assign different roles to different agents and create a framework that can effectively and cohesively solve complex multi-agent collaborative problems.
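The assembly line idea can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not the framework's implementation: the SOP is modeled as an ordered list of (role, output section) steps, the agents are stand-in functions instead of language model calls, and the names SOP, run_sop, and agents are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical SOP: an ordered list of (role, name of the section that role must produce).
SOP: List[Tuple[str, str]] = [
    ("ProductManager", "requirements"),
    ("Architect", "design"),
    ("Engineer", "code"),
]

Agent = Callable[[Dict[str, str]], str]


def run_sop(task: str, sop: List[Tuple[str, str]], agents: Dict[str, Agent]) -> Dict[str, str]:
    """Pass a task down the assembly line, one role per step.

    Each agent reads the shared knowledge base and must return a non-empty
    output, which is stored under the section name the SOP assigns to it.
    """
    shared: Dict[str, str] = {"task": task}
    for role, section in sop:
        output = agents[role](shared)
        if not output.strip():
            raise ValueError(f"{role} produced no output for section '{section}'")
        shared[section] = output
    return shared


# Stand-in agents; a real system would prompt a language model here.
agents: Dict[str, Agent] = {
    "ProductManager": lambda kb: f"Requirements for {kb['task']}",
    "Architect": lambda kb: f"Design based on: {kb['requirements']}",
    "Engineer": lambda kb: f"Code implementing: {kb['design']}",
}

result = run_sop("a snake game", SOP, agents)
print(result["code"])
```

Each role only sees the shared dictionary and only writes its own section, so a step can be swapped out or checked in isolation without touching the others.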

Core components overview of MetaGPT.
source - https://arxiv.org/pdf/2308.00352v2.pdf

By dividing its design into two layers, MetaGPT achieves modularity and efficiency in both individual and collective agent capabilities. The components offer reusable building blocks and utilities, while the collaboration modules integrate purposeful coordination. As shown in the figure above, the model works by allowing agents to collaborate and learn from their environment, using meta programming to adapt and improve their performance.

Performance evaluation with other Models

As shown in the figure below, MetaGPT outperforms other frameworks, such as AutoGPT and AgentVerse, in both capabilities and performance. MetaGPT offers a more comprehensive and robust solution for multi-agent collaboration and coordination. The frameworks were tested on 7 different tasks. MetaGPT showed strong performance across them, achieving successful execution in all but two cases.

Statistical analysis and comparison of MetaGPT with other models and tasks
source - https://arxiv.org/pdf/2308.00352v2.pdf

By contrast, AutoGPT and AgentVerse failed to execute any of the tasks successfully. The cost analysis showed that each project used an average of 26626.86 tokens for prompts and 6218.00 tokens for task completion, resulting in a total cost of $1.09 for completing the tasks. The whole construction process took 517.71 seconds. In summary, on these tasks MetaGPT provided a more effective and efficient solution for project execution than the frameworks it was compared with.

To fully understand the notations and information presented in the tables, it is recommended to read the original research paper. The paper provides detailed explanations and context for the data presented in the tables, allowing for a more comprehensive understanding of the information. 

How to access and use this model?

You can find the code and instructions for using it on GitHub. The model is open-source and you can use it for any purpose, commercial or non-commercial. The license information is also available on GitHub. To use MetaGPT, clone the repository to your local machine and follow the installation instructions in the README file. You can install it either the traditional way or using Docker.

If you are interested in learning more about the MetaGPT framework, all relevant links are provided under the 'Source' section at the end of this article.

Limitations and Future Work 

MetaGPT is a powerful framework. However, it also has some limitations that need to be addressed in the future. Some of these limitations are:

  • It sometimes refers to non-existent resource files, such as images and audio, that are not available in the environment or the input. This can cause errors or confusion when executing the meta programs.
  • It sometimes invokes undefined or unimported classes or variables, especially when dealing with complex tasks that require multiple agents and tools. This can cause errors or inconsistencies when executing the meta programs.

These limitations are mainly due to the hallucinatory tendency of large language models, which can generate text that is not grounded in reality or logic. This can be improved by using a clearer and more efficient agent collaboration workflow that verifies and validates the meta programs before execution.
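To give an idea of what such a validation step might look like for generated Python code, here is a small sketch built on the standard library's ast module. It is not part of MetaGPT. The function names and the list of resource file suffixes are my own assumptions, and the name check is deliberately coarse: it ignores scoping rules and star imports, so it can miss real problems.

```python
import ast
import builtins
import tempfile
from pathlib import Path
from typing import Set

# Assumed set of resource file types to look for in string literals.
RESOURCE_SUFFIXES = (".png", ".jpg", ".wav", ".mp3")


def find_undefined_names(source: str) -> Set[str]:
    """Return names that are read somewhere but never imported, assigned, or defined."""
    defined = set(dir(builtins))
    used: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b` binds `a`; `import a as b` binds `b`.
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)
        elif isinstance(node, ast.ExceptHandler) and node.name:
            defined.add(node.name)
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                used.add(node.id)
            else:
                defined.add(node.id)
    return used - defined


def find_missing_resources(source: str, root: Path) -> Set[str]:
    """Return string literals that look like resource files but do not exist under root."""
    missing: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            looks_like_resource = node.value.lower().endswith(RESOURCE_SUFFIXES)
            if looks_like_resource and not (root / node.value).exists():
                missing.add(node.value)
    return missing


# Example of generated code with both kinds of problems.
generated = (
    "import os\n"
    "\n"
    "def main():\n"
    "    game = Game()\n"
    "    icon = 'snake.png'\n"
    "    print(os.getcwd(), game, icon)\n"
)

undefined = find_undefined_names(generated)
with tempfile.TemporaryDirectory() as workspace:
    # The temporary workspace is empty, so the image cannot be found.
    missing = find_missing_resources(generated, Path(workspace))

print(undefined, missing)
```

A check like this could run after the code-writing agent finishes and before anything is executed, with any findings sent back to that agent as feedback.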

Conclusion

MetaGPT is a framework that changes the way multiple agents interact and cooperate, using natural language as a universal interface for meta programming. It offers a new approach to complex problem-solving and points to a potential pathway towards Artificial General Intelligence.


Source
research paper - https://arxiv.org/abs/2308.00352v2
research document - https://arxiv.org/pdf/2308.00352v2.pdf
GitHub Repo - https://github.com/geekan/MetaGPT
