Friday, 13 September 2024

xLAM: Enhancing AI Agents with Salesforce’s Large Action Models

Introduction

An AI agent is a system that observes and interacts with its environment and independently decides how to act in a given context to accomplish a task. Such agents are intended to work on their own with minimal human intervention, relying on sophisticated algorithms. AI agents have improved considerably in recent years: they can now handle complicated tasks such as natural language processing, decision-making, and real-time problem solving. These advances are attributed to large language models (LLMs) and reinforcement learning. However, deploying AI agents in autonomous systems still runs into problems such as generalization across environments, decision stability, and interoperability. The xLAM models attempt to address these problems by enabling better function calling and making the models more robust across different settings.

Who Developed xLAM?

The xLAM models were created by Salesforce AI Research, one of the world’s foremost AI research organizations. The idea behind xLAM was to design a family of models that improves how AI agents carry out actions. Salesforce AI Research aimed to strengthen the integration of AI into different operational systems and workflows and thereby improve their efficiency.

What is xLAM?

xLAM stands for ‘Large Action Models’. These models are meant to improve decision-making and translate a user’s intent into actionable steps in the world. Unlike traditional LLMs, xLAM focuses on function calling and real-time task execution. It is best suited to function calling and AI agent workloads, and several variants are offered depending on the target application domain.
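To make the idea of turning intent into an action concrete, here is a minimal sketch of function calling from the application side. It assumes the model replies with a JSON object holding a tool name and arguments; that response shape, the `update_crm_record` tool, and the `dispatch` helper are all hypothetical and are not taken from the xLAM repositories.

```python
import json

# Hypothetical tool for illustration; not part of any xLAM or Salesforce API.
def update_crm_record(customer_id: str, status: str) -> dict:
    """Pretend to update a CRM record and return its new state."""
    return {"customer_id": customer_id, "status": status}

# Registry mapping tool names (as a model might emit them) to Python callables.
TOOLS = {"update_crm_record": update_crm_record}

def dispatch(model_output: str) -> dict:
    """Parse a JSON function call and run the matching tool."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])

# Assumed shape of a model reply to "Mark customer 42 as contacted".
model_output = (
    '{"name": "update_crm_record", '
    '"arguments": {"customer_id": "42", "status": "contacted"}}'
)
result = dispatch(model_output)
print(result)
```

The registry keeps the model from running arbitrary code: only tools the application has explicitly registered can be executed.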

Model Variants

The xLAM family consists of several variants, each intended for a specific deployment setting. The xLAM-1B model is small and lightweight, which makes it well suited to mobile use. The xLAM-7B model is intended for academic use with a limited GPU budget. The xLAM-8x7B is an 8x7B mixture-of-experts model suited to industrial applications, balancing latency, resource use, and performance. The xLAM-8x22B is a larger mixture-of-experts model with many more parameters, aimed at high-performance tasks.

Overview of xLAM model series
source - https://arxiv.org/pdf/2409.03215

Key Features of xLAM

  • Optimized for Function-Calling: xLAM models are developed specifically for function calling, which is what enables them to act.
  • Scalable Training Pipeline: The models are trained with a scalable pipeline that unifies and augments data across different domains to improve generalization.
  • Real-Time Task Execution: xLAM models focus on real-time task processing, so that tasks such as updating a CRM system, answering customer questions, and adjusting the sales pipeline can be done without human involvement.
  • Enhanced Decision-Making: These models improve decision-making by translating the user’s goals into the desired actions within the surrounding environment.
  • Competitive Performance: xLAM models match or exceed other models on agent benchmarks, including the Berkeley Function-Calling Leaderboard.
    An overview of xLAM model performances on the Berkeley Function Calling Leaderboard v2 (cutoff date 09/03/2024).
    source - https://arxiv.org/pdf/2409.03215

Capabilities/Use Case of xLAM

  • Autonomous Task Handling: xLAM models can independently carry out multi-step activities, including triggering processes in other software systems.
  • Customer Support Automation: These models can answer customers’ questions on various support topics. Example: customer service workflows in which basic questions are answered automatically while complex ones are forwarded to human agents.
  • Sales Pipeline Management: xLAM models can handle sales pipeline procedures, making it easy to track and follow up on sales leads. Example: tracking leads, following up by email, and updating sales records in real time to streamline sales operations.
  • Generalizability Across Environments: The models are expected to operate effectively in varied contexts, which makes them suitable for dynamic use cases. Example: adapting to different business processes and operations that require close integration with existing systems.

How xLAM Models Work

The xLAM workflow begins with data preparation: data is unified, verified, and augmented to build a solid and diverse dataset. This step is essential for the model to perform well across nearly all required tasks. Once the data is prepared, the model is trained with supervised fine-tuning as well as Direct Preference Optimization (DPO). The training setup accommodates both small and large models, so it can be used for small-scale as well as large-scale training.
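For readers unfamiliar with Direct Preference Optimization, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. It illustrates the general objective only; the `beta` value and the example numbers are assumptions and do not reflect the settings used to train xLAM.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    beta=0.1 is an assumed value, not one reported for xLAM.
    """
    # How much more the policy favours each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# When the policy equals the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)
# When the policy shifts probability toward the chosen response, the loss drops.
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)
print(baseline, improved)
```

In practice this loss is averaged over a batch and the log-probabilities come from model forward passes; the scalar version is used here only to keep the arithmetic visible.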

Overview of the data processing, training and evaluation of xLAM.
source - https://arxiv.org/pdf/2409.03215

The models are built on data gathered from various environments and normalized into a common form. This yields a generic data loader that is optimized for the training process. Data processing consists of data unification, data augmentation, and data quality checks. In the unified format, each sample combines a task description, predefined tools, examples, a query, and actions in a single standardized structure. This format makes it easier to apply different augmentation techniques. The larger models also use a mixture-of-experts architecture to strike a balance between performance and resource demands.
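The unification and quality-check steps described above can be sketched as follows. The field names in `UnifiedSample`, the raw record keys (`instruction`, `apis`, `demos`, `question`, `calls`), and the specific checks are assumptions made for illustration; the actual schema is defined in the xLAM paper and repository.

```python
from dataclasses import dataclass

@dataclass
class UnifiedSample:
    """One training sample in a single standardized structure (assumed fields)."""
    task_description: str
    tools: list      # each tool: {"name": str, "parameters": [str, ...]}
    examples: list   # few-shot demonstrations
    query: str
    actions: list    # each action: {"name": str, "arguments": dict}

def unify(raw: dict) -> UnifiedSample:
    """Map one source-specific record (hypothetical keys) to the unified format."""
    return UnifiedSample(
        task_description=raw.get("instruction", ""),
        tools=raw.get("apis", []),
        examples=raw.get("demos", []),
        query=raw["question"],
        actions=raw.get("calls", []),
    )

def quality_issues(sample: UnifiedSample) -> list:
    """List calls to undefined tools and arguments the tool does not declare."""
    known = {tool["name"]: set(tool["parameters"]) for tool in sample.tools}
    issues = []
    for action in sample.actions:
        name = action["name"]
        if name not in known:
            issues.append(f"undefined tool: {name}")
            continue
        extra = set(action["arguments"]) - known[name]
        if extra:
            issues.append(f"{name}: unknown arguments {sorted(extra)}")
    return issues

raw = {
    "instruction": "Manage sales leads.",
    "apis": [{"name": "update_lead", "parameters": ["lead_id", "stage"]}],
    "question": "Move lead 7 to the negotiation stage.",
    "calls": [{"name": "update_lead", "arguments": {"lead_id": 7, "stage": "negotiation"}}],
}
sample = unify(raw)
print(quality_issues(sample))
```

Because every source is mapped into the same structure, a check such as `quality_issues` only has to be written once and can then be applied to all datasets.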

After training, the models are evaluated on several benchmarks, including Webshop, ToolQuery, ToolBench, and the Berkeley Function-Calling Benchmark. The process also contains a feedback loop in which insights from these evaluations drive continuous improvement of data quality. This ensures that the models keep improving across a range of tasks. By training on a large amount of function/API-calling data from open environments and in-house simulators, the models learn better strategies for accomplishing complex tasks, which makes them a valuable tool for advancing AI agents.
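As a rough illustration of how a success rate and a feedback signal might be derived from function-calling test cases, the sketch below uses simple exact matching on the tool name and arguments. This is a deliberate simplification and not the scoring method of any benchmark named above; the function names and sample data are hypothetical.

```python
def call_matches(predicted: dict, expected: dict) -> bool:
    """Exact match on tool name and arguments (a simplified criterion)."""
    return (predicted.get("name") == expected["name"]
            and predicted.get("arguments") == expected["arguments"])

def success_rate(predictions: list, references: list) -> float:
    """Fraction of test cases where the predicted call matches the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    hits = sum(call_matches(p, r) for p, r in zip(predictions, references))
    return hits / len(references)

def failing_indices(predictions: list, references: list) -> list:
    """Indices of failed cases, which could be fed back into data cleaning."""
    return [i for i, (p, r) in enumerate(zip(predictions, references))
            if not call_matches(p, r)]

references = [
    {"name": "search", "arguments": {"q": "red shoes"}},
    {"name": "add_to_cart", "arguments": {"item_id": 3}},
    {"name": "checkout", "arguments": {}},
    {"name": "search", "arguments": {"q": "blue hat"}},
]
predictions = [
    {"name": "search", "arguments": {"q": "red shoes"}},
    {"name": "add_to_cart", "arguments": {"item_id": 3}},
    {"name": "checkout", "arguments": {"express": True}},  # wrong arguments
    {"name": "search", "arguments": {"q": "blue hat"}},
]
print(success_rate(predictions, references), failing_indices(predictions, references))
```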

Performance Evaluation

The xLAM models performed strongly across the benchmarks, as described below. In the Webshop environment (see the table below), xLAM-7b-r achieved the highest success rate, 41.4%, ahead of widely used general-purpose models such as GPT-4 and Claude2. On ToolQuery, xLAM-8x7b-r and xLAM-8x22b-r rank second with a success rate of about 68%, ahead of even the much larger Mixtral-8x22B-Instruct model.

Testing results on Webshop and ToolQuery
source - https://arxiv.org/pdf/2409.03215

On the Berkeley Function-Calling Benchmark v2, a widely used benchmark, four xLAM models placed in the top twenty. The best was xLAM-8x22b-r, with an overall accuracy of 87.31%, ahead of both GPT-4 and Claude-3. Even the smallest model, xLAM-1b-fc-r, ranked 32nd with a 75.43% success rate, outperforming larger models such as Claude-3-Opus and GPT-3.5-Turbo.

Performance comparison on BFCL-v2 leaderboard
source - https://arxiv.org/pdf/2409.03215

These models also performed well in other assessments. On the ToolBench benchmark, they outperformed TooLlama-V2 and GPT-3.5-Turbo-0125 in all categories and test conditions, and even surpassed GPT-4-0125-preview in some cases. Further, an ablation study showed that the data augmentation and cleaning steps in the xLAM pipeline produced substantial gains across the metrics analyzed.

How to Access and Use xLAM

The xLAM models are available on GitHub and Hugging Face. They can be run locally or integrated into an existing system through an API. Setup and usage instructions are included in the respective repositories. The models are released openly and are intended for research and academic use, so they can be widely used within the open community. For those who want to learn more about these models, all links are given at the end of this article.
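A minimal sketch of local use with the Hugging Face `transformers` library might look like the following. The repository id `Salesforce/xLAM-1b-fc-r` is an assumption based on the model name mentioned in this article and should be confirmed on the collection page. The prompt layout in `build_prompt` is purely illustrative and is not the official template, so follow the model card for the real one. Running `generate` requires `transformers` and PyTorch and downloads the model weights.

```python
import json

# Assumed repo id; confirm it on the Hugging Face collection page before use.
MODEL_ID = "Salesforce/xLAM-1b-fc-r"

def build_prompt(query: str, tools: list) -> str:
    """Assemble a plain-text prompt (illustrative layout, not the official one)."""
    return (
        "You can call the following tools. Reply with a JSON function call.\n"
        f"Tools: {json.dumps(tools)}\n"
        f"Query: {query}\n"
    )

def generate(query: str, tools: list, max_new_tokens: int = 128) -> str:
    """Load the model and return only the newly generated text."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(build_prompt(query, tools), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Skip the prompt tokens and decode only what the model added.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Hypothetical tool description used only to show the prompt helper.
tools = [{"name": "get_weather", "parameters": ["city"]}]
prompt = build_prompt("What is the weather in Paris?", tools)
print(prompt)
```

Calling `generate("What is the weather in Paris?", tools)` would then return the model's raw reply, which the application still has to parse and validate before executing anything.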

Limitations and Future Work

The xLAM series does not incorporate hypothetical scenarios, a limitation that may stem from the initial data used in most of the studies. Although the presented data synthesis framework is useful in the contexts outlined, it may not cover all possible applications. Furthermore, while the models handle out-of-domain tasks and unseen tools relatively well, there is still significant room for improvement in generalization and adaptability.

Future work could focus on more sophisticated data synthesis methods and on multimodal inputs. It could also be useful to apply xLAM models in other, more complicated or dynamic situations. Building on the Mixtral Instruct models, future research could develop specialized models for specific tasks, leveraging xLAM’s flexible architecture.

Conclusion

xLAM models make decision-making more effective and can carry out challenging operations, so they can be beneficial in many domains. By applying these architectures to present-day problems, they enable more capable and efficient AI operations. Thanks to their openness and competitive performance, these models are valuable to researchers and developers.

Source
Salesforce blog: https://blog.salesforceairesearch.com/large-action-model-ai-agent/
Research paper: https://arxiv.org/abs/2409.03215
Research document: https://arxiv.org/pdf/2409.03215
Hugging Face model collection: https://huggingface.co/collections/Salesforce/xlam-models-65f00e2a0a63bbcd1c2dade4
GitHub Repo: https://github.com/SalesforceAIResearch/xLAM


Disclaimer 
- This article is intended purely for informational purposes. It is not sponsored or endorsed by any company or organization, nor does it serve as an advertisement or promotion for any product or service. All information presented is based on publicly available resources and is subject to change. Readers are encouraged to conduct their own research and due diligence.
