
Friday 20 September 2024

C4AI Command R+: Multilingual AI with Advanced RAG and Tool Use

Presentational View

Introduction

Retrieval-Augmented Generation (RAG) combines retrieval and generation models to produce contextually rich responses grounded in external sources of information. The approach keeps improving, with more accurate and relevant retrieval feeding more precise generated outputs. These gains are evident in C4AI Command R+, which is designed to ground its responses in supplied document excerpts and to attach citations that point to the specific source of each piece of information. Multi-step tool use, in turn, lets a model carry out a sequence of operations in which the result of one step feeds the next, enabling it to handle more complex real-world tasks and processes and making it far more versatile. C4AI Command R+ shines here as well: the model has been built to plan and execute sequences of actions with a variety of tools, acting as a simple agent.

Who Developed C4AI Command R+?

C4AI Command R+ was built by Cohere, a start-up focused on large language models for enterprise use, with input from Cohere's team of specialists. Cohere's aim is to give developers and large enterprises language AI technology for building new products and capturing commercial value. Cohere For AI, the company's research-oriented arm, also played a significant role in the model's creation.

What is C4AI Command R+? 

C4AI Command R+ is Cohere's recent large language model, designed for conversational interaction and long-context tasks. It is most effective in combined RAG and multi-tool execution scenarios, making it a strong fit for demanding enterprise applications. It is commonly referred to simply as 'Command R+'.

Model Variants 

C4AI Command R+ comes in a few key variants. 'command-r-plus-08-2024', released in August 2024, is an updated version of the original 'command-r-plus' (an alias for 'command-r-plus-04-2024'). It brings better tool-use decision-making, instruction following, structured data analysis, robustness to non-semantic prompt changes, the ability to decline unanswerable questions, and the option to run RAG workflows without citations, along with significant improvements in throughput and latency. The variants differ in performance optimisations, feature updates, and release timelines, with 'command-r-plus-08-2024' being the most recent model so far.

Model Variants - Command R+
source - https://docs.cohere.com/docs/command-r-plus

Key Features of C4AI Command R+

  • Longer Context Length: Supports a context length of up to 128k tokens, enabling extended, sequentially interdependent interactions.
  • Multilingual Capabilities: Trained on 23 languages and evaluated in 10. Usability is optimized for ten languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Arabic, and Simplified Chinese.
  • Cross-Lingual Tasks: Handles tasks such as translation and question answering across languages.
  • Retrieval-Augmented Generation (RAG): Generates responses with citations grounded in supplied context.
  • Multi-Step Tool Use: Interfaces with external tools to complete a variety of tasks across several steps.
  • Massive Parameter Size: 104 billion parameters, helping it perform complex tasks with high accuracy.

Capabilities and Unique Use Cases

C4AI Command R+ performs exceptionally well at deep understanding, paraphrasing, summarization, and accurate question answering, and it is aimed at business and enterprise settings such as customer relations, content generation, and data analysis, where it can raise productivity through automation. Here are a few use cases:

  • Multilingual Legal Contract Analysis: Summarizes and highlights legal contracts in different languages for global businesses and flags potentially risky clauses.
  • Cross-Lingual Literary Translation: Translates texts between languages while preserving the original style and intent, and can identify themes and compositional devices in literature.
  • Real-Time Meeting Summarization: Provides real-time interpretation into preferred languages, takes notes, and produces summaries and actionable follow-up points for international teams.
  • Generative Code Documentation: Generates documentation in the relevant natural language for code in various programming languages, including summaries and tutorials, helping teams that work across multiple languages.
  • Cross-Cultural Marketing Campaigns: Uses cultural context and linguistic differences to craft marketing content and related assets that help global organizations grow.

Architecture and Advancements of C4AI Command R+

C4AI Command R+ is built on an optimized transformer architecture, a deep learning design intended for processing sequential data such as text. The architecture is tuned for efficient convergence, which matters in compute-heavy settings such as language modelling and text generation. For generation, the model is autoregressive: it predicts the next token from the tokens generated so far, so the output stays coherent and in context because each token is conditioned on everything that precedes it.

The model also goes through supervised fine-tuning and preference training, which align its behaviour with human judgments of helpfulness and safety. This involves training on large volumes of text and code and then adjusting its responses based on human feedback. The process helps ensure that C4AI Command R+ produces responses that are accurate, safe, and consistent with human values.

Basic steps involved in multi-step tool use
source - https://docs.cohere.com/docs/multi-step-tool-use

C4AI Command R+ also improves multi-step tool use. Unlike single-step tool use, where the model may call an external tool only once, multi-step tool use lets the model plan a sequence of actions and invoke as many tools as the task requires. This greatly broadens the model's applicability to complex real-world tasks that involve decision making, multi-step computation, and, where necessary, interaction with external systems. Furthermore, C4AI Command R+ integrates Accelerated Query Attention (AQ+), an extension of grouped-query attention (GQA) that speeds up the attention mechanism within the transformer while preserving the model's ability to form responses and process and analyze information at the same level as before.
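To make the idea concrete, here is a minimal, framework-agnostic sketch of a multi-step tool-use loop in Python. It is not the Cohere API: call_model, the tool registry, and the message format are hypothetical placeholders standing in for whatever model and tools an application actually uses.

```python
import json

def web_search(query: str) -> str:
    # stub tool: a real implementation would query a search service
    return f"search results for: {query}"

def calculator(expression: str) -> str:
    # toy arithmetic evaluator with builtins disabled
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"web_search": web_search, "calculator": calculator}

def call_model(messages: list[dict]) -> dict:
    """Hypothetical model call: returns either a tool request, e.g.
    {"tool": "web_search", "args": {"query": "..."}}, or a final answer,
    e.g. {"content": "..."}."""
    raise NotImplementedError  # replace with a real model/API call

def run_agent(user_message: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        decision = call_model(messages)
        if "tool" not in decision:          # the model produced its final answer
            return decision["content"]
        result = TOOLS[decision["tool"]](**decision["args"])
        # feed the tool output back so the next step can build on it
        messages.append({"role": "tool", "content": json.dumps({"result": result})})
    return "stopped after max_steps"
```

The key point is the loop: each tool result is appended to the conversation so the model can plan the next step based on everything gathered so far.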

Performance Evaluation

Command R+ performs well on RAG tasks compared with other models in the scalable market category. In head-to-head human preference evaluations of writing tasks against benchmark models including GPT-4 and Claude 3 Sonnet, Command R+ scored higher on aspects such as text fluency and citation quality. The evaluation used a custom test set of 250 diverse documents with elaborate summarization requests. Command R+ used the RAG API, while the baselines were heavily prompt-engineered, which underscores Command R+'s usefulness in real-world commercial scenarios.

Human head-to-head preference results using a holistic grading scheme combining text fluency, citation quality, and overall utility
source - https://cohere.com/blog/command-r-plus-microsoft-azure

Another major set of experiments covered tool use for automating business processes. Command R+ was assessed on Microsoft's ToolTalk (Hard) benchmark and on Berkeley's Function Calling Leaderboard (BFCL). The model performed strongly on both single-turn function calling and conversational tool use. On ToolTalk, it achieved high success rates in recalling the right tool calls and in avoiding unwanted actions; on BFCL, it achieved solid function success rates across the various executable subcategories.


source - https://cohere.com/blog/command-r-plus-microsoft-azure

Other assessments looked at multilingual support and tokenization efficiency. On both FLoRES and WMT23, Command R+ performed very well on translation tasks across the 10 selected business languages. The model's tokenizer also handled non-English text efficiently, with cost reductions of up to 57% compared with others on the market. The gain was especially pronounced for non-Latin-script languages, where the Cohere tokenizer produced fewer tokens for the same text.

How to Access and Use C4AI Command R+

C4AI Command R+ can be accessed and used in several ways. It is available on Hugging Face, where you can try it in a web-based environment or pull the model weights to run locally. It can also be used through the Cohere API; setup and usage instructions are provided on the Hugging Face model card. The model weights are released under a CC-BY-NC license, permitting non-commercial use with proper attribution.
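For local use via the Hugging Face weights, a minimal sketch along the following lines should work, assuming the transformers library and enough GPU memory for a 104B-parameter model (or a quantized variant); the generation settings are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-plus-08-2024"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the benefits of RAG in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=200, do_sample=True, temperature=0.3)
# decode only the newly generated tokens
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```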

Limitations And Future Work

  • Language Support Limitation: Some of the more advanced capabilities, such as Retrieval-Augmented Generation (RAG) and multi-step tool use, are currently supported only in English.
  • Context Window Issues: Prompts between 112k and 128k tokens can lead to degraded output quality because of performance issues on very long inputs.

Future work will focus on extending language coverage for these advanced features and on fixing the context-window issues, further broadening the model's applicability for non-English-speaking audiences.

Conclusion

C4AI Command R+ stands out as a solution for businesses applying AI to complex processes. Its ability to handle RAG, multi-step tool use, and multilingual workloads makes it especially valuable for extended-context tasks and workflow automation. This not only boosts productivity but also opens up a wide range of enterprise applications.


Source
Website: https://docs.cohere.com/docs/command-r-plus
Hugging Face C4AI Command R+ weights: https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024
Hugging Face Predecessors: https://huggingface.co/CohereForAI/c4ai-command-r-08-2024
Performance Evaluations : https://cohere.com/blog/command-r-plus-microsoft-azure


Disclaimer - This article is intended purely for informational purposes. It is not sponsored or endorsed by any company or organization, nor does it serve as an advertisement or promotion for any product or service. All information presented is based on publicly available resources and is subject to change. Readers are encouraged to conduct their own research and due diligence.

Tuesday 17 September 2024

Reader-LM: Efficient HTML to Markdown Conversion with AI

Presentational View

Introduction

Markdown is a lightweight language for formatting content: users write plain text with simple markup that can later be rendered as HTML. Well-formatted Markdown files are easy to read and well organized, which makes content much easier to handle when it is shared across teams or published on multiple platforms. There are already several ways to convert HTML to Markdown, including HTML2Markdown, Turndown, and various online tools.

The main difficulties are complex HTML structure, preserving formatting, and noise in the HTML itself. Reader-LM was developed to address these problems by using AI to enhance and fully automate the conversion. Because the model actually comprehends and parses the content, it can convert HTML to Markdown far more reliably than rule-based tools.

Who Developed Reader-LM?

Reader-LM is built by Jina AI, a company whose mission is to democratize Artificial Intelligence and keep it open to everyone through open source and open science. The model builds on Jina Reader and was contributed to by several AI researchers and developers. The goal of Reader-LM was to build a fast and cheap tool that takes raw, noisy HTML and converts it into clean Markdown, simplifying the conversion process while improving the quality of the converted content.

What is Reader-LM?

Reader-LM is a suite of small language models for converting HTML into Markdown. The models are trained to recognize the structure of HTML documents and to generate neat, well-formatted Markdown.

Model Variants

  • Reader-LM 0.5B: A lighter, more heavily optimized version intended for simpler tasks.
  • Reader-LM 1.5B: A larger version with extra capacity, aimed at parsing more complicated HTML structure.

These variants are tailored to different needs: the 0.5B model puts efficiency first, while the 1.5B model is more powerful and has higher processing capacity.

Reader-LM models' specifications
source - https://jina.ai/news/reader-lm-small-language-models-for-cleaning-and-converting-html-to-markdown/

Key Features of Reader-LM

  • Multilingual Support: Works across multiple languages, making it suitable for use in different countries.
  • Long-Context Handling: Handles long documents, particularly HTML documents, with a context length of up to 256K tokens.
  • Efficient Performance: Designed with fewer than 1 billion parameters so it can run efficiently on edge devices.
  • Selective Copying: Focuses on transferring the relevant HTML content into Markdown without losing much information.

Capabilities/Use Cases of Reader-LM

  • Content Conversion: Converts raw web-page HTML into clean Markdown for documentation and content management.
  • Data Cleaning: Removes unwanted components such as headers, footers, and sidebars, yielding a cleaner input.
  • Real-World Examples: Beyond documentation, blogging, and content management systems where clean Markdown is desirable, Reader-LM has other practical uses. For instance, it can power clean feed readers by parsing raw HTML from various sources and turning it into structured Markdown that is easier to summarize and to tag by topic. Thanks to its information extraction and structuring abilities, it can also help improve web accessibility for the visually impaired, build personalized content feeds, and extract data for market research.

How Reader-LM Works

Reader-LM takes a different approach to transforming raw HTML into clean Markdown. Instead of a conventional pipeline built from headless Chrome, Readability, regular expressions, and the Turndown library, Reader-LM uses a small language model (SLM). The SLM is trained specifically to take HTML as input and emit Markdown as output, without an extensive set of hand-written conversion rules. The figure below illustrates this shift from a complex multi-stage pipeline to a single SLM.

Illustration of reader-lm, replacing the pipeline of readability+turndown+regex heuristics using a small language model.
source - https://jina.ai/news/reader-lm-small-language-models-for-cleaning-and-converting-html-to-markdown/

Architecture/Design and Workflow

The SLM is the key to Reader-LM's design for handling HTML-to-Markdown conversion. It is trained on a large corpus of paired HTML and Markdown samples, from which it learns the features of HTML, of Markdown, and of the mapping between them. When a new HTML input arrives, Reader-LM reads it left to right and generates the most likely Markdown tokens given both the input HTML and what it learned during training. In this way it preserves the layout and content of the HTML while producing clean, properly formatted Markdown.
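A minimal sketch of this workflow with the transformers library is shown below. The raw HTML itself is passed as the user message; the exact prompt conventions and recommended generation settings are documented on the Hugging Face model card, so treat the specifics here as assumptions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "jinaai/reader-lm-1.5b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

html = "<html><body><h1>Hello</h1><p>This is <b>raw</b> HTML.</p></body></html>"
messages = [{"role": "user", "content": html}]          # the HTML itself is the prompt
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=1024, do_sample=False)
# print only the generated Markdown, not the echoed prompt
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```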

Uniqueness in Training Strategy

Reader-LM's training strategy is central to its effectiveness. The model goes through a two-stage process: first on 'short-and-simple' HTML, then on 'long-and-hard' HTML. This lets it learn the basics of HTML-to-Markdown conversion before moving on to real-world, very long HTML documents. To counter degeneration and the difficulties of training on long inputs, the developers also used techniques such as contrastive search, repetition stop criteria, and chunk-wise model forwarding. Combined with the selective copying and long-context capabilities, these strategies make Reader-LM highly effective at converting HTML to Markdown.
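As a rough illustration of such decoding controls, transformers exposes contrastive search at generation time through the penalty_alpha and top_k arguments, alongside a repetition penalty. Continuing from the snippet above, a hedged example might look like this; the values are illustrative, not Jina AI's actual configuration.

```python
# decoding controls of the kind described above, applied via generate();
# illustrative settings only
output = model.generate(
    input_ids,
    max_new_tokens=1024,
    penalty_alpha=0.6,       # enables contrastive search together with top_k
    top_k=4,
    repetition_penalty=1.08, # discourages degenerate repeated spans
)
```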

Performance Evaluation of Reader-LM

To assess Reader-LM, the developers benchmarked it against large language models such as GPT-4 and Gemini-1.5 using three metrics: ROUGE-L, TER, and WER. ROUGE-L counts overlapping tokens with the reference, measuring how well the model captures the content. TER, used here to gauge hallucination, measures the rate of generated Markdown tokens that do not appear in the original HTML. WER, commonly used in tasks such as OCR, looks at the word sequence and breaks down insertions, deletions, and substitutions to compare the output Markdown against the expected Markdown.
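For reference, WER reduces to a word-level edit distance normalized by the reference length. A minimal pure-Python sketch follows; it is illustrative only, not the evaluation code used by Jina AI.

```python
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1    # substitution
            dp[i][j] = min(dp[i - 1][j] + 1,               # deletion
                           dp[i][j - 1] + 1,               # insertion
                           dp[i - 1][j - 1] + cost)
    return dp[-1][-1] / max(len(ref), 1)

print(wer("# Title\nSome body text", "# Title\nSome body texts"))  # 0.2
```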

Quantitative Evaluation
source - https://jina.ai/news/reader-lm-small-language-models-for-cleaning-and-converting-html-to-markdown/

Reader-LM, particularly the 1.5B model, delivered very promising results: the highest ROUGE-L score at 0.72, the lowest WER at 1.87, and the lowest TER at 0.19. This shows that the 1.5B model outperforms much larger models at translating HTML into Markdown accurately, with the lowest levels of errors and hallucinations.

Qualitative Study Results
source - https://jina.ai/news/reader-lm-small-language-models-for-cleaning-and-converting-html-to-markdown/

In addition, a qualitative study visually inspected the Markdown output from 22 HTML sources spanning diverse languages and website types. The evaluation considered four dimensions, each rated from 1 to 5: header extraction, main content extraction, rich structure preservation, and Markdown syntax usage. The study found that Reader-LM-1.5B does particularly well on structure preservation and standard Markdown syntax compared with its competitors. It did not always beat the Jina Reader API, but it was comparable to much bigger models such as Gemini 1.5 Pro.

How to Access and Use Reader-LM

Reader-LM is released on Hugging Face, where the 0.5B and 1.5B parameter models can be downloaded. To run it locally, install transformers and follow the steps on the Hugging Face model page of the chosen version. For a gentler start, there is also a Colab notebook for playing with the model. Reader-LM is open-source under the CC BY-NC 4.0 license; for commercial use, contact Jina AI.

Limitations and Future Work

Reader-LM has proved effective in practice, yet it can struggle with deeply nested HTML structures or very noisy input. Future work could focus on improving its handling of such cases. Its multilingual support, while real, also leaves room for further development.

Conclusion

Reader-LM is a considerable improvement over HTML-to-Markdown methods that rely mainly on simple pattern matching and heuristics. By leveraging SLMs, it offers a more efficient and arguably more accurate solution. This makes web content easier to reuse, create, and manage, contributing to a better-organized web.


Source
Jina AI website: https://jina.ai/
reader lm post: https://jina.ai/news/reader-lm-small-language-models-for-cleaning-and-converting-html-to-markdown/
Hugging Face reader-lm 0.5b: https://huggingface.co/jinaai/reader-lm-0.5b
Hugging Face reader-lm 1.5b: https://huggingface.co/jinaai/reader-lm-1.5b
google Colab : https://colab.research.google.com/drive/1wXWyj5hOxEHY6WeHbOwEzYAC0WB1I5uA#scrollTo=lHBHjlwgQesA


Disclaimer - This article is intended purely for informational purposes. It is not sponsored or endorsed by any company or organization, nor does it serve as an advertisement or promotion for any product or service. All information presented is based on publicly available resources and is subject to change. Readers are encouraged to conduct their own research and due diligence.

Monday 16 September 2024

How Open Source Yi-Coder-9B-Chat Beats Larger Code Models

Presentational View

Introduction

Code models have progressed greatly, with large language models (LLMs) improving code generation, completion, and debugging. These models have evolved from statistical approaches to deep learning, and in particular to transformer architectures, with remarkable results. Small language models (SLMs) push this further: they are efficient and scalable, delivering strong performance at lower computational cost, which makes them suitable for much wider use.

However, challenges still limit progress in the field, such as obtaining high-quality data, tackling interpretability, and managing computational complexity. Yi-Coder-9B-Chat addresses these problems by using high-quality training data and an adjusted model structure, in line with broader developments across AI aimed at building better models.

Yi-Coder-9B-Chat was developed by 01.AI, a company specializing in AI technologies, with contributions from ML/AI engineers, NLP specialists, and software engineers. The goal was to build a strong yet efficient code language model that helps developers with a wide range of coding chores, improving code quality without consuming much of their time.

What is Yi-Coder-9B-Chat?

Yi-Coder-9B-Chat is an open code language model optimized for code generation, completion, and debugging. It belongs to the Yi-Coder series, which, as the name suggests, includes variants with different parameter counts.

Key Features of Yi-Coder-9B-Chat

  • Long-Context Understanding: A maximum context length of 128K tokens makes it possible to work with large code bases.
  • Multi-Language Support: Supports 52 major programming languages, including Java, Python, JavaScript, and C++.
  • High-Quality Training Data: Trained on 2.4T high-quality tokens of code sourced from GitHub and CommonCrawl.
  • Efficient Inference: Engineered for efficient inference and training, with broad applicability to modern AI services.
  • Parameter Size: 9 billion parameters, balanced for performance and efficiency.

Capabilities/Use Cases of Yi-Coder-9B-Chat

  • Code Generation and Completion: Excels at generating and optimizing code fragments across numerous programming languages.
  • Debugging and Translation: Highly competent at fixing code and translating it from one language to another.
  • Project-Level Comprehension: Can reason about code at the project level, which suits the development of large, complex software systems.
  • Code Modification: Good at logical reasoning tasks such as bug fixing, translation, switching between languages, and code improvement.
  • Real-World Examples: Demonstrated on live problems from popular coding platforms such as LeetCode and AtCoder.

Optimized Architecture and Training of Yi-Coder-9B-Chat

Yi-Coder-9B-Chat is built on a transformer backbone designed for long-context comprehension and inference. It uses a decoder-only Transformer enhanced with Grouped-Query Attention (GQA). The post-attention layers use SwiGLU activation, and Rotary Position Embedding (RoPE) with an adjusted base frequency allows inputs of up to 200K tokens. These architectural choices let Yi-Coder-9B-Chat work with large codebases, making it genuinely useful to developers.
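To illustrate the GQA idea, here is a toy PyTorch sketch in which several query heads share one key/value head. The shapes and numbers are made up, and this is not Yi-Coder's actual implementation.

```python
import torch

batch, seq, head_dim = 2, 16, 64
n_q_heads, n_kv_heads = 8, 2                 # 4 query heads share each KV head
group = n_q_heads // n_kv_heads

q = torch.randn(batch, n_q_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)
v = torch.randn(batch, n_kv_heads, seq, head_dim)

# replicate each KV head across its query group
k = k.repeat_interleave(group, dim=1)        # -> (batch, n_q_heads, seq, head_dim)
v = v.repeat_interleave(group, dim=1)

scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
attn = torch.softmax(scores, dim=-1) @ v     # (batch, n_q_heads, seq, head_dim)
print(attn.shape)
```

The benefit is a much smaller KV cache: only n_kv_heads worth of keys and values need to be stored per token, which matters for 128K-200K token contexts.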

Yi’s pretraining data cleaning pipeline.
source - https://arxiv.org/pdf/2403.04652

Yi-Coder-9B-Chat was pre-trained on a huge dataset of 2.4 trillion high-quality tokens, drawn from GitHub repositories and from code-related text filtered out of CommonCrawl. As the figure above illustrates, heuristic rule-based filters, learned filters, and cluster-based filters work in tandem to keep the pretraining data clean. After pretraining, the model is further tuned on fewer than 10,000 multi-turn instruction-response dialogue examples, selected for quality rather than quantity. This training approach lets Yi-Coder-9B-Chat stay on par with much larger models while remaining efficient.

Several techniques further improve performance. The model supports 4-bit model quantization and 8-bit KV cache quantization, which cut memory usage; dynamic batching speeds up response time; and PagedAttention manages memory efficiently during inference. It also incorporates the Responsible AI Safety Engine (RAISE), which keeps pretraining, alignment, and deployment safe and helps address concerns ranging from environmental impact to cyber security. Together these characteristics make Yi-Coder-9B-Chat fast, capable, and a responsible choice for coding tasks.
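As an example of the quantization point, the transformers and bitsandbytes libraries expose 4-bit loading through BitsAndBytesConfig. The settings below are a hedged sketch rather than an officially recommended configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# illustrative 4-bit settings to reduce GPU memory use
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "01-ai/Yi-Coder-9B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant_config, device_map="auto"
)
```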

Performance Evaluation of Yi-Coder-9B-Chat

As shown in the figure below, Yi-Coder-9B-Chat performs well on LiveCodeBench, a platform that evaluates coding ability on live problems from LeetCode, AtCoder, and CodeForces. To eliminate data contamination, problems from January to September 2024 were used. Yi-Coder-9B-Chat reached a 23.4% pass rate, outperforming larger models such as DeepSeek-Coder-33B-Instruct at 22.3% and CodeLLama-34B-Instruct at around 13%. This makes it the only model with fewer than 10B parameters to exceed a 20% pass rate.

LiveCodeBench
source - https://01-ai.github.io/blog.html?post=en/2024-09-05-A-Small-but-Mighty-LLM-for-Code.md 

Yi-Coder-9B-Chat also performed very well on basic code generation and reasoning tests. Its results on HumanEval, MBPP, and CRUXEval-O (see the table below) were impressive: an 85.4% pass rate on HumanEval and 73.8% on MBPP, outperforming other code LLMs. Notably, Yi-Coder-9B was reported as the first open-source code LLM to exceed 50% accuracy on CRUXEval-O, which is consistent with its strong pass rates across coding exercises.

Benchmark results on HumanEval, MBPP and CRUXEval-O
source - https://01-ai.github.io/blog.html?post=en/2024-09-05-A-Small-but-Mighty-LLM-for-Code.md 

Yi-Coder-9B-Chat also performed well on code editing (CodeEditorBench), code completion (CrossCodeEval), and long-context modeling with up to 128K tokens. In math reasoning it achieved 70.3% on program-aided benchmarks of mathematical reasoning, again outperforming larger models. These results demonstrate its versatility and effectiveness, making Yi-Coder-9B-Chat an efficient tool for software development, remarkable for a model that stays well under 10 billion parameters.

Comparative Analysis with Competing Models

Some differences stand out between Yi-Coder-9B-Chat, DeepSeek-Coder-33B-Instruct, and CodeLLama-34B-Instruct. Yi-Coder-9B-Chat is based on an optimized transformer model with 9 billion parameters. It can handle a maximum context length of 128K tokens. It is trained on 2.4 trillion high-quality tokens from GitHub and CommonCrawl. This makes it very efficient when working with large code repositories and various programming languages.

DeepSeek-Coder-33B-Instruct, in contrast, uses a transformer architecture with 33 billion parameters and is fine-tuned on 2 billion tokens of instruction data. It provides project-level code completion and infilling with a window size of 16K, and its base model is trained on a dataset of 2 trillion tokens, 87% code and 13% natural language. This makes it flexible and highly scalable across a wide range of tasks. CodeLLama-34B-Instruct, part of the Code Llama family, is a general-purpose code synthesis and understanding model focused on code completion and infilling; the family ranges from 7 billion to 34 billion parameters.

Yi-Coder-9B-Chat is an attractive option for developers seeking efficient and effective coding solutions. It is especially suitable for tasks requiring long-context understanding and support for multiple programming languages. In contrast, DeepSeek-Coder-33B-Instruct excels in project-level code completion and infilling. CodeLLama-34B-Instruct provides general code synthesis and understanding capabilities. Each model has its strengths, making them suitable for different use cases and scenarios.

How to Access and Use Yi-Coder-9B-Chat?

Yi-Coder-9B-Chat is available through its Hugging Face and GitHub repositories. It is straightforward to run locally with transformers, and it can also be reached through online services. Both platforms provide detailed instructions for setup and general use. The model is free to use under the Apache 2.0 license; all relevant links are at the end of this article.

Limitations and Future Work

Yi-Coder-9B-Chat is impressive, but it may struggle with very large, complex projects or certain narrowly defined tasks. Future work could make it more flexible, and broadening or refining its training data could help it handle more scenarios.

Conclusion

Overall, Yi-Coder-9B-Chat is very useful for coding work, giving developers strong functionality and performance in code development tasks. It understands long contexts and supports many programming languages, which makes it a great asset to the AI, coding, and wider technology communities.


Source
01.AI Blog: https://01-ai.github.io/blog.html?post=en/2024-09-05-A-Small-but-Mighty-LLM-for-Code.md 
Hugging Face Weights: https://huggingface.co/01-ai/Yi-Coder-9B-Chat
Research paper: https://arxiv.org/abs/2403.04652
research document: https://arxiv.org/pdf/2403.04652
Base model: https://github.com/01-ai/Yi-Coder


Disclaimer - This article is intended purely for informational purposes. It is not sponsored or endorsed by any company or organization, nor does it serve as an advertisement or promotion for any product or service. All information presented is based on publicly available resources and is subject to change. Readers are encouraged to conduct their own research and due diligence.

Friday 13 September 2024

xLAM: Enhancing AI Agents with Salesforce’s Large Action Models

Presentational View

Introduction

An AI agent is an entity that observes its surroundings and interacts with them, deciding how to act in a given context to accomplish specific tasks. Agents are meant to work largely on their own, with minimal human intervention. They have improved greatly in recent years: they can now handle complicated tasks such as natural language processing, decision making, and real-time problem solving, thanks to large language models (LLMs) and reinforcement learning. Yet deploying AI agents in autonomous systems still runs into problems such as generalizing across environments, decision stability, and interoperability. The xLAM models attempt to address these problems by enabling better function calling and making the models more robust in varied settings.

Who Developed xLAM?

The xLAM models are the creation of Salesforce AI Research, one of the world's foremost AI research organizations. The idea behind xLAM was to design a set of models that improve how AI agents take actions. Salesforce AI Research focused on making AI integrate better with different operational systems, which in turn improves efficiency.

What is xLAM?

xLAM stands for 'Large Action Models'. These models are meant to improve decision making and to map a user's intent into actionable steps in the world. Unlike traditional LLMs, xLAM focuses on function calling and real-time task execution; it is best suited for function calling and AI agents, and several variants are offered depending on the target application domain.

Model Variants

The xLAM family consists of several versions intended for specific areas of application. xLAM-1B is small and lightweight, making it well suited to mobile or on-device use. xLAM-7B is intended for academic use on a limited GPU budget. xLAM-8x7B is an 8x7B mixture-of-experts model suitable for industrial settings, balancing latency, resource use, and strong performance. xLAM-8x22B is a larger mixture-of-experts model with many more parameters, aimed at high-performance tasks.

Overview of xLAM model series
source - https://arxiv.org/pdf/2409.03215

Key Features of xLAM

  • Optimized for Function-Calling: xLAM models are built specifically for function-calling operations, which is what lets them act.
  • Scalable Training Pipeline: The models are trained with a scalable pipeline that unifies and augments data across different domains to improve generality.
  • Real-Time Task Execution: xLAM models focus on real-time task processing, so tasks such as updating a CRM system, answering customer questions, or adjusting a sales pipeline can run without human involvement.
  • Enhanced Decision-Making: The models improve decision-making by mapping user goals onto the desired behaviour within the surrounding environment.
  • Competitive Performance: xLAM models match or exceed other agents on benchmarks, including the Berkeley Function-Calling Leaderboard.

An overview of xLAM model performances on the Berkeley Function Calling Leaderboard v2 (cutoff date 09/03/2024).
source - https://arxiv.org/pdf/2409.03215

Capabilities/Use Case of xLAM

  • Autonomous Task Handling: xLAM models can independently perform multi-layered activities, including kicking off processes in other software systems.
  • Customer Support Automation: The models can answer customers' questions across many support topics. Example: in customer service workflows, routine questions are handled automatically while complex ones are escalated to human agents.
  • Sales Pipeline Management: xLAM models can run sales pipeline procedures, making it easy to track and follow up on leads. Example: tracking leads, following up by email, and updating sales records in real time to streamline sales operations.
  • Generalizability Across Environments: The models are designed to operate effectively in varied contexts, making them suitable for dynamic use cases. Example: adapting to different business processes and operations that require close integration with existing systems.

How xLAM Models Work

The xLAM models follow a well-defined sequence that starts with data preparation: merging, verifying, and enriching data to build a solid and diverse dataset. This step is essential for the model to perform well across the range of required tasks. Once the data is ready, the model is trained with supervised fine-tuning and Direct Preference Optimization. The pipeline accommodates both small and large models, so it works for small-scale as well as large-scale training.

Overview of the data processing, training and evaluation of xLAM.
source - https://arxiv.org/pdf/2409.03215

The pipeline collects data from various environments and normalizes it, producing a generic data loader optimized for training. Processing steps include data unification, data augmentation, and data quality checks. Each example is expressed in a single standardized format that captures the task description, the available tools, few-shot examples, the query, and the actions taken; this shared format makes it easy to apply different augmentation techniques. The larger models are also designed as mixtures of experts, giving them an efficient internal organization that balances performance against resource demands.
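As a purely hypothetical illustration of what such a unified record might contain, consider the dictionary below; the field names are assumptions chosen for explanation, not the actual xLAM schema.

```python
# hypothetical unified training record; field names are illustrative only
unified_record = {
    "task_instruction": "You are an agent that can call external tools.",
    "tools": [
        {"name": "get_weather", "parameters": {"city": "string"}},
    ],
    "few_shot_examples": [],
    "query": "What's the weather in Paris tomorrow?",
    "steps": [
        {
            "thought": "I need the forecast for Paris.",
            "tool_call": {"name": "get_weather", "arguments": {"city": "Paris"}},
            "observation": "Light rain, 17°C",
        },
        {"final_answer": "Tomorrow in Paris: light rain, around 17°C."},
    ],
}
```

Keeping every environment's trajectories in one shape like this is what allows a single data loader, shared augmentation, and shared quality checks across domains.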

After training, the models are evaluated on several benchmarks, including Webshop, ToolQuery, ToolBench, and the Berkeley Function-Calling Benchmark. The process includes a feedback loop in which insights from these tests continuously improve data quality, so the models keep getting better across tasks. By training on large amounts of function/API calling data from open environments and in-house simulators, the models learn better strategies for completing advanced tasks, making them a valuable tool for advancing AI.

Performance Evaluation

The xLAM models rose to the occasion across the benchmarks presented below. In the Webshop environment (see the table below), xLAM-7b-r achieved the highest success rate at 41.4%, ahead of widely used general models such as GPT-4 and Claude 2. On ToolQuery, xLAM-8x7b-r and xLAM-8x22b-r ranked second with a 68.3% success rate, higher even than the much bigger Mixtral-8x22B-Instruct model.

Testing results on Webshop and ToolQuery
source - https://arxiv.org/pdf/2409.03215

On the widely used Berkeley Function-Calling Benchmark v2, four xLAM models placed in the top twenty. The best, xLAM-8x22b-r, reached an overall accuracy of 87.31%, better than both GPT-4 and Claude-3. Even the smallest model, xLAM-1b-fc-r, ranked 32nd with a 75.43% success rate, outperforming larger models like Claude-3-Opus and GPT-3.5-Turbo.

Performance comparison on BFCL-v2 leaderboard
source - https://arxiv.org/pdf/2409.03215

The models also performed well in other assessments. On the ToolBench benchmark, they outperformed ToolLlama-V2 and GPT-3.5-Turbo-0125 across all categories and test conditions, and even surpassed GPT-4-0125-preview in some cases. An ablation study of the data augmentation and cleaning steps in the xLAM pipeline likewise showed substantial gains across the metrics analyzed.

How to Access and Use xLAM

The xLAM models are available on GitHub and Hugging Face. They can be run locally or integrated into an existing system via an API. Setup and usage instructions are included in the respective repositories. The models are open source and intended for research and academic use, so they can be used widely within the open community. All relevant links are given at the end of this article.

Limitations and Future Work

The xLAM series does not cover every hypothetical scenario, a limit that traces back to the data used in most of the studies. The data synthesis framework, useful as it is in the contexts outlined here, may not cover all possible applications. And while the models handle out-of-domain tasks and unseen tools reasonably well, there is still significant room for improvement in generalization and adaptability.

Future work could develop more sophisticated data synthesis methods and incorporate multimodal inputs. It would also be useful to apply xLAM models in more complicated or variable situations. Building on the Mixtral Instruct base models, future research could create specialized models for specific tasks, leveraging xLAM's flexible architecture.

Conclusion

xLAM models make decision-making more effective and help execute challenging operations, so adopting them can pay off in many domains. By using these architectures to tackle present problems, they enable more proficient and better-optimized AI operations. Their openness and competitive performance make them invaluable to researchers and developers.

Source
Salesforce blog: https://blog.salesforceairesearch.com/large-action-model-ai-agent/
Research paper: https://arxiv.org/abs/2409.03215
Research document: https://arxiv.org/pdf/2409.03215
Hugging face model collections: https://huggingface.co/collections/Salesforce/xlam-models-65f00e2a0a63bbcd1c2dade4
GitHub Repo: https://github.com/SalesforceAIResearch/xLAM


Disclaimer - This article is intended purely for informational purposes. It is not sponsored or endorsed by any company or organization, nor does it serve as an advertisement or promotion for any product or service. All information presented is based on publicly available resources and is subject to change. Readers are encouraged to conduct their own research and due diligence.

Wednesday 11 September 2024

MiniCPM3-4B: Open-Source Model with Superior Scalability

Presentational View

Introduction

Scalability across model and data dimensions is a system's ability to handle larger datasets and more complex models without performance breaking down. This matters greatly in modern AI and machine learning, where models must crunch through large datasets and heavy computation. The benefits include more efficient operation, better resource management, and the ability to cope with changing data. Recent improvements in scalability have let AI models analyze larger datasets and solve harder problems than before. MiniCPM3-4B contributes to this trend and pushes scalability a step further with a set of powerful, flexible features.

Who developed this model?

MiniCPM3-4B was developed by OpenBMB, a prominent AI research organization. OpenBMB aims to create sophisticated AI models that are easy to use and applicable across many fields. MiniCPM3-4B was built to be a single powerful and flexible model capable of handling a wider range of tasks faster than its predecessors.

What is MiniCPM3-4B?

MiniCPM3-4B is the third generation of the MiniCPM series, a language model built for high efficiency and accuracy across a variety of tasks. One of its defining features is its 4 billion parameters, which make it considerably more capable than previous versions. It succeeds MiniCPM 1.0 and MiniCPM 2, offering improved performance and versatility.

Key Features of MiniCPM3-4B

  • RAG Capability: Includes a Retrieval-Augmented Generation (RAG) suite that improves performance on open-domain tasks such as question answering and cross-lingual retrieval. It can search large collections of documents and pull in the right information to produce accurate responses.
  • Function Call Support: Supports function calling, so it can delegate specific jobs to external functions and complete them more effectively.
  • Code Interpreter: A built-in code interpreter gives it extra flexibility, particularly for programming tasks.
  • 32k Context Window: A 32k context window lets it work through longer sequences of data in context.
  • LLMxMapReduce: This feature can, in principle, reduce the model's additional memory requirement effectively to zero while allowing an unbounded context.

Capabilities/Use Cases of MiniCPM3-4B

  • Data Analysis: MiniCPM3-4B can process data and recognize patterns in complex datasets, working over a wide range of context in real time while preserving its integrity.
  • Natural Language Processing (NLP): The model is efficient at most NLP tasks, such as sentiment analysis, language translation, and summarization. Its improved results on benchmarks such as MMLU and BBH reflect better recognition and production of human language.
  • Code Generation and Debugging: MiniCPM3-4B ships with a code interpreter, allowing it to write and debug small code snippets, which is handy for software engineers and roboticists.
  • Customer Support Automation: The model's strengths include reasoning and producing natural-sounding responses, making it well suited to answering customer inquiries with accurate, relevant assistance.
  • Educational Tools: MiniCPM3-4B can power educational applications that walk learners through different scenarios; such features support more extensive queries and detailed answers during study.

In all these respects MiniCPM3-4B improves on its predecessors, with a larger parameter count, more features, and better benchmark results.

Technological Advancements of MiniCPM3-4B

MiniCPM3-4B uses an efficient decoder-only transformer structure, with the number of attention heads and the dimensions of the feed-forward layers carefully selected for the best results. This optimization lets MiniCPM3-4B accomplish a great deal despite having only 4 billion parameters, and it remains very competitive with much larger models across multiple benchmarks.

The training process is another important factor and involves techniques that raise training efficiency. Tools such as DeepSpeed and Megatron-LM distribute training across GPUs and nodes, delivering faster training with lower resource demands. The model likely uses dynamic loss scaling and gradient checkpointing to prevent numerical overflow and reduce memory consumption during training. In the data acquisition step, intelligent filtering and deduplication keep the model from learning from low-quality, non-diverse, or uninformative text samples.

MiniCPM3-4B has a new tokenization procedure, likely based on Byte-Pair Encoding (BPE) tuned for multilingual use, especially Chinese and English. Task-specific variants such as MiniCPM-3B-Code are typically produced with lightweight fine-tuning methods such as LoRA (Low-Rank Adaptation) or prefix tuning, which adapt the model to new tasks with only minimal changes to its weights. The model may also incorporate inference-oriented elements such as quantization-aware training and attention caching.
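To show what lightweight adaptation of this kind can look like in practice, here is a hedged sketch using the peft library to attach a LoRA adapter. The hyperparameters and target module names are assumptions for illustration, not OpenBMB's recipe.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# MiniCPM3 ships custom modeling code, hence trust_remote_code=True
base = AutoModelForCausalLM.from_pretrained(
    "openbmb/MiniCPM3-4B", trust_remote_code=True
)

lora_config = LoraConfig(
    r=16,                                   # low-rank dimension (illustrative)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # assumed attention projection names
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()          # only the adapter weights are trainable
```

Because only the small adapter matrices are updated, such a fine-tune changes the model's behaviour for a target task while leaving the vast majority of its weights untouched.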

Performance Evaluation with Other Models

MiniCPM3-4B has been benchmarked against models including GPT-3.5-Turbo and Phi-3.5-mini-Instruct, with evaluations on the Berkeley Function Calling Leaderboard (BFCL) and MathBench among others.

Comprehensive Evaluation
source - https://github.com/OpenBMB/MiniCPM/blob/main/README-en.md

On MathBench it showed stronger mathematical ability than GPT-3.5-Turbo and several 7B-9B models of the same generation.


source - https://github.com/OpenBMB/MiniCPM/blob/main/README-en.md

On BFCL, MiniCPM3-4B surpassed the state of the art among models in the sub-9B range, leaving behind models such as GLM-4-9B-Chat and Qwen2-7B-Instruct.

It is therefore quite competitive with many recently introduced 7B-9B models and remains a serious contender. For example, it can match models such as Llama3.1-8B-Instruct and Baichuan2-13B on tasks ranging from open-domain QA to cross-lingual retrieval, confirming that MiniCPM3-4B is highly effective and could be used across many application areas.

How to Access and Use This Model?

MiniCPM3-4B is available on platforms such as Hugging Face and GitHub. Step-by-step local installation instructions are provided in the GitHub repository so that users can run the model on their own machines, and an online demo from the developers is also available. The model is released under the Apache-2.0 license and can be used commercially as long as the licensing terms are respected.

Limitations and Future Work

With only 4 billion parameters, MiniCPM3-4B may fail to learn finer linguistic patterns and may not suit tasks demanding very high accuracy, such as fact-checking or sentiment analysis. Its comparatively smaller pre-training corpus also limits its versatility in tasks like humor or sarcasm detection.

Future work aims to address these limitations with larger models and more diverse pre-training datasets, improving capability across a wider range of tasks. The developers also intend to find ways to train the model with less energy, to keep the innovation sustainable.

Conclusion

MiniCPM3-4B marks an important advancement in AI model development. It is scalable, offers enhanced features compared with previous versions, and can be applied to a wide range of tasks. That puts it in a strong position to keep driving the development of AI technologies and to speed up data processing and analysis.


Source
modelscope website: https://www.modelscope.cn/models/OpenBMB/MiniCPM3-4B
Hugging Face: https://huggingface.co/openbmb/MiniCPM3-4B
GitHub Repo: https://github.com/OpenBMB/MiniCPM/blob/main/README-en.md
research paper: https://arxiv.org/abs/2404.06395v3
research document: https://arxiv.org/pdf/2404.06395v3


Disclaimer - This article is intended purely for informational purposes. It does not constitute legal, financial, medical, or professional advice. It is not sponsored or endorsed by any company or organization, nor does it serve as an advertisement or promotion for any product or service. All information presented is based on publicly available resources and is subject to change. Readers are encouraged to conduct their own research and due diligence.
