
Tuesday 24 September 2024

Qwen2.5: Versatile, Multilingual, Open-Source LLM Series

Presentational View

Introduction

In AI, scaling a model means adding more parameters and computation so that it can capture more complex patterns and handle a wider range of situations. This push is driven by the desire to apply AI to new uses and to reach better performance. High-quality data is equally important when training AI models, because it largely determines the quality of the outputs: feeding a model a large, clean, and realistic dataset gives it many good examples to learn from and helps it produce strong results.

Incorporating external knowledge sources, such as knowledge graphs, broadens the range and relevance of a model's responses compared with models whose predictions depend solely on the training dataset. Progress in data gathering, model design, and training algorithms continues to improve these areas. Even so, AI models still face problems such as data quality issues, imbalance, and the vast computation required. Qwen2.5 aims to overcome these with high-quality data, large-scale architectures, and refined training techniques, positioning itself as a new reference point for AI progress.

Who Developed Qwen2.5?

Qwen2.5 was produced by the Qwen team within Alibaba Cloud. The team includes experts in artificial intelligence and machine learning who contributed across data collection, algorithm design, and optimization. Alibaba Cloud, a major player in cloud computing, is committed to advancing AI technology and building innovative solutions for a wide range of applications.

What is Qwen2.5?

Qwen2.5 is a family of large language models: a series of dense, efficient, decoder-only models capable of performing a wide range of NLP tasks. They come in sizes ranging from 0.5 billion to 72 billion parameters, reflecting their versatility.


source - https://qwenlm.github.io/blog/qwen2.5/

Model Variants

The Qwen2.5 models are organized into three main series to cater to different needs:

  • Qwen2.5 Series: General-purpose models for text generation tasks, available as base models and instruct variants; the instruct variants are tuned for better instruction following and dialogue.
  • Qwen2.5-Coder Series: Models specialized for coding, trained on a large corpus of code. They handle code generation, completion, reasoning, and repair, making them well suited to software development and related uses.
  • Qwen2.5-Math Series: Models dedicated to mathematical tasks in both Chinese and English. They apply techniques such as Chain-of-Thought and Tool-Integrated Reasoning to solve computational problems.

QWEN2.5 Large Language Models
source - https://qwenlm.github.io/blog/qwen2.5/

Each series includes models of different sizes, from 0.5B to 72B parameters, to match the available computational power and the requirements of specific tasks. In addition, proprietary variants such as Qwen-Plus and Qwen-Turbo are available through an API for specific use cases.

Key Features of Qwen2.5

  • Long Text Generation: Qwen2.5 can generate texts of up to 8K tokens, which makes it useful for producing long documents with detailed information.
  • Structured Data Understanding: The model is notably better at comprehending structured data such as tables, improving the accuracy of answers given in context.
  • Multilingual Support: Qwen2.5 supports over 29 languages, including Chinese, English, French, and Spanish, for multilingual content creation and translation.
  • Enhanced Instruction Following: Qwen2.5 is highly capable of following instructions and producing structured output, especially in JSON format.
  • Context Length Support: It can process contexts of up to 128K tokens, maintaining coherence across long inputs.
  • Larger and Higher-Quality Pre-training Dataset: Trained on up to 18 trillion tokens, Qwen2.5 benefits from more high-quality code, mathematics, and multilingual data, allowing it to tackle problems across many fields.

Use Cases of Qwen2.5

  • Enhanced Mobile Applications: Qwen2.5's small-size variants, such as the 3B model, make it possible to build high-performing, flexible, AI-based mobile applications that remain effective on handheld devices.
  • Offline Language Translation: Qwen2.5 can power translation apps, helping travellers who need translation where connectivity is poor.
  • Personalized On-Device Assistants: With improved instruction following and dialogue generation, Qwen2.5 can support sophisticated on-device virtual assistants that understand multi-step commands and user preferences.
  • Personalized Code Learning Assistants: With its knowledge of programming languages, Qwen2.5-Coder can power interactive code-learning platforms that adapt to each user's learning preferences and give immediate feedback during coding tasks.
  • Solving Complex Mathematical Problems in Multiple Languages: Qwen2.5-Math's multilingual support helps researchers from different countries retrieve information and collaborate on mathematics.
  • Developing Accessible Math Learning Resources: Qwen2.5-Math's ability to produce explanations supports the creation of math learning material that students with learning disabilities can understand, making mathematics more approachable.

These use cases illustrate how Qwen2.5 is a versatile, sophisticated, and practical system that can be extended to improve applications across many fields.

Architecture and Design of Qwen2.5

Qwen2.5 is a transformer-based architecture that combines Rotary Position Embeddings (RoPE), the SwiGLU activation function, and RMSNorm for stable training. RoPE encodes absolute positional information with a rotation matrix while introducing an explicit relative-position dependency into self-attention. The model also uses attention with QKV bias, which helps it weight different words in a sequence and improves overall quality.

Qwen2.5 also includes improvements for speed and long sequences. During inference, Grouped Query Attention (GQA) makes more efficient use of the key-value cache, reducing memory and compute. As in Qwen2, Dual Chunk Attention (DCA) and YARN target efficient long-context comprehension, which is critical for language modeling: DCA splits long sequences into manageable chunks so the model can process lengthy contexts more readily, while YARN rescales the positional embeddings so the usable context window can be extended efficiently.
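To make the positional-encoding idea concrete, here is a minimal, self-contained sketch of rotary position embeddings in plain NumPy. It illustrates the general RoPE technique, not Qwen2.5's actual implementation; the dimension sizes and base frequency are arbitrary assumptions.

```python
import numpy as np

def rotary_embed(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embeddings to x of shape (seq_len, dim).

    Channel pairs are rotated by an angle that grows with token position,
    so dot products between rotated queries and keys depend mainly on
    their relative distance.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # One inverse frequency per channel pair, as in the original RoPE formulation.
    inv_freq = 1.0 / (base ** (np.arange(half) / half))
    angles = np.outer(np.arange(seq_len), inv_freq)      # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Standard 2D rotation applied pair-wise across channels.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

# Toy usage: rotate a random query matrix for an 8-token sequence.
q = np.random.randn(8, 64)
print(rotary_embed(q).shape)  # (8, 64)
```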

Finally, Qwen2.5's architecture is designed to take in substantial external knowledge, improving its reliability and reducing hallucinations and false assertions. This design, combined with its exposure to large-scale data, enables it to handle long contexts, understand structured information, and provide more accurate, context-specific responses, so it can deal with complicated tasks more effectively.

Performance Evaluation

A number of key benchmark evaluations illustrate how Qwen2.5 compares with other leading models. One set of results concerns the base language models, especially Qwen2.5-72B. As the table below shows, Qwen2.5-72B was evaluated on general language understanding (MMLU, MMLU-Pro), reasoning (ARC-C, Winogrande), mathematics and science (MATH, GSM8K), and programming (HumanEval, MBPP). Qwen2.5-72B outperforms Qwen2-72B on most of these tests, with the largest gains in general knowledge, mathematics, and coding, signalling clear improvements in knowledge representation, problem solving, and code generation. Furthermore, Qwen2.5-72B achieves accuracy comparable to Llama-3-405B while using roughly one-fifth as many parameters, a mark of high efficiency.

The performance of base models especially Qwen2.5-72B on a variety of tasks
source - https://qwenlm.github.io/blog/qwen2.5/

The instruction-tuned models provide further evidence of Qwen2.5's abilities when optimized for instruction following and dialogue. The results (shown in the table below) compare Qwen2.5-72B-Instruct with other instruction-tuned models such as Llama-3.1-70B-Instruct and Mistral-Large2-Instruct. Qwen2.5-72B-Instruct performs exceptionally well, exceeding even the larger Llama-3.1-405B-Instruct on critical tasks such as mathematics (MATH: 83.1), coding (LiveCodeBench: 55.5), and alignment with human preferences (Arena-Hard: 81.2). This highlights the effectiveness of Qwen2.5's instruction tuning and its strong performance on intricate, human-like tasks.

Comprehensive results from instruction-tuned versions across various benchmarks
source - https://qwenlm.github.io/blog/qwen2.5/

Other Qwen2.5 variants, such as Qwen2.5-14B, Qwen2.5-32B, and Qwen2.5-7B, were also evaluated. On general language-understanding benchmarks such as MMLU and BBH, as well as MATH, HumanEval, and MBPP, these models consistently surpass competitors of similar or even larger scale. The results confirm that Qwen2.5 delivers solid performance from models that are not especially large and can run under constrained resources. The evaluations also cover multilingual ability, coding efficiency, and mathematics-oriented tasks, again showing improvements over previous versions and comparable models.

How to Use Qwen2.5?

Qwen2.5 is hosted on GitHub, Hugging Face, and ModelScope. Instructions for local deployment and usage are available in the Qwen2.5 GitHub repository. The Qwen2.5 collection on Hugging Face provides the different model variants and describes their capabilities, while ModelScope is useful for deployment and inference. There are also options for running Qwen2.5 fully locally with frameworks such as llama.cpp and Ollama, and links to online demos give a quick remote look at the models. In addition, the models are open source, with licenses that generally permit commercial use.
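As a quick illustration of local usage with Hugging Face transformers (following the general pattern shown on the Qwen model cards; the exact model ID and generation settings below are assumptions you may need to adjust):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # other Qwen2.5 instruct sizes work the same way
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant. Reply in JSON."},
    {"role": "user", "content": "List three capital cities with their countries."},
]
# Build the chat prompt with the model's own template, then generate.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```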

Conclusion

Qwen2.5 provides robust solutions across many applicable fields. Its flexibility in handling long texts, structured data, and multiple languages makes it a very useful tool for both programmers and researchers. Set against the difficulties described earlier, Qwen2.5 opens new opportunities and paths for innovation in many branches and spheres.


Source
Blog Website: https://qwenlm.github.io/blog/qwen2.5/
LLM Blog Website: https://qwenlm.github.io/blog/qwen2.5-llm/
GitHub Repo: https://github.com/QwenLM/Qwen2.5
Hugging Face Model collection: https://huggingface.co/collections/Qwen/qwen25-66e81a666513e518adb90d9e
Base Model research paper: https://arxiv.org/pdf/2407.10671


Disclaimer - This article is intended purely for informational purposes. It is not sponsored or endorsed by any company or organization, nor does it serve as an advertisement or promotion for any product or service. All information presented is based on publicly available resources and is subject to change. Readers are encouraged to conduct their own research and due diligence.

Friday 20 September 2024

C4AI Command R+: Multilingual AI with Advanced RAG and Tool Use

Presentational View

Introduction

Retrieval-Augmented Generation (RAG) blends retrieval and generation models to produce rich, contextually grounded responses based on external sources of information. The approach keeps improving as retrieval becomes more accurate and relevant, and as the retrieved material is used to create better, more precise outputs. These strengths are evident in C4AI Command R+, which is designed to generate responses grounded in supplied document excerpts, with citations that point to the specific source of each piece of information. Multi-step tool use, meanwhile, lets a model perform a sequence of operations in which the result of one step feeds into the next, so it can handle more complex real-world tasks and processes, increasing its versatility. C4AI Command R+ shines here as well: it has been developed to plan and execute sequences of actions with various tools, acting as a simple agent.

Who Developed the C4AI Command R+? 

The C4AI Command R+ model was designed by Cohere, a start-up that focuses on large language models for business. It was developed with input from Cohere's team of specialists. Cohere's aim is to provide language AI technology to developers and large enterprises so they can build new products and gain commercial benefits. Cohere For AI, the company's research-oriented arm, also played a significant role in the model's creation.

What is C4AI Command R+? 

C4AI Command R+ is Cohere's recent large language model, designed for conversational interaction and long-context tasks. It is most effective in combined RAG and multi-step tool-use scenarios, making it a strong fit for high-end enterprise applications. It is also referred to simply as 'Command R+'.

Model Variants 

C4AI Command R+ comes in a few key variants. 'command-r-plus-08-2024', released in August 2024, is an updated version of the original 'command-r-plus' with improved decision-making around tool use, better instruction following, stronger structured data analysis, greater robustness to non-semantic prompt changes, the ability to decline unanswerable questions, and support for RAG workflows without citations, along with significant improvements in throughput and latency. 'command-r-plus' itself serves as an alias for 'command-r-plus-04-2024'. The variants differ in their performance optimisations, feature updates, and release timelines, with 'command-r-plus-08-2024' being the most recent model so far.

Model Variants - Command R+
source - https://docs.cohere.com/docs/command-r-plus

Key Features of C4AI Command R+

  • Longer Context Length: Supports up to 128k tokens, enabling highly comprehensive, sequentially interdependent interactions.
  • Multilingual Capabilities: Trained on 23 languages and evaluated in 10; optimized for English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Arabic, and Simplified Chinese.
  • Cross-Lingual Tasks: Handles tasks such as translation and question answering across languages.
  • Retrieval-Augmented Generation (RAG): Generates replies with citations grounded in supplied context (see the sketch after this list).
  • Multi-Step Tool Use: Interfaces with external tools to complete a sequence of different tasks.
  • Massive Parameter Size: 104 billion parameters, which help it perform complex tasks with high accuracy.
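Below is a minimal sketch of grounded (RAG-style) generation with citations, assuming the Cohere Python SDK's chat endpoint and its documents parameter as described in Cohere's documentation; the document snippets, API key handling, and exact field names are illustrative assumptions.

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # assumes an API key from the Cohere dashboard

docs = [
    {"title": "Onboarding guide", "snippet": "New employees receive laptops within 3 business days."},
    {"title": "IT policy", "snippet": "Hardware requests are filed through the internal ServiceHub portal."},
]

# The model grounds its answer in the supplied documents and returns citations
# that point back to the specific snippets it used.
response = co.chat(
    model="command-r-plus-08-2024",
    message="How quickly do new hires get their laptops, and where are requests filed?",
    documents=docs,
)
print(response.text)
print(response.citations)  # spans of the answer linked to the source documents
```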

Capabilities and Unique Use Cases

C4AI Command R+ excels at in-depth understanding, paraphrasing, summarizing, and answering questions accurately, and it is built for business, industry, and enterprise uses such as customer relations, content generation, and data analysis, where automation can raise productivity. Here are a few use cases:

  • Multilingual Legal Contract Analysis: Summarizes and highlights legal contracts in different languages for global businesses and identifies potential risks in their clauses.
  • Cross-Lingual Literary Translation: Translates texts from one language into another while keeping the original style and intent, and identifies themes and compositional techniques in literature.
  • Real-Time Meeting Summarization: Provides real-time interpretation into preferred languages, takes notes, and produces summaries and actionable follow-up points, helping international teams run meetings.
  • Generative Code Documentation: Works with code in various programming languages and generates documentation in the required language, including code descriptions, summaries, and tutorials, to help development teams that work across multiple languages.
  • Cross-Cultural Marketing Campaigns: Uses cultural and linguistic nuances to develop marketing material and related products that support the growth of global organizations.

Architecture and Advancements of C4AI Command R+

C4AI Command R+ is built on an optimized transformer model, a type of deep learning architecture intended for processing sequential data such as text. The architecture is tuned for stable convergence so the best results can be achieved, which matters for demanding uses such as language modelling and text generation. For text generation, the model is autoregressive: the next token is predicted from the previously generated tokens, so the output stays meaningful and in context, because each word depends on the context set by the preceding words.

Further development comes from supervised fine-tuning and preference training, which align the model with human-perceived helpfulness and safety. This involves feeding the model large volumes of text and code data and then adjusting its responses in line with human feedback. The process helps ensure that C4AI Command R+ produces responses that are accurate, safe, and consistent with human values.

Basic steps involved in multi-step tool use
source - https://docs.cohere.com/docs/multi-step-tool-use

C4AI Command R+ also improves multi-step tool use. Unlike single-step tool use, where the model may call external tools only once, multi-step tool use lets the model plan a sequence of actions and use as many tools as needed to complete its task. This greatly extends the model's applicability, covering complex real-world tasks that involve decision making, multi-step computation, and, where necessary, interaction with external systems. Furthermore, C4AI Command R+ incorporates Accelerated Query Attention (AQ+), an extension of grouped-query attention (GQA) that speeds up the attention mechanism within the transformer while preserving the model's ability to form responses and process information at the same quality as before.
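To show the general shape of a multi-step tool-use loop, here is a hypothetical sketch with made-up tool functions and a stand-in planner; it is not Cohere's actual API. The pattern is: the model proposes a tool call, the host executes it, and the result is fed back until the model decides it can answer.

```python
from typing import Callable, Dict

# Hypothetical local tools the model is allowed to call.
def search_orders(customer_id: str) -> str:
    return f"Orders for {customer_id}: [#1012 shipped, #1044 processing]"

def refund_order(order_id: str) -> str:
    return f"Refund issued for order {order_id}"

TOOLS: Dict[str, Callable[[str], str]] = {
    "search_orders": search_orders,
    "refund_order": refund_order,
}

def plan_next_step(history: list) -> dict:
    """Stand-in for the model: in a real system the LLM returns the next
    tool call (or a final answer) given the conversation and tool results."""
    if not any(step["tool"] == "search_orders" for step in history):
        return {"tool": "search_orders", "arg": "cust-42"}
    if not any(step["tool"] == "refund_order" for step in history):
        return {"tool": "refund_order", "arg": "#1044"}
    return {"tool": None, "answer": "Order #1044 has been refunded."}

history = []
while True:
    step = plan_next_step(history)
    if step["tool"] is None:                    # the planner decided it is done
        print("Final answer:", step["answer"])
        break
    result = TOOLS[step["tool"]](step["arg"])   # execute the requested tool
    history.append({"tool": step["tool"], "result": result})
    print("Tool call:", step["tool"], "->", result)
```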

Performance Evaluation

Command R+ performs well on RAG tasks compared with other models in the scalable market category. In comparisons against several benchmark models, including GPT-4 and Claude 3 Sonnet, and in head-to-head human preference evaluations of writing tasks, Command R+ scored higher on aspects such as text fluency and citation quality. The evaluation used a custom test set of 250 diverse documents with elaborate summarization requests. Command R+ used the RAG API, while the baselines were heavily prompt-engineered, demonstrating Command R+'s usefulness in real-world commercial scenarios.

Human head-to-head preference results using a holistic grading scheme combining text fluency, citation quality, and overall utility
source - https://cohere.com/blog/command-r-plus-microsoft-azure

Another major evaluation looked at the tool-use features needed to automate business processes. Microsoft's ToolTalk (Hard) benchmark and Berkeley's Function Calling Leaderboard (BFCL) were used to assess Command R+. The model performed strongly on both single-turn function calling and conversational tool use. On ToolTalk, it achieved high success rates in recalling tool calls and preventing unwanted actions; on BFCL, it achieved solid function success rates across the various executable subcategories.


source - https://cohere.com/blog/command-r-plus-microsoft-azure

Other assessments covered multilingual support and tokenization efficiency. On the FLoRES and WMT23 tests, Command R+ performed very well on translation across the 10 selected business languages. The model's tokenizer also proved highly efficient at compressing non-English text, with cost reductions of up to 57% compared with others on the market. The efficiency was especially noticeable for non-Latin-script languages, where the Cohere tokenizer produced fewer tokens for the same text.

How to Access and Use C4AI Command R+

C4AI Command R+ offers several ways to access and use it. It is available on Hugging Face, where you can try it in a web-based environment or pull the model weights to run locally. The model can also be used through the Cohere API, with setup and usage instructions provided on the Hugging Face model card. The C4AI Command R+ weights are released under a CC-BY-NC license, permitting non-commercial use with proper attribution.

Limitations And Future Work

  • Language Support Limitation: Some advanced functionalities, such as Retrieval-Augmented Generation (RAG) and multi-step tool use, are currently supported only in English.
  • Context Window Issues: Prompts between 112k and 128k tokens can lead to lower-quality output, reflecting performance issues on very long input.

Future focus areas include extending language coverage for these additional features and addressing the context-window issues, further increasing the model's applicability for non-English-speaking audiences.

Conclusion

C4AI Command R+ stands out as a solution for businesses that want to apply AI to complex processes. Its ability to handle RAG, multi-step tool use, and multiple languages makes it highly valuable for long-context tasks and workflow interactions. This not only raises productivity but also opens many doors for enterprise applications.


Source
Website: https://docs.cohere.com/docs/command-r-plus
Hugging Face C4AI Command R+ weights: https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024
Hugging Face Predecessors: https://huggingface.co/CohereForAI/c4ai-command-r-08-2024
Performance Evaluations : https://cohere.com/blog/command-r-plus-microsoft-azure


Disclaimer - This article is intended purely for informational purposes. It is not sponsored or endorsed by any company or organization, nor does it serve as an advertisement or promotion for any product or service. All information presented is based on publicly available resources and is subject to change. Readers are encouraged to conduct their own research and due diligence.

Tuesday 17 September 2024

Reader-LM: Efficient HTML to Markdown Conversion with AI

Presentational View

Introduction

Markdown is a lightweight language for formatting content. Users write formatting as plain text, which is later converted to HTML. Well-formatted Markdown files matter because they are easy to read and well organized; they make content much easier to manage, especially when it is shared across groups and teams or published on several platforms. Several tools already convert HTML to Markdown, including HTML2Markdown, Turndown, and various online converters.

The main difficulties are complex HTML structure, preserving formatting, and noise in the HTML. Reader-LM was developed to address these problems by applying AI to enhance and fully automate the conversion: a model like Reader-LM can convert HTML to Markdown more easily because it comprehends and parses the content better.

Who Developed Reader-LM?

Reader-LM was built by Jina AI, a company whose mission is to democratize artificial intelligence through open source and open science. The model builds on Jina Reader, with contributions from several AI researchers and developers. The goal of Reader-LM was to create a fast and cheap tool that takes raw, noisy HTML and converts it into clean Markdown, simplifying the conversion process while improving the quality of the converted content.

What is Reader-LM?

Reader-LM is a suite of small language models for converting HTML into Markdown. The models are designed to recognize the structure of HTML content and generate clean, well-formatted Markdown.

Model Variants

  • Reader-LM 0.5B: A smaller, more optimized version intended for simpler tasks.
  • Reader-LM 1.5B: A larger version with additional capacity, aimed at parsing more complicated HTML structures.

These variants are tailored to different needs: the 0.5B model puts efficiency first, while the 1.5B model is more powerful and has higher processing capacity.

Reader-LM models' specifications
source - https://jina.ai/news/reader-lm-small-language-models-for-cleaning-and-converting-html-to-markdown/

Key Features of Reader-LM

  • Multilingual Support: Works across multiple languages, making it suitable for use in different countries.
  • Long-Context Handling: Handles long documents with up to 256K tokens of context, particularly HTML documents.
  • Efficient Performance: Compact models, in the 0.5B to 1.5B parameter range, intended for efficient operation on edge devices.
  • Selective Copying: Focuses on transferring the selected HTML content to Markdown without losing much information.

Capabilities/Use Cases of Reader-LM

  • Content Conversion: Translates raw HTML from web pages into clean Markdown for documentation and content management.
  • Data Cleaning: Removes unwanted components such as headers, footers, and sidebars, producing cleaner input.
  • Real-World Examples: Beyond documentation, blogging, and content management systems where clean Markdown is desirable, Reader-LM has other practical uses. For instance, it can power clean feed readers by parsing raw HTML from various sources and converting it into structured Markdown that is easier to summarise and to classify by topic. Thanks to its information extraction and structuring abilities, it can also improve web accessibility for the visually impaired, build personalized content feeds, and extract data for market research.

How Reader-LM Works

Reader-LM takes a different approach from most converters when transforming raw HTML into clean Markdown. Instead of conventional pipelines built on headless Chrome, Readability, regex heuristics, and the Turndown library, Reader-LM uses a small language model (SLM). The SLM is trained specifically to take HTML as input and output Markdown, without needing an extensive set of hand-written conversion rules. The figure below illustrates this transition from a complex multi-stage pipeline to a single SLM.

Illustration of reader-lm, replacing the pipeline of readability+turndown+regex heuristics using a small language model.
source - https://jina.ai/news/reader-lm-small-language-models-for-cleaning-and-converting-html-to-markdown/

Architecture/Design and Workflow

The SLM is central to Reader-LM's design for converting HTML to Markdown. The model is trained on a huge corpus of paired HTML and Markdown samples, which teaches it the features of HTML, of Markdown, and of the mapping between them. When new HTML is passed to Reader-LM, the model reads it left to right and predicts the most likely Markdown tokens given the training data and the input HTML. In this way, Reader-LM preserves the layout and content of the HTML while producing clean, properly formatted Markdown.

Uniqueness in Training Strategy

The training strategy adopted for Reader-LM is key to its effectiveness. The model goes through a two-stage training process: first on 'short-and-simple' HTML, then on 'long-and-hard' HTML. This lets the model first learn the basic mapping from HTML to Markdown before being trained on real-world, lengthy HTML documents. The developers also applied several strategies against degeneration and the difficulties of training on long inputs, such as contrastive search, repetition stop criteria, and chunk-wise model forwarding. Combined with the selective-copying and long-context objectives, these strategies make Reader-LM highly effective at converting HTML to Markdown.
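As an illustration of what contrastive search and repetition-related settings look like at decoding time with Hugging Face transformers (the specific values below are assumptions, not the ones Jina used):

```python
from transformers import GenerationConfig

# Generic decoding settings that counter repetition and degeneration on long inputs:
# contrastive search (penalty_alpha + top_k) plus an explicit repetition penalty.
anti_repeat = GenerationConfig(
    max_new_tokens=1024,
    penalty_alpha=0.6,       # contrastive search degeneration penalty
    top_k=4,                 # candidate pool size for contrastive search
    repetition_penalty=1.08,
    no_repeat_ngram_size=8,  # hard stop on verbatim 8-gram repetition
)
# Later: model.generate(**inputs, generation_config=anti_repeat)
```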

Performance Evaluation of Reader-LM

To assess Reader-LM, the developers benchmarked it against large language models such as GPT-4 and Gemini-1.5, using three metrics: ROUGE-L, TER, and WER. ROUGE-L measures overlapping token sequences, giving a sense of how well the model captures the content. TER (token error rate), used here as a proxy for hallucination, measures the rate of generated Markdown tokens that do not appear in the original HTML. WER (word error rate), often used in tasks such as OCR, examines the word sequence and breaks down insertions, deletions, and substitutions, giving a detailed comparison between the output Markdown and the expected Markdown.

Quantitatively Evaluation
source - https://jina.ai/news/reader-lm-small-language-models-for-cleaning-and-converting-html-to-markdown/

Reader-LM, particularly the 1.5B model, delivered very promising results, with the highest ROUGE-L score (0.72) and the lowest WER (1.87) and TER (0.19). This shows that the 1.5B model can outperform much larger models at accurately translating HTML into Markdown, with low levels of error and hallucination.

Qualitative Study Results
source - https://jina.ai/news/reader-lm-small-language-models-for-cleaning-and-converting-html-to-markdown/

There was also a qualitative analysis, based on visual inspection of the Markdown produced from 22 HTML sources covering diverse languages and website types. The evaluation considered four key dimensions, each rated from 1 to 5: header extraction, main content extraction, rich structure preservation, and Markdown syntax usage. The study showed that Reader-LM-1.5B performs consistently well, particularly in structure preservation and Markdown syntax usage, compared with its competitors. It does not always beat the Jina Reader API, but its performance is comparable to much larger models such as Gemini 1.5 Pro.

How to access and Use Reader-LM

Reader-LM is released on Hugging Face, where the 0.5B and 1.5B parameter models can be downloaded. To run Reader-LM locally, install transformers and follow the steps listed on the Hugging Face model page of the chosen version. For a gentler start, there is a link to a Colab notebook for experimenting with the model. Reader-LM is open source under the CC BY-NC 4.0 license; for commercial use, contact Jina AI.
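A minimal local-usage sketch with transformers, following the general pattern on the model card (the chat-template call and the sample HTML are assumptions; check the model page for the exact recommended prompt format):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jinaai/reader-lm-1.5b"   # or "jinaai/reader-lm-0.5b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

html = "<html><body><h1>Release notes</h1><ul><li>Faster parsing</li><li>Bug fixes</li></ul></body></html>"
# The raw HTML is passed as the user message; the model replies with Markdown.
messages = [{"role": "user", "content": html}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```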

Limitations and Future Work

Reader-LM has proved effective in practice, yet it can struggle with deeply nested HTML structures or pages containing a lot of noise. Future work could focus on improving the handling of such cases. Its multilingual support is also limited, leaving room for development in that direction.

Conclusion

Reader-LM is a considerable improvement over HTML-to-Markdown methods that rely primarily on pattern matching and heuristics. By leveraging SLMs, Reader-LM offers a more efficient and arguably more accurate solution. This advance makes web content easier to reuse and content easier to create and manage, improving how material is organized on the internet.


Source
Jina AI website: https://jina.ai/
reader lm post: https://jina.ai/news/reader-lm-small-language-models-for-cleaning-and-converting-html-to-markdown/
Hugging Face reader-lm 0.5b: https://huggingface.co/jinaai/reader-lm-0.5b
Hugging Face reader-lm 1.5b: https://huggingface.co/jinaai/reader-lm-1.5b
google Colab : https://colab.research.google.com/drive/1wXWyj5hOxEHY6WeHbOwEzYAC0WB1I5uA#scrollTo=lHBHjlwgQesA


Disclaimer - This article is intended purely for informational purposes. It is not sponsored or endorsed by any company or organization, nor does it serve as an advertisement or promotion for any product or service. All information presented is based on publicly available resources and is subject to change. Readers are encouraged to conduct their own research and due diligence.

Monday 16 September 2024

How Open Source Yi-Coder-9B-Chat Beats Larger Code Models

Presentational View

Introduction

Code models have progressed greatly, especially as large language models (LLMs) have improved code generation, completion, and debugging. These models have evolved from statistical approaches to deep learning, specifically transformer models, with remarkable results. Small language models (SLMs) can push these advances further: they are efficient and scalable, delivering high performance at lower computational cost, which makes them suitable for wider use.

However, challenges still limit work in the field, such as obtaining high-quality data, tackling interpretability, and managing computational complexity. Yi-Coder-9B-Chat addresses these problems by using high-quality training data and adjusting the model structure for better results, in line with developments in other areas of AI aimed at building better models.

Yi-Coder-9B-Chat was developed by 01.AI, a company specializing in AI technologies. The professionals involved in this model include ML/AI engineers, NLP specialists, and software engineers. The purpose of Yi-Coder-9B-Chat was to build a strong yet efficient code language model that helps developers with a range of coding chores, improving code quality without consuming much time.

What is Yi-Coder-9B-Chat?

Yi-Coder-9B-Chat is an open code language model optimized for code generation, completion, and debugging. It belongs to the Yi-Coder series, which, as the name suggests, includes variants that differ in parameter count.

Key Features of Yi-Coder-9B-Chat

  • Long-Context Understanding: A maximum context length of 128K tokens, making it possible to work with large code bases.
  • Multi-Language Support: Supports 52 major programming languages, including Java, Python, JavaScript, and C++.
  • High-Quality Training Data: Trained on 2.4T high-quality tokens of code drawn from GitHub and CommonCrawl.
  • Efficient Inference: Designed for efficient inference and flexible training, with broad applicability to modern AI services.
  • Parameter Size: 9 billion parameters, balanced for performance and efficiency.

Capabilities/Use Cases of Yi-Coder-9B-Chat

  • Code Generation and Completion: Excels at generating and completing code fragments in numerous programming languages.
  • Debugging and Translation: Highly competent at correcting code and translating it from one language to another.
  • Project-Level Comprehension: Can reason about code at the project level, which suits the development of large, complex software systems.
  • Code Modification: Good at reasoning-heavy exercises such as bug fixing, translation, switching between languages, and code improvement.
  • Real-World Examples: Evaluated on problems from live coding competitions on popular platforms such as LeetCode, AtCoder, and CodeForces.

Optimized Architecture and Training of Yi-Coder-9B-Chat

Yi-Coder-9B-Chat is built on a transformer-based model designed for long-context comprehension and efficient inference. It uses a decoder-only Transformer enhanced with Grouped-Query Attention (GQA). SwiGLU activation is applied in the post-attention layer, and Rotary Position Embedding (RoPE) with an adjusted base frequency lets the model process inputs of up to 200K tokens. These architectural choices allow Yi-Coder-9B-Chat to work with large codebases, making it useful to developers.
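To make the grouped-query idea concrete, here is a tiny NumPy sketch of how GQA shares key/value heads across groups of query heads; the head counts are arbitrary assumptions, not Yi's actual configuration.

```python
import numpy as np

seq, d_head = 6, 8
n_q_heads, n_kv_heads = 8, 2            # each KV head serves a group of 4 query heads
group = n_q_heads // n_kv_heads

q = np.random.randn(n_q_heads, seq, d_head)
k = np.random.randn(n_kv_heads, seq, d_head)   # far fewer K/V heads to cache
v = np.random.randn(n_kv_heads, seq, d_head)

out = np.empty_like(q)
for h in range(n_q_heads):
    kv = h // group                     # query head h reuses KV head kv
    scores = q[h] @ k[kv].T / np.sqrt(d_head)
    weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
    out[h] = weights @ v[kv]
print(out.shape)  # (8, 6, 8): full set of query heads, but only 2 KV heads stored
```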

Yi’s pretraining data cleaning pipeline.
source - https://arxiv.org/pdf/2403.04652

Yi-Coder-9B-Chat was pre-trained on a huge dataset of 2.4 trillion high-quality tokens, obtained partly from GitHub repositories and partly from source code filtered out of CommonCrawl. As illustrated in the figure above, heuristic rule-based filters, learned filters, and cluster-based filters work in tandem to guarantee high-quality pretraining data. After pretraining, the model is further trained on fewer than 10,000 multi-turn instruction-response dialogue examples, selected for quality rather than quantity. This broad training approach helps Yi-Coder-9B-Chat stay on par with much larger models while remaining efficient.

Several techniques improve Yi-Coder-9B-Chat's deployability. It supports 4-bit model quantization and 8-bit KV-cache quantization, which reduce memory usage. Dynamic batching shortens response times, and PagedAttention enables effective memory management during inference. The model also incorporates a Responsible AI Safety Engine (RAISE), which ensures that pretraining, alignment, and deployment are done safely, helping address concerns ranging from environmental impact to cybersecurity. Together, these characteristics make Yi-Coder-9B-Chat fast, capable, and responsibly deployable for various coding tasks.
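As an illustration of the kind of 4-bit loading mentioned above, here is a hedged sketch using transformers with bitsandbytes; the exact quantization configuration Yi uses is not specified here, so these settings are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "01-ai/Yi-Coder-9B-Chat"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weight quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for speed/stability
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,  # cuts memory roughly 4x vs fp16 weights
    device_map="auto",
)
print(model.get_memory_footprint() / 1e9, "GB")  # rough check of the savings
```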

Performance Evaluation of Yi-Coder-9B-Chat

As shown in the figure below, Yi-Coder-9B-Chat performs well on LiveCodeBench, a platform that assesses programming skill using live problems from LeetCode, AtCoder, and CodeForces. To eliminate data contamination, the evaluation used problems published between January and September 2024. Yi-Coder-9B-Chat achieved a 23.4% pass rate, outperforming larger models such as DeepSeek-Coder-33B-Instruct (22.3%) and CodeLLama-34B-Instruct (around 13%). This makes it the only model with fewer than 10B parameters to exceed a 20% pass rate.

LiveCodeBench
source - https://01-ai.github.io/blog.html?post=en/2024-09-05-A-Small-but-Mighty-LLM-for-Code.md 

Yi-Coder-9B-Chat also performed very well on basic code generation and reasoning tests. Its results on HumanEval, MBPP, and CRUXEval-O (see the table below) were impressive: an 85.4% pass rate on HumanEval and 73.8% on MBPP, results that compare favourably with other code LLMs. Notably, Yi-Coder-9B is reported to be the first open-source code LLM of its size to exceed 50% accuracy on CRUXEval-O, which is consistent with its strong pass rates across different coding exercises.

Benchmark results on HumanEval, MBPP and CRUXEval-O
source - https://01-ai.github.io/blog.html?post=en/2024-09-05-A-Small-but-Mighty-LLM-for-Code.md 

Yi-Coder-9B-Chat also performed well on code editing (CodeEditorBench), code completion (CrossCodeEval), and long-context modeling with up to 128K tokens. In program-aided mathematical reasoning it reached 70.3%, outperforming larger models. These results demonstrate the model's versatility and effectiveness, making Yi-Coder-9B-Chat an efficient tool for software development even though it stays well under 10 billion parameters.

Comparative Analysis with Competing Models

Some differences stand out between Yi-Coder-9B-Chat, DeepSeek-Coder-33B-Instruct, and CodeLLama-34B-Instruct. Yi-Coder-9B-Chat is based on an optimized transformer model with 9 billion parameters. It can handle a maximum context length of 128K tokens. It is trained on 2.4 trillion high-quality tokens from GitHub and CommonCrawl. This makes it very efficient when working with large code repositories and various programming languages.

DeepSeek-Coder-33B-Instruct, in contrast, uses a transformer architecture with 33 billion parameters. It is trained on 2 billion tokens of instruction data. It provides project-level code completion and infilling with a window size of 16K. The model is trained on a dataset containing 2 trillion tokens, which are 87% code and 13% natural language. This makes it very flexible and highly scalable, easily adapting to any task requested by the user. CodeLLama-34B-Instruct, part of the Code Llama family, is a general-purpose code synthesis and understanding tool. It focuses on code completion and infilling. It ranges from 7 billion to 34 billion parameters and is designed for code synthesis and understanding tasks.

Yi-Coder-9B-Chat is an attractive option for developers seeking efficient and effective coding solutions. It is especially suitable for tasks requiring long-context understanding and support for multiple programming languages. In contrast, DeepSeek-Coder-33B-Instruct excels in project-level code completion and infilling. CodeLLama-34B-Instruct provides general code synthesis and understanding capabilities. Each model has its strengths, making them suitable for different use cases and scenarios.

How to Access and Use Yi-Coder-9B-Chat?

Yi-Coder-9B-Chat can be obtained from its Hugging Face repository as well as its GitHub repository. It is straightforward to run the model locally with transformers, and it can also be reached through online services. Both platforms provide detailed instructions for setup and general use. The model is free to use under the Apache 2.0 license. All relevant links for this model are listed at the end of this article.

Limitations and Future Work

Yi-Coder-9B-Chat is impressive, but it may struggle with very large, complex projects or certain narrowly defined tasks. Future work could make it more flexible, and broadening or refining its training data could help it handle more scenarios.

Conclusion

Overall, Yi-Coder-9B-Chat is very useful for coding assignments. Using it in code development gives developers the benefit of its functionality and performance. It also comprehends long contexts and accommodates many programming languages, making it a great asset for the AI, coding, and wider technology communities.


Source
01.AI Blog: https://01-ai.github.io/blog.html?post=en/2024-09-05-A-Small-but-Mighty-LLM-for-Code.md 
Hugging Face Weights: https://huggingface.co/01-ai/Yi-Coder-9B-Chat
Research paper: https://arxiv.org/abs/2403.04652
research document: https://arxiv.org/pdf/2403.04652
Base model: https://github.com/01-ai/Yi-Coder


Disclaimer - This article is intended purely for informational purposes. It is not sponsored or endorsed by any company or organization, nor does it serve as an advertisement or promotion for any product or service. All information presented is based on publicly available resources and is subject to change. Readers are encouraged to conduct their own research and due diligence.

Friday 13 September 2024

xLAM: Enhancing AI Agents with Salesforce’s Large Action Models

Presentational View

Introduction

An AI agent is an entity that observes its surroundings, interacts with them, and independently decides how to act in a given context to accomplish certain tasks. Agents are intended to work on their own with minimal human interference, driven by complex algorithms. AI agents have improved greatly in recent years; they can now perform complicated tasks such as natural language processing, decision making, and real-time problem solving, thanks to large language models (LLMs) and reinforcement learning. However, using AI agents in autonomous systems still runs into problems such as generalization across environments, decision stability, and interoperability. The xLAM models attempt to solve these problems by enabling better function calling and making the models more robust across different settings.

Who Developed xLAM?

The xLAM models are the creation of Salesforce AI Research, one of the world's leading organizations for AI research. The idea behind xLAM was to design a set of models that improve how AI agents take actions. Salesforce AI Research focused on integrating AI more effectively into different operational systems and workflows, raising their efficiency.

What is xLAM?

xLAM stands for 'Large Action Models'. These models are meant to improve decision making and to map the user's intent into actionable steps while operating in the world. Unlike traditional LLMs, xLAM focuses on function calling and real-time task execution. It is best used for function calling and AI agents, and several variants are offered depending on the target application domain.

Model Variants

The xLAM family consists of several versions intended for specific areas of application. The xLAM-1B model is small and lightweight, making it well suited to mobile use. The xLAM-7B model is intended for academic use under a limited GPU budget. The xLAM-8x7B is an 8x7B mixture-of-experts model, suitable for industrial applications, with reasonable latency, resource use, and strong performance. The xLAM-8x22B is a larger mixture-of-experts model with many more parameters for high-performance tasks.

Overview of xLAM model series
source - https://arxiv.org/pdf/2409.03215

Key Features of xLAM

  • Optimized for Function-Calling: xLAM models are specifically developed for function-calling operations, which is what enables them to act.
  • Scalable Training Pipeline: The models are trained with a scalable pipeline that unifies and augments data across different domains, improving their generality.
  • Real-Time Task Execution: xLAM models focus on real-time task processing, so tasks such as updating a CRM system, answering customer questions, and adjusting the sales pipeline can be done without human involvement.
  • Enhanced Decision-Making: The models improve decision making by mapping the user's goals onto the desired behaviour within the surrounding environment.
  • Competitive Performance: xLAM models match or exceed other models on agent benchmarks, including the Berkeley Function-Calling Leaderboard.

An overview of xLAM model performances on the Berkeley Function Calling Leaderboard v2 (cutoff date 09/03/2024).
source - https://arxiv.org/pdf/2409.03215

Capabilities/Use Case of xLAM

  • Autonomous Task Handling: xLAM models can independently perform multi-layered activities, including initiating processes within other software systems.
  • Customer Support Automation: The models can answer clients' questions on various support topics. Example: in customer-service workflows, basic questions are handled automatically while complex ones are forwarded to human agents.
  • Sales Pipeline Management: xLAM models can handle sales pipeline procedures, allowing easy tracking and follow-up of leads. Example: tracking leads, following up by email, and updating sales records in real time to streamline sales operations.
  • Generalizability Across Environments: The models are designed to operate effectively in varied contexts, making them suitable for dynamic use cases. Example: adapting to different business processes and operations that require close integration with existing systems.

How xLAM Models Work

The xLAM models follow a well-defined sequence that begins with data preparation: merging, verifying, and augmenting data so as to build a solid, diverse dataset. This step is essential for the model to perform well across nearly all of its target tasks. Once the data is prepared, the model is trained with supervised fine-tuning and Direct Preference Optimization. The training setup accommodates both small and large models, so it works for small-scale as well as large-scale training.

Overview of the data processing, training and evaluation of xLAM.
source - https://arxiv.org/pdf/2409.03215

The pipeline collects data from various settings and normalizes it, producing a generic data loader that is optimized for training. Data processing includes data unification, data augmentation, and data quality checks. Each example is expressed in a single standardized format covering the task description, the available tools, examples, the query, and the actions taken, which makes it easy to apply different augmentation techniques. The larger models are also designed as a mixture of experts, giving them an efficient internal organization with a good balance of performance and resource demands.

After training, the models are evaluated on several benchmarks, including Webshop, ToolQuery, ToolBench, and the Berkeley Function-Calling Benchmark. The process includes a feedback loop in which insights from these tests drive constant improvement of data quality, so the models keep getting better across various tasks. By training on large amounts of function/API-calling data from open environments and in-house simulators, the models acquire better strategies for accomplishing advanced tasks, making them a crucial tool for improving AI agents.
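To make the unified format concrete, here is a hedged sketch of what one standardized example might look like; the field names are illustrative assumptions based on the description above, not the exact schema from the paper.

```python
import json

# One unified training example: task description, available tools, the user
# query, and the step-by-step actions (function calls) with their observations.
example = {
    "task_instruction": "You are an assistant that answers questions by calling tools.",
    "tools": [
        {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {"city": {"type": "string"}},
        }
    ],
    "query": "What's the weather in Berlin right now?",
    "steps": [
        {
            "action": {"name": "get_weather", "arguments": {"city": "Berlin"}},
            "observation": "14°C, light rain",
        },
        {"final_answer": "It is currently 14°C with light rain in Berlin."},
    ],
}
print(json.dumps(example, indent=2, ensure_ascii=False))
```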

Performance Evaluation

The xLAM models rose to the occasion across the benchmarks presented below. In the Webshop environment (see the table below), xLAM-7b-r produced the highest success rate at 41.4%, beating widely used general models such as GPT-4 and Claude2. On ToolQuery, xLAM-8x7b-r and xLAM-8x22b-r rank second with a 68.3% success rate, higher even than the much bigger Mixtral-8x22B-Instruct model.

Testing results on Webshop and ToolQuery
source - https://arxiv.org/pdf/2409.03215

On the widely followed Berkeley Function-Calling Benchmark v2, four xLAM models placed in the top twenty. The best, xLAM-8x22b-r, reached an overall accuracy of 87.31%, outperforming both GPT-4 and Claude-3. Even the smallest model, xLAM-1b-fc-r, ranked 32nd with a 75.43% success rate, outperforming larger models like Claude-3-Opus and GPT-3.5-Turbo.

Performance comparison on BFCL-v2 leaderboard
source - https://arxiv.org/pdf/2409.03215

The models also performed well in other assessments. On the ToolBench benchmark, they outperformed TooLlama-V2 and GPT-3.5-Turbo-0125 in all categories and test conditions, and even surpassed GPT-4-0125-preview in some cases. In addition, an ablation study on the data augmentation and cleaning steps of the xLAM pipeline showed substantial gains across the statistics analyzed.

How to Access and Use xLAM

The xLAM models are available on GitHub and Hugging Face. They can be used locally or integrated into an existing system through an API. Additional setup and usage information is included in the respective repositories. The models are open source and intended for research and academic use, so they can be widely adopted by the open community. For those who want to learn more about this model family, all relevant links are given at the end of this article.
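A hedged sketch of local function-calling use with transformers follows; the model ID comes from the Hugging Face collection, but the prompt layout and the expected JSON output format are assumptions to be checked against the model card.

```python
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Salesforce/xLAM-1b-fc-r"   # small function-calling variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

tools = [{
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "parameters": {"city": {"type": "string", "description": "City name"}},
}]
user_query = "What's the weather in Berlin right now?"

# Present the tool schemas and the query, then ask the model for a JSON tool call.
prompt = (
    "You can call the following tools, described as JSON schemas:\n"
    f"{json.dumps(tools)}\n\n"
    f"User: {user_query}\n"
    "Respond with a JSON object of the form "
    '{"tool_calls": [{"name": ..., "arguments": {...}}]}.'
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```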

Limitations and Future Work

The data used to train the xLAM series may not reflect every scenario the models will encounter. Although the presented data-synthesis framework is useful in the contexts outlined here, it may not cover all possible applications. Furthermore, the models handle out-of-domain tasks and unseen tools relatively well, but there is still significant scope for improvement in generalization and adaptability.

Future work could be geared towards more sophisticated data-synthesis methods and the inclusion of multimodal inputs. It could also be useful to apply xLAM models in other, more complicated or variable situations. Building on the Mixtral Instruct models, future research could develop specialized models for specific tasks, leveraging xLAM's flexible architecture.

Conclusion

xLAM models make decision-making processes more effective and help execute challenging operations, so adopting them can be beneficial in many spheres. By using such architectures to solve present problems, they enable more proficient and optimized AI operations. Their openness and competitive performance make them invaluable for researchers and developers.

Source
Salesforce blog: https://blog.salesforceairesearch.com/large-action-model-ai-agent/
Research paper: https://arxiv.org/abs/2409.03215
Research document: https://arxiv.org/pdf/2409.03215
Hugging face model collections: https://huggingface.co/collections/Salesforce/xlam-models-65f00e2a0a63bbcd1c2dade4
GitHub Repo: https://github.com/SalesforceAIResearch/xLAM


Disclaimer - This article is intended purely for informational purposes. It is not sponsored or endorsed by any company or organization, nor does it serve as an advertisement or promotion for any product or service. All information presented is based on publicly available resources and is subject to change. Readers are encouraged to conduct their own research and due diligence.

NVIDIA’s Nemotron 70B: Open-Source AI with Enhanced RL

Introduction Advanced learning and reward systems refer to a class of algorithms that can optimize the process of learning through providing...