Pages

Monday, 16 September 2024

How Open Source Yi-Coder-9B-Chat Beats Larger Code Models

Presentational View

Introduction

Code models have greatly progressed especially with the use of large language models (LLMs) improving code generation, completion as well as debugging. These models have evolved from statistical to deep learning, more specifically that of the transformer model with remarkable results. These advancements can further be assisted by the small language models (SLMs) that are efficient, and can be scaled. They yield high performance with less computational expenses and therefore they are suitable for wider use.

However, there seem are challenges that still limit research in the field such as getting high quality data, tackling interpretability issues, or even computational complexity. Yi-Coder-9B-Chat solves these problems by using high-quality training data and adjusting model structure for improved effect. This is in line with the general developments made in other fields of AI with an intention of coming up with better models.

Yi-Coder-9B-Chat developed by '01. AI', a company, specializing in the development of the AI technologies. Some of the professionals that are involved in this particular model are, the ML/AI engineers, NLP, and s/w engineers. The purpose of designing Yi-Coder-9B-Chat was to design a strong but effective code language model that will be beneficial for developers in multiple coding chores in order to enhance the code quality without consuming much time.

What is Yi-Coder-9B-Chat?

Yi-Coder-9B-Chat is a novel open code language model which is optimized for generation, completion and debugging of codes. It belongs to Yi-Coder series and as the name suggests, there are variations within this series with respect to the parameters they come with.

Key Features of Yi-Coder-9B-Chat

  • Long-Context Understanding: It has a maximum context length of 128K tokens which makes it possible to accommodate large code bases.
  • Multi-Language Support: Virtually supports 52 major programming languages such as Java, Python, Javascript, C++, etc.
  • High-Quality Training Data: Taught on 2.4T high quality tokens of code obtained from GitHub and CommonCrawl.
  • Efficient Inference: Enabling efficient inference and variable training optimum networks with both wide applicability and diverse architecture designs to modern AI services.
  • Parameter Size: 9 billion parameters which are optimized for performance and efficiency.

Capabilities/Use Cases of Yi-Coder-9B-Chat

  • Code Generation and Completion: Frequently excels at generating and optimizing the code fragments in the numerous programming languages.
  • Debugging and Translation: It’s highly competent in correcting code and in moving it from a language to the other.
  • Project-Level Comprehension: May write code at the project level, large complex software systems appropriate for the development of.
  • Code Modification: Good at logical reasoning exercises such as rectification, translation, switching of languages, and improvement of codes.
  • Real-World Examples: Illustrative in actual coding competitions in popular coding platforms such as leetCode, Atcoder etc.

Optimized Architecture and Training of Yi-Coder-9B-Chat

Yi-Coder-9B-Chat is developed from a transformer based model to support long-context comprehension and inference. The model is built based on the decoder-only Transformer, which is improved by the modification called as Grouped-Query Attention or GQA. Furthermore, in the post-attention layer, SwiGLU activation is applied and Rotary Position Embedding (RoPE) with adjusted base frequency for processing input up to the maximum of 200K tokens. These choices in architecture allow Yi-Coder-9B-Chat to work with large codes hence making it useful to developers.

Yi’s pretraining data cleaning pipeline.
source - https://arxiv.org/pdf/2403.04652

The training process of Yi-Coder-9B-Chat was pre-trained on a huge dataset of 2.4 trillion high-quality tokens, some of them obtained from GitHub repositories, and tokens derived from source code selected from CommonCrawl. As illustrated in the above figure which shows that the pretraining data are pretty clean, the heuristic-based rule filters, learned filters, and the cluster-based filters work in tandem to guarantee high-quality data. After pretraining the model is further trained on a selection of less than 10000 multi-turn instruction-response dialog examples which are selected for quality rather than quantity. This broad training approach helps Yi-Coder-9B-Chat to remain on par with other large models while at the same time keeping up the efficiency.

To improve its performance, there are various techniques which are incorporated in Yi-Coder-9B-Chat. It is also scalable for using 4-bit model quantization and 8-bit KV cache quantization. This also helps to reduce the memory usage of the applied algorithms. In dynamic batching the response time is faster. PagedAttention assists for effective memory management during inference. It also has a component named as the Responsible AI Safety Engine (RAISE). This makes sure that pretraining, alignment as well as deployment is done safely. It can therefore help solve problems that range from environmental problems to cyber security. These are the characteristics that make Yi-Coder-9B-Chat smart and fast. Therefore it is a responsible solution for various coding related tasks.

Performance Evaluation of Yi-Coder-9B-Chat

As shown in below figure, Yi-Coder-9B-Chat is found to perform well in LiveCodeBench. This platform assesses learners’ programming skills using live problems from LeetCode, AtCoder, and CodeForces. Consequently, to eliminate data contamination, they employed problems of January to September of 2024. Yi-Coder-9B-Chat had a 23.4% pass rate. They outperformed more extensive models such as DeepSeek-Coder-33B-Instruct which scored 22. 3% and CodeLLama-34B-Instruct which was at 13. This makes it the only model with less than 10B parameters to achieve more than 20% pass rate.

LiveCodeBench
source - https://01-ai.github.io/blog.html?post=en/2024-09-05-A-Small-but-Mighty-LLM-for-Code.md 

The results showed that Yi-Coder-9B-Chat performed overwhelmingly well in basic code generation and reasoning tests. Its performance on HumanEval, MBPP and CRUXEval-O (see below table) was impressive. It had an 85. 4% for pass rate of HumanEval and the 73. 8% on MBPP. It did so with other code LLMs. Also worth pointing out is the fact that a year prior to the presented work, Yi-Coder-9B – the first open-source code LLM – achieved more than 50% accuracy on CRUXEval-O. This would explain its good pass rates in different coding exercises.

Benchmark results on HumanEval, MBPP and CRUXEval-O
source - https://01-ai.github.io/blog.html?post=en/2024-09-05-A-Small-but-Mighty-LLM-for-Code.md 

Yi-Coder-9B-Chat also performed well on code editing (CodeEditorBench), code completions (CrossCodeEval) and long-context modeling with up to 128K tokens. In math reasoning it got seventy percent. 3% of program-assisted mathematical reasoning proficiency according to program-aided benchmarking. It outperformed larger models. The use of the model in achieving these results has demonstrated its viability in terms of versatility and effectiveness. That makes Yi-Coder-9B-Chat an efficient tool to facilitate software development. It is remarkable though it still remains well under 10 billion parameters.

Comparative Analysis with Competing Models

Some differences stand out between Yi-Coder-9B-Chat, DeepSeek-Coder-33B-Instruct, and CodeLLama-34B-Instruct. Yi-Coder-9B-Chat is based on an optimized transformer model with 9 billion parameters. It can handle a maximum context length of 128K tokens. It is trained on 2.4 trillion high-quality tokens from GitHub and CommonCrawl. This makes it very efficient when working with large code repositories and various programming languages.

DeepSeek-Coder-33B-Instruct, in contrast, uses a transformer architecture with 33 billion parameters. It is trained on 2 billion tokens of instruction data. It provides project-level code completion and infilling with a window size of 16K. The model is trained on a dataset containing 2 trillion tokens, which are 87% code and 13% natural language. This makes it very flexible and highly scalable, easily adapting to any task requested by the user. CodeLLama-34B-Instruct, part of the Code Llama family, is a general-purpose code synthesis and understanding tool. It focuses on code completion and infilling. It ranges from 7 billion to 34 billion parameters and is designed for code synthesis and understanding tasks.

Yi-Coder-9B-Chat is an attractive option for developers seeking efficient and effective coding solutions. It is especially suitable for tasks requiring long-context understanding and support for multiple programming languages. In contrast, DeepSeek-Coder-33B-Instruct excels in project-level code completion and infilling. CodeLLama-34B-Instruct provides general code synthesis and understanding capabilities. Each model has its strengths, making them suitable for different use cases and scenarios.

How to Access and Use Yi-Coder-9B-Chat?

The code of Yi-Coder-9B-Chat can be accessed through the Hugging Face repository as well as the GitHub repository. Importantly, it is rather simple to run the model locally using transformers. In any case, it is possible to find it via online services. About the equipment, the two platforms offer elaborate instructions with regard to set up as well as general use. The model is free to use, and the developers have used Apache 2.0 license. Interested users can find all relevant links for this model at the end of this article.

Limitations and Future Work

Yi-Coder-9B-Chat is impressive. It may struggle with large and complex project or certain narrowly defined tasks. Some of them could be made to make it more flexible in the future. Perhaps, the formalization of its training data could help it handle more scenarios.

Conclusion

In general, Yi-Coder-9B-Chat is very useful in coding assignments. Thus, using this model in code development tasks offers its functionalities and performance as benefits to developers. It also comprehends long contexts and can accommodate multiple programming languages. This makes it a great asset to AI and coding and other technology-related communities.


Source
01.AI Blog: https://01-ai.github.io/blog.html?post=en/2024-09-05-A-Small-but-Mighty-LLM-for-Code.md 
Hugging Face Weights: https://huggingface.co/01-ai/Yi-Coder-9B-Chat
Research paper: https://arxiv.org/abs/2403.04652
research document: https://arxiv.org/pdf/2403.04652
Base model: https://github.com/01-ai/Yi-Coder


Disclaimer - This article is intended purely for informational purposes. It is not sponsored or endorsed by any company or organization, nor does it serve as an advertisement or promotion for any product or service. All information presented is based on publicly available resources and is subject to change. Readers are encouraged to conduct their own research and due diligence.

No comments:

Post a Comment

DeepSeek-V3: Efficient and Scalable AI with Mixture-of-Experts

Introduction Scalable and efficient AI models are among the focal topics of the current artificial intelligence agenda.  The purpose is to d...