Pages

Monday, 12 February 2024

MiniCPM-2B: New Compact Multimodal LLM Outperforming the Giants


Introduction

In the realm of artificial intelligence, Large Language Models (LLMs) have emerged as a transformative force, demonstrating exceptional prowess in tasks involving natural language understanding, generation, and even multimodal applications. However, the majority of these powerful models are designed for cloud-side deployment, which can impose limitations on their accessibility, efficiency, and privacy.

Enter MiniCPM-2B, a groundbreaking end-side LLM developed jointly by OpenBMB and Tsinghua University’s Natural Language Processing Lab. This model is a part of the MiniCPM series, which aims to unlock the potential of end-side LLMs. Unlike their cloud-side counterparts, end-side LLMs operate on local devices such as laptops, desktops, and even mobile phones, eliminating the need for cloud servers or internet connections. This innovative approach offers numerous advantages, including lower latency, enhanced security, and an improved user experience.

What is MiniCPM-2B?

Diving deeper into the specifics, MiniCPM-2B is a transformer-based Large Language Model (LLM) with 2.4 billion non-embedding parameters. It’s trained on a diverse and extensive corpus of 1.6 TB, encompassing a wide range of domains and modalities, including text, image, audio, video, and code. The model employs a 32K subword vocabulary and a sequence length of 1024 tokens. Its architecture comprises 48 layers, 32 heads, and a hidden size of 2048.

MiniCPM-2B comes in two versions: MiniCPM-2B-SFT and MiniCPM-2B-DPO. The SFT version is fine-tuned with static prompts on various downstream tasks, such as Chinese, mathematics, coding, dialogue, and instruction. Static prompts are fixed input templates that guide the model to generate the desired output.

On the other hand, the DPO version is fine-tuned with dynamic prompt optimization (DPO) on the MTBench dataset, a benchmark that simulates real-world user scenarios of LLMs. DPO is an innovative technique that automatically learns the optimal prompts for different tasks and domains, eliminating the need for human intervention.

Key Features of MiniCPM-2B

MiniCPM-2B is not just another Large Language Model. It brings a unique set of features to the table that sets it apart from its peers. Its compact size and high performance are just the tip of the iceberg.

One of the standout features of MiniCPM-2B is its ability to run on local devices. This end-side deployment eliminates the need for cloud servers or internet connections, leading to reduced latency, bandwidth usage, and costs. More importantly, it enhances the security and privacy of users’ data and queries.

Another key feature is the Dynamic Prompt Optimization (DPO). This feature allows MiniCPM-2B to learn the optimal prompts for different tasks and domains autonomously, improving its adaptability, robustness, and generality. This also reduces the human effort and expertise required to use LLMs.

But that’s not all. MiniCPM-2B is a multimodal model, capable of handling not just text, but also image, audio, video, and code inputs and outputs. It can perform cross-modal tasks such as image captioning, text-to-speech, speech-to-text, and video summarization. A testament to its multimodal capabilities is the development of MiniCPM-V, a model based on MiniCPM-2B, which outperforms many existing multimodal models.

Capabilities/Use Cases of MiniCPM-2B

MiniCPM-2B has many capabilities and use cases that can benefit various users and applications. Here are some examples:

  • Natural language understanding and generation: MiniCPM-2B can understand and generate natural language in various forms and styles, such as text, speech, and poetry. It can also handle multiple languages, especially Chinese, which is often underrepresented in LLMs. It can perform tasks such as text summarization, sentiment analysis, machine translation, question answering, and text classification.


  • Mathematics and logic: MiniCPM-2B can solve mathematical problems and perform logical reasoning, such as arithmetic, algebra, geometry, calculus, and proof. It can also generate mathematical expressions and proofs, as well as explain the solutions and steps.

  • Coding and programming: MiniCPM-2B can write and execute code in various programming languages, such as Python, C++, and Java. It can also complete, debug, and optimize code, as well as generate comments and documentation. It can perform tasks such as code synthesis, code completion, code summarization, and code search.

  • Multimodal and cross-modal: MiniCPM-2B can handle not only text, but also image, audio, video, and code inputs and outputs. It can also perform cross-modal tasks, such as image captioning, text-to-speech, speech-to-text, and video summarization. It can perform tasks such as multimodal search, multimodal generation, multimodal analysis, and multimodal fusion.


Harnessing Effective Training Methods: The Experimentation Edge

The journey of MiniCPM-2B’s development is akin to a thrilling adventure, with the Model Wind Tunnel Experiment playing a pivotal role. This experiment is like a rigorous training ground, where smaller models are put through their paces to uncover the most effective training methods for their larger counterparts.

The experiment zeroes in on five key aspects:

  • Hyperparameters: Think of these as the initial settings of the model, the starting points that can make or break the performance.
  • Batch Size: This is the number of training examples used in one iteration. It’s a balancing act that can significantly impact the model’s speed and quality of training.
  • Learning Rate: This tuning parameter in the optimization algorithm is like the pace at which the model learns, determining the step size at each iteration while moving towards a minimum of a loss function.
  • Learning Rate Scheduler: This method adjusts the learning rate in response to the model’s performance or the number of epochs elapsed, acting as the reins that control the learning process.
  • Data Strategy: This involves strategies for data preprocessing, augmentation, and splitting, which can influence the model’s ability to learn, much like a diet can affect an athlete’s performance.

By fine-tuning these aspects, the developers were able to supercharge MiniCPM-2B, transforming it into a competitive model despite its smaller size. The insights gained from these experiments then informed the training process of the larger models, paving the way for their success.

Performance Evaluation with Other Models

In comprehensive benchmarks, MiniCPM-2B ranks closely with Mistral-7B, even surpassing models like Llama2-13B, MPT-30B, and Falcon-40B. It particularly excels in tasks involving Chinese language processing, mathematics, and coding abilities.

Overall Performance

When evaluated on the MTBench benchmark, which closely simulates user experience, MiniCPM-2B outperforms many representative open-source models. These include Llama2-70B-Chat, Vicuna-33B, Mistral-7B-Instruct-v0.1, and Zephyr-7B-alpha, highlighting the model’s robust performance across diverse tasks.

In a move that underscores their commitment to fostering research and innovation, the developers have decided to fully open-source the model parameters of MiniCPM-2B. This is intended for academic research and limited commercial use. In addition, all checkpoints and most non-proprietary data during the training process will be made available. This will provide researchers with valuable resources to study the mechanisms of the model and further advance the field of large language models.

When it comes to the specific performance Evaluation, MiniCPM-2B doesn’t shy away. In the league of large models, it not only matches strides with most models at the 7B-scale but even leaves some models with a scale of 10B or above in its wake.Switching gears to smaller models, MiniCPM-2B flexes its muscles and outperforms all available contenders across all test sets, with the exception of a few English evaluation datasets. It’s like a versatile athlete, excelling in almost every event!

How to Access and Use MiniCPM-2B?

Accessing and using MiniCPM-2B is a straightforward process, thanks to its public availability. Here’s how you can get started:

  • Hugging Face Model Hub: Download the MiniCPM-2B models directly from the Hugging Face model hub. You’ll find the MiniCPM-2B-SFT models for various tasks, such as Chinese, mathematics, coding, dialogue, and instruction, as well as the MiniCPM-2B-DPO model for the MTBench dataset. The MiniCPM-V model, a multimodal model based on MiniCPM-2B, is also available. Use the Hugging Face library to load and use the models in your Python code.
  • GitHub Repository: Clone the GitHub repository of MiniCPM-2B to access the source code, data, scripts, and instructions for training and using MiniCPM-2B. The repository also houses the pre-trained models, prompts, and outputs of MiniCPM-2B on various benchmarks and tasks. Feel free to contribute to the development and improvement of MiniCPM-2B by submitting issues and pull requests.

MiniCPM-2B is open-source and available for commercial use, licensed under the Apache License 2.01.

Limitations and Future Work

Despite its impressive capabilities, MiniCPM-2B does have certain limitations and areas for future improvement:

  • Influence of Prompts: The model’s output is significantly influenced by the prompts. This is a limitation inherent to the model’s size and can potentially lead to inconsistent results after multiple attempts.
  • Knowledge Recall Accuracy: The model’s capacity constraints limit the accuracy of its knowledge recall. This means that the model might not always retrieve the most accurate or relevant information.
  • Future Improvements: One of the key areas for future work is to enhance the model’s knowledge recall ability. This would involve refining the model’s ability to access and retrieve information, thereby improving its overall performance.

Conclusion

MiniCPM-2B is a promising development in the field of Large Language Models. Despite its compact size, it delivers high performance, surpassing several larger models. Its open-source nature and edge-side deployment make it highly accessible for various applications. However, like all models, it has its limitations and there is room for improvement in future iterations.

Source
Blog: https://shengdinghu.notion.site/MiniCPM-Unveiling-the-Potential-of-End-side-Large-Language-Models-d4d3a8c426424654a4e80e42a711cb20
Github Repo: https://github.com/OpenBMB/MiniCPM/blob/main/README-en.md
Models: https://huggingface.co/openbmb/MiniCPM-V

No comments:

Post a Comment

ShowUI: Advanced Open-Source Vision-Language-Action Model for GUI

Introduction Graphical User Interface (GUI) assistants assist users to interact with digital appliances and applications. They can be an ord...