Pages

Thursday, 11 July 2024

Google’s Gemma 2: Redefining Performance in Lightweight AI Models

Presentational View

Introduction 

Artificial Intelligence (AI) has experienced a tremendous breakthrough primarily in the context of lightweight SOTA AI models. These models are aimed at delivering top performance while being efficient with respect to computation, for broad reach among the user base. However, it is a road not without its hitches. Some of the most important problems that face researchers and developers in this area include balancing performance with efficiency, ensuring safety, and making these models available to the widest audience. 

In this context, Gemma 2, developed by Google DeepMind, emerges as a beacon of progress. The primary motivation behind the development of Gemma 2 was to create a high-performing, efficient, and accessible AI model that could be used by researchers and developers globally. This model aims to address the challenges faced in the development of AI models by offering improvements over previous models with similar SOTA capabilities.

Gemma 2 is the latest member of the Gemma family of models, introduced as lightweight and advanced open models. The whole Gemma line, including CodeGemma, RecurrentGemma, and PaliGemma, was designed to provide customized capabilities for different AI tasks. Every model in the Gemma family has a unique approach and contribution towards the realization of further AI.

What is Gemma 2?

Gemma 2 is a state-of-the-art open model developed by Google. It is part of a family of lightweight models that are built on the same research and technology used to create the Gemini models. Gemma 2 is designed to deliver best-in-class performance and efficiency, making it a practical choice for various AI applications. It offers state-of-the-art performance in a more accessible package, making it Google’s newest open-source large language model.

Model Variants 

Gemma 2 comes in two sizes, each tailored for separate purposes: The model has 9 billion parameters and pushes the classification state-of-the-art way beyond models like Llama 3 8B. Meanwhile, the Gemma 2 27B has triple the number of parameters, being a larger 27 billion model, and matches performance with models more than twice its size. Each size comes in two forms: Base models, pre-trained on a wide swathe of text data, and Instruction-tuned (IT) models fine-tuned to better performance on specific tasks.

Key Features of Gemma 2

Increased Training Data: Gemma 2 models have been trained on substantially more data, leading to improved performance.

  • Sliding Window Attention: Gemma 2 implements a novel approach to attention mechanisms, enhancing its ability to understand and generate text.
  • Soft-Capping: To improve training stability and performance, Gemma 2 introduces a soft-capping mechanism.
  • Outsized Performance: At 27B, Gemma 2 delivers the best performance for its size class, offering competitive alternatives to much larger models.
  • Efficiency and Cost Savings: Gemma 2 is designed to run efficiently on a single Google Cloud TPU host or NVIDIA GPUs, significantly reducing deployment costs.
  • Blazing Fast Inference: Gemma 2 is optimized to run at incredible speed across various hardware setups, from gaming laptops to cloud-based environments.

Capabilities of Gemma 2 

  • Performance: Gemma 2 is developed to provide performance compared to much larger proprietary models but in a package designed for broader accessibility and use on more modest hardware setups. 
  • Text-based Tasks: Gemma 2 has an excellent performance on a wide range of text-based tasks. 
  • Integration with AI Tools: Gemma 2 easily integrates with platforms such as Hugging Face, NVIDIA, and Ollama, making it possible to be used with different AI tasks. 
  • Hardware Compatibility: Gemma 2 is well suited for a number of different hardware setups; as such, it is accessible without big investments in hardware.

How does Gemma 2 work?/ Architecture/Design

Gemma 2, part of a family of models with 2.6B, 9B, and 27B parameters, is built on a decoder-only transformer architecture. It uses a combination of local sliding window attention and global attention, alternating between these in every other layer. The local attention spans 4096 tokens, while the global attention spans 8192 tokens.

The architecture incorporates Rotary Position Embeddings (RoPE) for positional information and approximated GeGLU as the non-linearity function. It also employs Grouped-Query Attention (GQA) for improved inference speed and RMSNorm for normalization. A unique feature is the logit soft-capping mechanism, which constrains logit values in attention layers and the final layer.

The training process involves knowledge distillation for smaller models (2.6B and 9B), learning from a larger teacher model. The training data mixture is carefully curated, and post-training, the models undergo supervised fine-tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) to create instruction-tuned versions. The final models are optimized across a wide range of tasks.

Performance Evaluation with other Models 

The competitive results from the evaluation of the Gemma-2 model family are, to say the least, impressive in scope and show a major improvement from its predecessors as well as against larger models. One of the most interesting evaluations is the LMSYS Chatbot Arena, shown in figure below. In this blind side-by-side human-rater evaluation, the Gemma-2 27B Instruction Tuned model is slightly better than models with much bigger parameters, like the Llama3-70B-Instruct and Nemotron-4-340B, thus pushing the envelope of setting a new state-of-the-art for open-weights models. 

Evaluation of Gemma 2 9B and 27B Instruction Tuned models on the Chatbot Arena
source - https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf

Besides, the Gemma-2 9B is showing as the best model within the same order of magnitude of parameters. Another important benchmark is a thorough set of standard benchmarks presented in Table 13. The test compares Gemma-2 models, ranging from 2.6B to 9B and 27B, against other models, such as Mistral 7B and LLaMA-3 8B, on a number of tasks, such as question answering, reasoning, and coding. In all such cases, the results show that the Gemma-2 models outperform the remaining models compared to their previous and comparable models, where the 9B and 27B models have shown very strong results. On the MMLU benchmark, for instance, 9B reaches 71.3%, while 27B goes further and scores 75.2%, beating many significantly larger models. 

Comparison of models in the different parameters range on a variety of benchmarks.
source - https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf

Besides these main testing benchmarks, the Gemma-2 models are subjected to intensive testing in a variety of domains. These include human-preference ratings for safety and instruction following, multi-turn conversation abilities, and specific benchmarks for areas such as cybersecurity, code vulnerability detection, and persuasion tasks. In addition, the models showed improvement in less training tasks after fine-tuning of instruction. The researchers also carried out strong safety evaluations, including tests for toxic content generation and bias, to ensure that the improved capabilities of these models are balanced with responsible AI principles.

Comparative Analysis: Gemma-2, Mistral 7B, and Llama 3 

Each of the three models—Gemma-2, Mistral 7B, and Llama 3 offers a unique strength. Gemma-2 comes with standard and instruction-tuned variants, exploiting a teacher-student model paradigm that allows it to learn from bigger, more complicated models to inform training of its compact counterparts. It makes it different from the rest. 

However, Mistral 7B is a generative text model, which is pre-trained with 7 billion parameters. It uses Grouped-query Attention (GQA) and Sliding Window Attention (SWA) to have an effective working source that provides exactly correct results. The Mistral 7B can get tuned with the help of some techniques called Parameter-Efficient Fine-Tuning or the PEFT in the same way as LORA. Auto-regressive language model, Llama 3, optimized for transformer architecture,. It comes with 8B and 70B parameters, uses a context length of 8K tokens, and applies Grouped-Query Attention, which reduces memory bandwidth and increases efficiency. 

While each model has its strengths, Gemma-2 stands out for its unique teacher-student paradigm of models. The approach allows Gemma-2 to condense the knowledge contained in more massive, more complex models into the guidance and foundation for training their more compact cousins. All these characteristics make Gemma-2 a unique solution for scenarios with a high demand for efficiency and performance, where this ability to leverage more knowledge inside larger models can bring the advantage. It would depend, however, on the specific uses and requirements between Gemma-2, Mistral 7B, and Llama 3.

How to Access and Use This Model?

Want to try out its full-scale 27B functionality without the hardware requirements? You can do this now in Google AI Studio with Gemma2. You can also download Gemma 2 model weights on Kaggle and Hugging Face models, with Vertex AI Model Garden support coming soon.

Areas of Improvements

Gemma 2 has been a huge step forward for AI, but there's more work to be done.

  • Factuality: The fact that needs more scrutiny and improvement is the model's accuracy. It is very important to verify the reliability and accuracy of information coming from a model for it to be effective.
  • Adversarial examples robustness: Insights about how we can ensure that the model is less susceptible to adversarials. This is necessary to make sure that the model should be reliable and accountable.
  • Reasoning powers: Research is needed in terms of how logical deductions can be made by the model As it turns out, these operations can help a lot and really push the model to generalization in certain tasks.
  • Safety testing: Although extensive tests have been done for safety, it may not be enough to cover every application and use case. Users are advised to carry out.

Conclusion 

The development of Gemma 2 demonstrating how AI can be both high-performing and efficient, while also being accessible to a wide user base. This model also underscores the transformative role of AI in business by offering cost savings and integration with various AI tools. However, the need for improvements in different areas highlights the ongoing efforts to bridge the technical-practical gap in AI.


Source
Google blog: https://blog.google/technology/developers/google-gemma-2/
Hugging Face Blog: https://huggingface.co/blog/gemma2
Technical Report: https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf
Model variants : https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315


Disclaimer - This article is intended purely for informational purposes. It is not sponsored or endorsed by any company or organization, nor does it serve as an advertisement or promotion for any product or service. All information presented is based on publicly available resources and is subject to change. Readers are encouraged to conduct their own research and due diligence.

No comments:

Post a Comment

DeepSeek-V3: Efficient and Scalable AI with Mixture-of-Experts

Introduction Scalable and efficient AI models are among the focal topics of the current artificial intelligence agenda.  The purpose is to d...