Pages

Wednesday, 21 August 2024

EXAONE 3.0 : Surpassing Comparable Llama 3.1 and Gemma 2

Presentional View

Introduction

Small open models are marching to the field by utilizing large annotated data and community effort, considerably improve computational efficiency while ensuring outstanding performance. On the other hand, these models suffer from some limitations like high computational and maintenance costs in order to provide real-world services. To solve few of these challenges, EXAONE 3.0 provides a high-quality bilingual model with Real-world performance which can help the overall AI progression.

Leading AI Innovations Developer LG AI Research Introduces Enhanced EXAONE 3.0. LG AI Research are concentrating on integrating cutting-edge artificial intelligence to daily life, providing high-quality AI capabilities to everyone. EXAONE 3.0 has been envisioned as a universal yet affordable model for the research and application of AI.

What is EXAONE 3.0?

EXAONE stands for EXpert AI for EveryONE. EXAONE 3.0 is a bilingual generative model and has been pre-trained and finetuned to follow instructions. It contains in total of approximately 7.8 billion parameters. It is initialized with 8 trillion curated tokens as a pre-training, then fine-tuned using supervised fine tune and direct preference optimization. This model is a multitasking and efficient model.

Key Features of EXAONE 3.0

  • Bilingual Support: Supporting English and Korean languages allows EXAONE 3.0 use for broader user segment; This bilingual ability is crucial in doing this as it allows the model to work well for areas where both languages are common. Its architecture has been trained specifically for the linguistic characteristics of each language, in order to understand and generate text appropriately.
  • High Efficiency: Inference speed reduced with 56% of the processing time and memory usage decreased by 35%, operational costs to execute inference dropped down for circa 72% in respect to its predecessor, EXAONE at versioned environment. This efficiency makes it more accessible for both business and developers, as well increasing its scalability across different levels of applications whether big enterprise or just a small startup.
  • Advanced Training: It uses Rotary Position Embeddings (RoPE) which helps in keeping the historical context for long text and Grouped Query Attention (GQA) to perform better on tasks that need tedious reasoning.

Capabilities/Use Cases of EXAONE 3.0

  • Enterprise AI Agent: Being an Enterprise AI-Agent, EXAONE 3.0 exponentially increases the capability of transforming workplace productivity. It allows for web based Q&A and brings realtime response joy by answering users questions in seconds. Furthermore it supports document and image-based Q&A, which makes the solution suitable for different domains. It also helps to coding and database management, which streamlines workflows & saving time for tedious work.
  • Medical Sector: EXAONE 3 uses its immense health data processing power for predictive analytics meaning predicting patient outcomes and suggesting possible population level adverse events. Additionally, the model enables personalized medicine by processing unique patient data to suggest specific treatment options that benefit both individual patients and their care.
  • Competitive Performance: EXAONE 3.0 proved to be highly competitive in terms of benchmark performance compared with other open models on a similar scale. It not only gives competitions but also tops in benchmarks like MT-Bench, Arena-Hard or WildBench.

How Does EXAONE 3.0 Work? (Architecture/Design)

EXAONE 3.0 is based on a decoder-only transformer architecture, which has grown increasingly popular in recent years. Such an architecture ensures that each word or text to be processed sequentially and the model can generate text, excellent for natural language processing tasks. The EXAONE 3.0 7.8B variant, with a maximum context length of up to 4,096 tokens, allowing very long input sequences. Additionally, to improve its performance , the model is built with two important key innovations: Rotary Position Embeddings (RoPE) and Grouped Query Attention (GQA).

Rotary Position Embeddings (RoPE) is important to enable the model to perceive how tokens relations within a sequence are. RoPE is different from regular positional encodings in the sense that RoPE also uses a rotation matrix over key, query vectors for the attention mechanism. This will help to better capture positions of words, and is therefore performing substantially well in the tasks which require processing word order(positional information) more than a simple bag-of-words model. On the other hand, Grouped Query Attention (GQA) is an optimization technique that reduces computational complexity while preserving model accuracy. Since GQA groups queries, it can compute attention more efficiently (as a side effect it may also allow for better parallelization during training and inference).

The EXAONE 3.0 7.8B model contains a total of only 32 layers, with a model dimension at just over (4k) and feedforward dimention in the tens of kilobytes as well—14,336 to be precise. In the context of its GQA setup, 32 attention heads are used with 8 key-value heads. The model uses the activation function as SwiGLU and a pre-normalization for stability in training. EXAONE 3.0 has an especially firm capacity in handling language tasks with a vocabulary size of 102,400 tokens; it can understand English and Korean very well.

Performance Evaluation with other models

EXAONE 3.0 has shown the best competitive performance compared to other open source models of comparable size from state-of-the-art technology that have been tested and evaluated. The evaluation results reveal that the EXAONE 3.0, a 7.8B instruction-tuned model, performs better than any other equivalent model in different measured parameters.

Evaluation results of EXAONE 3.0 7.8B instruction-tuned model across 4 benchmarks
source - https://arxiv.org/pdf/2408.03541

One key area where EXAONE is particularly competitive is real-life use cases. For example, as illustrated in table above, it achieved the highest scores on benchmarks such as MT-Bench, Arena-Hard-v0.1, WildBench and AlpacaEval 2.0 LC while surpassing the performances of Llama 3.1, Gemma 2 and Mistral 7B among others. This highlights the powerful instruction-following ability of EXAONE and its capacity to tackle complicated real-life applications.

Evaluation results of EXAONE 3.0 7.8B instruction-tuned model on Various bechmarks
source - 
https://arxiv.org/pdf/2408.03541

Moreover, specialized domains also witnessed outstanding performance by the EXAONE 3.0 model; for instance it ranked first in MATH benchmark which tests how well a model can handle challenging competition-level mathematics questions (see first table above). Also performing excellently was this model on HumanEval benchmark for Python code generation as indicated in second table above surpassing other models used for comparison purposes.

The comprehensive evaluation results including general benchmarks performed by the model for reasoning tasks as well as Korean specific datasets show amazing outcomes that underscore how great EXAONE is at tackling most computational problems even in the most intricate languages.

Comparing EXAONE 3.0 (7.8B) with Other Leading AI Models

EXAONE 3.0 (7.8B) is great at handling both English and Korean, making it very useful in areas where these languages are common. This gives it an edge over models like Llama 3.1 (8B) and Gemma 2 (9B), which might not be as good with bilingual tasks.

On the other hand, Llama 3.1 (8B) is open-source, meaning anyone can customize it and use advanced AI features. But, because it’s a bit bigger, it might need more computing power, which can be a downside compared to the more efficient EXAONE 3.0 (7.8B).

Gemma 2 (9B) stands out with its unique attention mechanisms and performs well even on less powerful hardware. This makes it a good choice when you need to be efficient with your computing resources, which can be better than EXAONE 3.0 (7.8B) in some cases where saving resources is important.

Phi-3 (7B) is built for efficiency and can be used locally, supporting a large context window of 128K tokens and running well on regular GPUs and mobile devices. This makes it perfect for cost-effective and local solutions, though it might not be as good at bilingual tasks or as effective in real-world applications as EXAONE 3.0 (7.8B).

So, while EXAONE 3.0 (7.8B) is known for its strong ability to follow instructions, good performance on general tasks, and excellent bilingual skills, especially in Korean, it’s important to consider the strengths and weaknesses of the other models when choosing the best one for your needs.

How to Access and Use EXAONE 3.0?

EXAONE 3.0 is accessible via GitHub and Hugging Face. Locally, this is usable with the latest version of the transformers library. The model is open-source for non-commercial research purposes. Users who wants to learn more about this AI models, all important links associated with this models are provided at end of this article.

Limitations of EXAONE 3.0

  • Inappropriate Responses:  Can be offensive, such as making harmful or inappropriate content.
  • Bias: Risk of biased responses based on age, gender, race etc.
  • Accuracy: Relies on training data statistics, thus answers may be semantically or syntactically wrong.
  • Old information: Don't have the latest data so can give false or conflicting info.
  • Practicing Ethical Use: one must not conduct malicious activities in order to avoid unethical outputs.

LG AI Research is dedicated to mitigating these potential risks and enforcing ethical deployment practices.

Conclusion

EXAONE 3.0 shows that even smaller models can reach state-of-the-art performance thanks to new training tricks. This model comes as great tool for different applications due to its specific capabilities and high efficiency. But its limitations and licensing requirements show the work that remains to be done in this area.


Source
Research Paper : https://arxiv.org/abs/2408.03541
research document: https://arxiv.org/pdf/2408.03541
GitHub Repo: https://github.com/LG-AI-EXAONE/EXAONE-3.0
hugging face: https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct


Disclaimer - This article is intended purely for informational purposes. It is not sponsored or endorsed by any company or organization, nor does it serve as an advertisement or promotion for any product or service. All information presented is based on publicly available resources and is subject to change. Readers are encouraged to conduct their own research and due diligence.

No comments:

Post a Comment

DeepSeek-V3: Efficient and Scalable AI with Mixture-of-Experts

Introduction Scalable and efficient AI models are among the focal topics of the current artificial intelligence agenda.  The purpose is to d...