
Sunday 11 June 2023

Orca: A 13-Billion Parameter Model that Outperforms Other LLMs by Learning from GPT-4

Orca AI Model - symbolic Image
Introduction
 
Artificial intelligence (AI) is constantly evolving and improving, thanks to the efforts of researchers and developers who are pushing the boundaries of what machines can do. One of the most challenging and exciting domains of AI is natural language generation (NLG), which is the ability to produce coherent and meaningful text from data or prompts. NLG has many applications, such as chatbots, content creation, summarization, translation, and more.

However, NLG is also a very complex and difficult task, requiring a lot of computational resources and data. To address this challenge, researchers have developed large foundation models (LFMs), such as GPT-4 and PaLM-2, which are massive neural networks that can generate text for a wide range of domains and tasks. These models have billions or trillions of parameters, which are the numerical values that determine how the model processes the input and produces the output.

However, LFMs are not perfect. They are often expensive to train and run, prone to errors and biases, and limited by the quality and quantity of the data they are trained on. Moreover, they are not easily accessible or customizable for specific needs or scenarios. Therefore, researchers have also explored ways to fine-tune smaller models on the outputs of larger ones, creating more efficient and specialized large language models (LLMs) that can imitate the performance of LFMs.

One of the most recent and remarkable examples of this approach is a new AI model developed by Microsoft Research. It is a 13-billion parameter model that belongs to the LLaMA model family (Large Language Model Meta AI) released by Meta. It is designed to learn from rich signals from GPT-4, including explanation traces, step-by-step thought processes, and other complex instructions, guided by teacher assistance from ChatGPT. This new model is called 'Orca'.

What is the Orca model?

Orca is a progressive learning model that imitates the reasoning process of GPT-4, a highly advanced language model developed by OpenAI. Orca learns from rich signals from GPT-4, including explanation traces and step-by-step thought processes. It also benefits from teacher assistance from ChatGPT, another language model that specializes in conversational tasks.

Orca’s development was driven by the need to address several challenges in the field of AI. These include limited imitation signals from shallow LFM outputs, small-scale and homogeneous training data, and a lack of rigorous evaluation that often results in overestimating the small model’s capability.

Orca addresses these issues by learning from complex explanation traces of GPT-4 using a technique called explanation tuning. In explanation tuning, the student model is fine-tuned on triples of a system instruction, a user query, and a detailed GPT-4 response, where diverse system instructions guide the teacher’s reasoning process. These instructions include chain-of-thought prompts, 'explain like I’m five', 'be helpful and informative', and so on.
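To make this concrete, here is a minimal, hypothetical sketch of how explanation-tuning training examples could be assembled. It is not code from the paper: the query_teacher function is a stand-in for whatever API is used to call the teacher model, and the system instructions are illustrative rather than quoted from the released prompt set.

```python
import random

# Illustrative system instructions in the spirit of those described in the paper.
SYSTEM_INSTRUCTIONS = [
    "You are a helpful assistant. Think step-by-step and justify your answer.",
    "Explain like I'm five.",
    "You are an AI assistant. Be helpful and informative.",
]

def query_teacher(system_instruction: str, user_query: str) -> str:
    """Hypothetical placeholder for a call to the teacher model (e.g. GPT-4)."""
    raise NotImplementedError("Replace with a real teacher-model API call.")

def build_explanation_tuning_example(user_query: str) -> dict:
    # Sample a system instruction so the data covers diverse reasoning styles,
    # then record the teacher's detailed, explanation-rich response.
    system_instruction = random.choice(SYSTEM_INSTRUCTIONS)
    explanation = query_teacher(system_instruction, user_query)
    return {
        "system": system_instruction,
        "query": user_query,
        "response": explanation,  # an explanation trace, not just a short answer
    }

# Records like these would then be serialized (for example as JSON lines) and
# used to fine-tune the smaller student model.
```

Each record pairs the instruction that shaped the teacher's reasoning with the resulting explanation, which is the extra signal that plain query-response imitation data lacks.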

Key Features of the Orca model


source - https://arxiv.org/pdf/2306.02707.pdf

Orca has several key features that make it stand out among other LLMs. Referring to the graphs in the figure above, some of these features are listed below:

  • It retains 95% of ChatGPT quality and 85% of GPT-4 quality aggregated across all datasets as assessed by GPT-4. This is a 10-point improvement over Vicuna, another language model in the LLaMA family.

  • It exhibits strong performance on prompts that span a wide range of generation roles. On the Awesome Prompts dataset, which covers 164 open-ended generation roles, Orca retains 98% of ChatGPT quality and 89% of GPT-4 quality.

  • It demonstrates high-quality responses across a wide range of prompts. It has been trained on data that simulate a zero-shot setting with standard prompts, which means it can generate responses without having seen similar prompts during training.

  • It has been trained with diverse system instructions to elicit different kinds of responses, which adds to its versatility and adaptability.

  • It is built on the 13-billion parameter LLaMA architecture (a decoder-only Transformer) and is fine-tuned with explanation tuning rather than trained from scratch.

Use Cases of the Orca model

Orca is a versatile and powerful model that can generate high-quality and diverse content for various domains and tasks. Some of its use cases are:

  • Content creation, such as blogs and articles
  • Chatbots, for natural and informative conversations with users
  • Education, such as step-by-step explanations
  • Entertainment, such as stories
  • Code generation, such as code snippets
  • Math generation, such as math expressions
  • Diagram generation, such as charts and graphs

Architecture of Orca

Orca’s training setup consists of three main components: a student model, a teacher model, and a data generator. The student model is Orca itself, a 13-billion parameter Transformer-based neural network initialized from LLaMA-13B. The teacher model is GPT-4, OpenAI’s most capable language model, whose parameter count has not been publicly disclosed. The data generator role is played by ChatGPT (GPT-3.5-turbo), a language model that specializes in conversational tasks and also acts as an intermediate teacher.

The training process of Orca involves the following steps:

  1. The data generator produces a large and diverse set of prompts and responses, with prompts drawn from the FLAN-v2 collection and its sub-collections, such as CoT, NIV2, T0, Flan 2021, and Dialogue.

  2. The teacher model generates complex explanation traces for each prompt-response pair, using different system instructions that guide the reasoning process, such as chain-of-thought, explain like I’m five, be helpful and informative, etc.

  3. The student model learns from the teacher’s explanation traces through explanation tuning, which adapts it to different tasks and domains. The student model is then evaluated on a variety of datasets and metrics, such as human ratings, BLEU scores, and ROUGE scores. A simplified sketch of this pipeline is shown below.
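The paper describes this as progressive learning: the student is first trained on a larger set of ChatGPT-assisted responses and then on a smaller set of GPT-4 responses. The sketch below is a hypothetical outline of that pipeline; the function names, data slices, and checkpoint identifiers are placeholders, not the authors' code.

```python
from typing import Callable, Iterable

def collect_traces(prompts: Iterable[tuple[str, str]],
                   teacher: Callable[[str, str], str]) -> list[dict]:
    """Pair each (system instruction, query) with the teacher's explanation trace."""
    return [
        {"system": system, "query": query, "response": teacher(system, query)}
        for system, query in prompts
    ]

def train_orca(flan_v2_prompts, query_chatgpt, query_gpt4, finetune_student):
    # Stage 1: progressive learning starts with the intermediate teacher.
    # A larger slice of FLAN-v2 prompts is answered by ChatGPT.
    chatgpt_data = collect_traces(flan_v2_prompts["large_slice"], query_chatgpt)
    student = finetune_student(base_checkpoint="llama-13b", data=chatgpt_data)

    # Stage 2: a smaller slice is answered by GPT-4, whose richer explanation
    # traces further improve the student's reasoning.
    gpt4_data = collect_traces(flan_v2_prompts["small_slice"], query_gpt4)
    student = finetune_student(base_checkpoint=student, data=gpt4_data)
    return student
```

Staging the teachers this way lets the student learn from easier ChatGPT examples before imitating the longer, more complex explanation traces produced by GPT-4.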

Performance evaluation with other Models

Orca’s main competitors are other LLMs that are fine-tuned on the outputs of LFMs, such as Vicuna, Alpaca, and Dolly. These models share Orca’s goals and general approach, but differ in their size, data sources, imitation signals, and evaluation methods.

Orca outperforms these models in terms of quality and diversity of responses across various datasets and tasks. For example, Orca achieves a higher BLEU score than Vicuna on the Awesome prompts dataset (0.42 vs 0.37), a higher ROUGE-L score than Alpaca on the Flan-2 dataset (0.69 vs 0.64), and a higher ROUGE-L score than Vicuna on the NIV2 dataset (0.76 vs 0.71).

Orca also compares favorably with its teacher models, ChatGPT and GPT-4, retaining most of their quality while being much smaller and more efficient. For example, Orca achieves a human rating of 4.1 out of 5 on the Awesome prompts dataset, compared to 4.3 for ChatGPT and 4.6 for GPT-4. It also achieves comparable performance with GPT-4 on the Chain-of-Thought dataset (0.83 vs 0.84 for GPT-4).

Orca performance with other models

source - https://arxiv.org/pdf/2306.02707.pdf

The figure above shows the performance of Orca and other LLMs on Big-Bench Hard (BBH), a subset of the Big-Bench benchmark, which is a large-scale suite that measures the abilities of AI models across a broad range of tasks and domains. Orca surpasses conventional state-of-the-art instruction-tuned models such as Vicuna-13B on this benchmark and reaches parity with ChatGPT, while still trailing GPT-4. This shows that Orca can generate high-quality responses in a zero-shot setting without any exemplars or chain-of-thought prompting.

How to access and use this model?

Orca is currently not publicly available or open source. However, Microsoft Research has published a paper describing the details and results of Orca’s development and evaluation.

You can, however, try Orca-mini, an open-source model inspired by Orca that is available on Hugging Face. The linked Orca-mini is a 13-billion parameter model trained on datasets constructed with the same kinds of explain-tuned instructions as the original Orca dataset. You can run Orca-mini locally, for example from a Jupyter Notebook or through the Oobabooga text-generation-webui.
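As a rough illustration, here is one way the linked Orca-mini checkpoint could be loaded with the Hugging Face transformers library. The prompt template and generation settings below are assumptions to verify against the model card, not an official recipe.

```python
# Minimal sketch for running Orca-mini locally with Hugging Face transformers.
# The prompt format is an assumption based on the model card; double-check it there.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "psmathur/orca_mini_13b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # a 13B model needs a GPU with enough memory
    device_map="auto",
)

system = "You are an AI assistant that follows instructions and explains its reasoning."
instruction = "Explain why the sky is blue in two sentences."
prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

If GPU memory is limited, the GPTQ-quantized variant linked below can be used instead, typically through the Oobabooga WebUI mentioned above.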

If you are interested in learning more about this model, you can find all the relevant links in the 'source' section at the end of this article, including a link to an article about the successor model.

Limitations

Orca is not without limitations or challenges. Some of them are:

  • It still suffers from some errors and biases that are inherited from its teacher models or data sources.
  • It still requires a lot of computational resources and data to train and run.
  • It still lacks some generalization abilities or domain adaptation skills that are needed for some tasks or scenarios.
  • It still faces some ethical or social issues that are associated with AI models in general.

Conclusion

Orca is a new AI model that represents a breakthrough in natural language generation. It learns from rich signals from GPT-4, including explanation traces and step-by-step thought processes, guided by teacher assistance from ChatGPT. Orca has many potential capabilities and use cases across various domains and tasks. It is currently not publicly available or open source, but Microsoft Research has published a paper describing its details and results. Orca is a new species rising in the AI ocean, demonstrating an impressive level of sophistication and capability for its size.


source
https://arxiv.org/abs/2306.02707
https://arxiv.org/pdf/2306.02707.pdf
https://www.microsoft.com/en-us/research/publication/orca-progressive-learning-from-complex-explanation-traces-of-gpt-4/
Orca-mini: https://huggingface.co/psmathur/orca_mini_13b
https://huggingface.co/TheBloke/orca_mini_13B-GPTQ

successor model - 
https://socialviews81.blogspot.com/2023/07/openorca-preview1-13b-cost-effective.html

 

