Friday 23 June 2023

MPT-30B: The Open-Source LLM that Outperforms GPT-3 and Other LLMs in Text and Code Generation

MPT-30B: The New LLM Champion - symbolic image
Introduction

MosaicML is an artificial intelligence (AI) startup based in San Francisco that aims to make advanced AI more accessible and affordable for enterprises. The company offers a platform that allows businesses to train, customize and deploy NLP models on their own data using MosaicML’s model architectures and tools.

One of the key offerings of MosaicML is the Foundation Series, a collection of open-source LLMs that are trained on a large and diverse corpus of text data. The Foundation Series includes MPT-7B, which was launched in May 2023 and has been downloaded over 3 million times, and its variants MPT-7B-Instruct, MPT-7B-Chat and MPT-7B-StoryWriter, which are fine-tuned for specific tasks such as instruction following, conversation and story generation.

In June 2023, MosaicML announced the release of a new and more powerful member of the Foundation Series that has 30 billion parameters and an 8k token context window. This new model is known as 'MPT-30B'.

What is MPT-30B?

MPT-30B is a large language model (LLM) that can generate natural language texts on various topics and domains. It is based on the transformer architecture, which is a neural network that uses attention mechanisms to learn the relationships between words and sentences.

MPT-30B was pretrained on a large and diverse corpus of text data that includes Wikipedia, Common Crawl, books, news articles, web pages, social media posts, code snippets and more. The pretraining data mixture was designed to cover a wide range of domains and genres, such as natural language, programming languages, mathematics, science fiction, fantasy, horror and more.

Key Features of MPT-30B

MPT-30B has several features that make it stand out from other LLMs. Some of these features are as follows:

MPT-30B exceeds the quality of GPT-3
source - https://www.mosaicml.com/blog/mpt-30b

  • Quality: MPT-30B exceeds the quality of GPT-3 and is competitive with other open-source LLMs. MosaicML evaluated MPT-30B on several benchmarks and tasks and found that it outperforms GPT-3 on most of them and is on par with or slightly behind LLaMA-30B and Falcon-40B. MPT-30B also has strong coding abilities thanks to its pretraining data.

  • Size: MPT-30B has 30 billion parameters, which is larger than MPT-7B and comparable to other open-source LLMs. However, unlike Falcon-40B, which needs 2+ GPUs to run, MPT-30B can be deployed on a single GPU. This makes it more accessible and affordable for enterprises.

  • Context Length: MPT-30B has an 8k token context window at training time and supports up to 16k tokens at inference time via ALiBi (see the configuration sketch after this list). This means that MPT-30B can handle longer texts and generate more detailed outputs than other LLMs that have shorter context windows.

  • Efficiency: MPT-30B achieves high efficiency and utilization of GPU compute by using FlashAttention, which reduces the memory and computation cost of attention layers. This allows MPT-30B to fit more tokens per GPU and use less memory bandwidth. FlashAttention also improves the inference speed and throughput of MPT-30B.
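
The context-length and efficiency features above are exposed as ordinary configuration options when loading the model from Hugging Face. Below is a minimal sketch; the field names (max_seq_len, attn_config['attn_impl']) follow the pattern documented on the MPT model cards and may change between releases, so treat this as illustrative rather than definitive.

import torch
import transformers

name = "mosaicml/mpt-30b"

# Load the config and raise the maximum sequence length; ALiBi lets the model
# extrapolate beyond the 8k tokens it was trained on (here, up to 16k).
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 16384

# Optionally select a FlashAttention-style kernel (requires a suitable GPU and
# the triton package); the default torch implementation also works.
config.attn_config["attn_impl"] = "triton"

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,   # bf16 helps keep the 30B model on a single large GPU
    trust_remote_code=True,       # MPT models ship custom modelling code
)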

Capabilities/Use Cases of MPT-30B

MPT-30B is a general-purpose LLM that can generate natural language texts on various topics and domains. It can be used for a variety of applications and tasks, such as:

  • Text Generation: MPT-30B can generate texts on any topic given a prompt or a query. 
  • Text Completion: MPT-30B can complete texts given a partial input or a context. 
  • Text Understanding: MPT-30B can understand texts and answer questions about them. 
  • Text Transformation: MPT-30B can transform texts from one form to another. 
  • Text Interaction: MPT-30B can interact with humans or other agents using natural language. 

To demonstrate some of the capabilities and use cases of MPT-30B, MosaicML has released two fine-tuned variants of MPT-30B: MPT-30B-Instruct and MPT-30B-Chat.

MPT-30B-Instruct is a variant of MPT-30B that is fine-tuned for single-turn instruction following. It can perform various tasks given natural language instructions or queries.

MPT-30B-Chat is a variant of MPT-30B that is fine-tuned for multi-turn conversations. It can engage in dialogues on various topics and domains with humans or other agents.
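
As a rough illustration, the single-turn format below follows the dolly-style template documented for MPT-7B-Instruct and is assumed to carry over to MPT-30B-Instruct; check the model card for the exact template before relying on it.

# Assumed instruction template for the MPT Instruct models (dolly-style).
INSTRUCT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n{instruction}\n### Response:\n"
)

def format_instruction(instruction: str) -> str:
    # Hypothetical helper: wraps a user request in the template above.
    return INSTRUCT_TEMPLATE.format(instruction=instruction)

prompt = format_instruction("Summarize the main features of MPT-30B in three bullet points.")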

How does MPT-30B work?

MPT-30B is a decoder-style model based on a modified transformer architecture. A transformer is a neural network that uses attention to learn the relationships between words and sentences. A decoder-style model uses only the decoder part of the transformer and predicts each next token from the tokens that came before it, which is what makes it well suited to open-ended text generation and completion.
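
To make "predict the next token from the previous tokens" concrete, here is a minimal greedy-decoding loop. A small stand-in checkpoint (gpt2) is used purely so the sketch is cheap to run; the mechanics are identical for MPT-30B, only the checkpoint name changes (plus trust_remote_code=True).

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small stand-in model so the sketch runs anywhere; swap in "mosaicml/mpt-30b"
# (with trust_remote_code=True) for the real thing.
tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The sea at night is", return_tensors="pt").input_ids
for _ in range(20):                       # generate 20 tokens greedily
    with torch.no_grad():
        logits = lm(ids).logits           # (batch, seq_len, vocab)
    next_id = logits[0, -1].argmax()      # most likely next token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
print(tok.decode(ids[0]))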

To handle longer and more complex inputs, MPT-30B uses two techniques: ALiBi and FlashAttention.

ALiBi (Attention with Linear Biases) replaces positional embeddings with simple penalties added directly to the attention scores: the farther a key token is from the current query token, the larger the penalty. Because these penalties depend only on relative distance, the model can handle sequences at inference time that are longer than anything it saw during training.
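
Concretely, the bias for attention head h between query position i and key position j is -m_h * (i - j), where m_h is a fixed per-head slope. The sketch below follows the formulation in the ALiBi paper, not MosaicML's actual implementation:

import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Per-head slopes form a geometric sequence, e.g. 1/2, 1/4, ... for 8 heads.
    start = 2 ** (-8.0 / n_heads)
    slopes = torch.tensor([start ** (h + 1) for h in range(n_heads)])
    # Relative distance j - i between key position j and query position i;
    # positions after i are clamped to 0 (they are masked out by causal attention).
    pos = torch.arange(seq_len)
    distance = (pos[None, :] - pos[:, None]).clamp(max=0).float()
    # Shape (heads, seq, seq); added to the attention scores before the softmax,
    # so more distant keys receive larger penalties and no positional embeddings are needed.
    return slopes[:, None, None] * distance[None, :, :]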

FlashAttention is an IO-aware implementation of exact attention that reduces the memory and computation cost of attention layers. An attention layer computes how much each token should pay attention to other tokens; FlashAttention performs this computation in small tiles that fit in fast on-chip GPU memory, so the full attention matrix never has to be written out to slower GPU memory.
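
The trick that makes tiling possible is an online (streaming) softmax: earlier partial results are rescaled as each new tile of keys arrives, so the answer stays exact even though all the scores are never in memory at once. The NumPy sketch below illustrates that idea for a single query; the real FlashAttention kernel additionally tiles over queries and runs as a fused GPU kernel.

import numpy as np

def streaming_attention(q, k, v, block=128):
    # q: (d,), k and v: (n, d). Computes softmax(q @ k.T / sqrt(d)) @ v while
    # only ever holding one block of scores, using the online-softmax update.
    d = q.shape[-1]
    running_max = -np.inf     # largest score seen so far
    denom = 0.0               # running softmax denominator
    out = np.zeros(v.shape[-1])
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        scores = kb @ q / np.sqrt(d)
        new_max = max(running_max, scores.max())
        rescale = np.exp(running_max - new_max)   # correct earlier partial sums
        weights = np.exp(scores - new_max)
        denom = denom * rescale + weights.sum()
        out = out * rescale + weights @ vb
        running_max = new_max
    return out / denom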

By using ALiBi and FlashAttention, MPT-30B can handle up to 16k tokens at inference time, roughly eight times GPT-3's 2k-token context window. This lets MPT-30B process and generate longer and more detailed texts than LLMs with shorter context windows.

Performance Evaluation with other Models

One of the remarkable abilities of the MPT-30B models is their programming skill. Thanks to a pretraining data mixture that includes a large amount of code in various programming languages, MPT-30B models can generate and complete code in Python and other languages. To evaluate their coding performance, the team compared MPT-30B models with other open-source models, both general-purpose and GPT-distilled, on HumanEval, a corpus of Python coding problems. The results are shown in the table below.

MPT-30B: Performance Evaluation with other Models

source - https://www.mosaicml.com/blog/mpt-30b

As we can see, the MPT-30B models outperform LLaMA-30B and Falcon-40B by a wide margin, and even outperform many purpose-built coding models such as StarCoder. The best-performing model among the MPT-30B variants is MPT-30B-Chat, which is surprising since it was trained as a general conversational model. It scores 37.2% on HumanEval, which places it above almost all open-source models other than WizardCoder.
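
For readers unfamiliar with HumanEval: each problem gives the model a function signature plus docstring and scores the generated body against unit tests. The toy example below mimics that pass/fail check; the real benchmark uses the sandboxed evaluation harness released with HumanEval, not a bare exec.

# Toy HumanEval-style check: "prompt" is what the model sees, "completion" is
# what it generates, and the tests decide pass/fail.
prompt = (
    "def add(a: int, b: int) -> int:\n"
    '    """Return the sum of a and b."""\n'
)
completion = "    return a + b\n"          # stands in for model output

tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"

namespace = {}
exec(prompt + completion, namespace)       # define the candidate function
exec(tests, namespace)                     # raises AssertionError on failure
print("pass")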

How to access and use MPT-30B?

MPT-30B is available on HuggingFace and MosaicML’s platform. You can download the model, explore the documentation, try the demos and interact with the community. You can also customize and deploy the model using MosaicML’s tools and services.

To use MPT-30B for text generation, completion or understanding, you can give a prompt or a query to the model and get an output. For example, you can ask the model to write a poem, complete a piece of code or answer a question.
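
Here is a minimal sketch of that workflow with the Hugging Face transformers library. It assumes the tokenizer loads from the same repository (the model card also documents using the EleutherAI/gpt-neox-20b tokenizer directly), a GPU with enough memory for the 30B model in bfloat16, and the accelerate package for device placement.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mosaicml/mpt-30b"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,      # MPT models ship custom modelling code
    device_map="auto",           # requires the accelerate package
)

prompt = "Write a short poem about the sea.\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=True))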

To use MPT-30B for text transformation or interaction, you may need to give some extra information or guidance to the model to help it do the task. For example, you can give keywords or examples to specify the style, tone or genre of the output text, or give instructions or commands to direct the model.
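
For instance, a style-constrained rewrite can bundle keywords and a short example directly into the prompt. The prompt below is purely illustrative; it is not a template the model requires.

# Illustrative prompt that steers tone and format via keywords and an example.
prompt = (
    "Rewrite the sentence below as a formal product announcement.\n"
    "Keywords: concise, professional, enterprise.\n"
    "Example input: 'our new model is out and it's pretty fast'\n"
    "Example output: 'We are pleased to announce our new model, which delivers "
    "significantly faster inference.'\n\n"
    "Input: 'the chatbot now remembers way more of the conversation'\n"
    "Output:"
)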

To customize MPT-30B for your domain or application, you can use MosaicML’s platform to finetune, pretrain or train the model using your data. You own the model weights, and your data is not stored on their platform. You can also deploy custom MPT-30B models on MosaicML compute or your own VPC using their inference stack.

If you are interested in learning more about the MPT-30B model, all relevant links are provided in the 'source' section at the end of this article.

Limitations

MPT-30B is a powerful LLM that can generate texts on various topics and domains. However, it has some limitations that users should be aware of. Some of these limitations are:

  • Accuracy: MPT-30B may generate wrong, incomplete or misleading texts. Users should check the outputs with reliable sources and use their own judgment.
  • Bias: MPT-30B may reflect the biases and prejudices of its pretraining data, which may be harmful or offensive. Users should be careful and ethical when using the model and its outputs.
  • Safety: MPT-30B may generate texts that are harmful, illegal or inappropriate. Users should be respectful and responsible when using the model and its outputs and follow the laws and regulations.

Conclusion

MPT-30B is a groundbreaking LLM that raises the bar for open-source foundation models. It will be exciting to see what the community and customers build with MPT-30B and how it advances the field of artificial intelligence in enterprise applications.


source
MPT-30B model: https://huggingface.co/mosaicml/mpt-30b
MPT-30B chatbot demo: https://huggingface.co/spaces/mosaicml/mpt-30b-chat
Blog post: https://www.mosaicml.com/blog/mpt-30b
