
Sunday 2 July 2023

LongChat-13B: An Open-Source Chatbot with 16k Tokens Memory


Introduction

Over the years, numerous researchers and developers have worked on conversational models that can hold natural, engaging dialogue. This pursuit is far from easy. Existing models often suffer from repetition, monotony, irrelevance, and a lack of diversity. They also struggle to sustain extended dialogues that span multiple turns and diverse topics.

In response to these challenges, a new conversational model has emerged, one that is capable of generating engaging and lengthy dialogues within an open-domain context. It was developed by LMSys (the Large Model Systems Organization), a research group working on large language models and the systems that serve them. The driving force behind the new model was to provide users with an experience closely resembling human conversation. This new model is known as 'LongChat-13B'.

What is LongChat-13B?

LongChat-13B is a neural network model that uses the transformer architecture to generate natural language. It is based on LLaMA-13B, Meta's language model with 13 billion parameters. LLaMA was pretrained on a large corpus of diverse text with a context length of 2,048 tokens. LongChat-13B is fine-tuned from it on user-shared conversations, the same kind of data LMSys used to train Vicuna, with the context length extended to 16k tokens.

The fine-tuning process allows LongChat-13B to learn the patterns and nuances of human conversations, such as how to switch topics, how to express emotions, how to use humor, and how to handle ambiguity. It also enables LongChat-13B to generate responses that are relevant to the context and the user’s input, without relying on pre-defined rules or templates.

Key Features of LongChat-13B

LongChat-13B has several features that make it stand out from other conversational models. Some of these features are:

  1. Long-term memory: LongChat-13B has a context window of up to 16k tokens (roughly 12k English words) of previous dialogue history. This allows it to maintain a coherent and consistent dialogue that spans multiple turns and topics. It also helps it avoid repetition and contradiction.
  2. Topic control: LongChat-13B can follow the user’s lead in choosing the topic of conversation. It can also initiate new topics or switch topics when appropriate. It can handle both specific and general topics, such as movies, sports, politics, or philosophy.
  3. Diversity generation: LongChat-13B can generate diverse responses that are not predictable or boring. It can use different words, phrases, sentences, or paragraphs to convey the same meaning. It can also use different rhetorical devices, such as metaphors, analogies, jokes, or quotes.
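
Even a 16k-token window is finite, so an application built on the model still has to decide what to drop once a conversation outgrows it. The following is a minimal sketch of one common policy (keep the most recent turns that fit), not anything taken from the LongChat code. The names `count_tokens` and `trim_history` are hypothetical, and whitespace-separated words are used as a crude stand-in for real tokenizer output.

```python
from typing import List

# Nominal window from the article; the usable limit in practice is an assumption.
MAX_CONTEXT_TOKENS = 16_000


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def trim_history(turns: List[str], budget: int = MAX_CONTEXT_TOKENS) -> List[str]:
    """Keep the most recent turns whose combined token count fits the budget."""
    kept: List[str] = []
    used = 0
    for turn in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(turn)
        if used + cost > budget:
            break  # this turn and everything older is dropped
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

With a real tokenizer, `count_tokens` would be swapped for the tokenizer's own length function; the trimming logic stays the same.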

Capabilities and Use Cases of LongChat-13B

LongChat-13B has many capabilities and use cases in various domains and scenarios. Some of them are:

  • Entertainment: LongChat-13B can provide entertainment for users who want to have fun or kill time by chatting with an AI chatbot. It can engage users in interesting and amusing conversations about various topics.
  • Education: LongChat-13B can provide education for users who want to learn new things or improve their language skills by chatting with an AI chatbot. It can teach users about various subjects or topics in an interactive and personalized way.
  • Social: LongChat-13B can provide social support for users who want to have someone to talk to or share their feelings with. It can listen to users’ problems or stories and respond with empathy or humor. It can also help users cope with loneliness or isolation.
  • Business: LongChat-13B can provide business solutions for users who want to have a professional or formal conversation with an AI chatbot. It can handle various tasks or queries, such as customer service, sales, marketing, or recruitment.

How does LongChat-13B work?

LongChat-13B is built in two stages. In the first stage, the base model (LLaMA) is pretrained on a large amount of text, which teaches it the basics of language, such as grammar and general knowledge. In the second stage, the model is fine-tuned on conversation data with a 16k-token context, which teaches it the details of dialogue, such as topic and tone. To make the longer window workable, the LMSys team condenses the rotary position embeddings: position indices are divided by a fixed ratio so that positions up to 16k fall inside the 2,048-position range the base model saw during pretraining.
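
The LMSys blog post linked in the source section describes the context extension as condensing the rotary position embeddings: positions in the 16k window are divided by a ratio of 8 so that they land in the 2,048-position range LLaMA was pretrained on. The sketch below illustrates only that arithmetic. The function names and the tiny `head_dim` are illustrative assumptions, not the project's actual code.

```python
from typing import List

ORIGINAL_CONTEXT = 2048    # LLaMA's pretraining context length
TARGET_CONTEXT = 16384     # LongChat's extended context length
RATIO = TARGET_CONTEXT // ORIGINAL_CONTEXT  # 8


def rotary_angles(position: float, head_dim: int = 8, base: float = 10000.0) -> List[float]:
    """Rotation angle for each dimension pair: position * base^(-2i / head_dim)."""
    return [position / (base ** (2 * i / head_dim)) for i in range(head_dim // 2)]


def condensed_position(position: int, ratio: int = RATIO) -> float:
    """Map a position in the long window back into the pretrained range."""
    return position / ratio


# The last position of the 16k window now sits just inside the original range.
last = condensed_position(TARGET_CONTEXT - 1)
print(f"position {TARGET_CONTEXT - 1} is treated as {last:.3f}")
```

Because the condensed positions are fractional, the model has to be fine-tuned to adapt to them, which is the role of the second training stage.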

LongChat-13B is also available in two community-quantized formats: GPTQ and GGML. GPTQ is a post-training quantization method. The GPTQ release stores each weight in 4 bits instead of 16, which shrinks the model and makes GPU inference practical on smaller cards, at some cost in accuracy. GGML is a tensor library and file format used by llama.cpp, mainly for CPU inference, and its files are also typically quantized.

The two formats do not play different roles inside one system. Each is a complete, standalone copy of the same model that generates responses from the user's input and dialogue history. Which one to choose depends on the hardware and inference software available.
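
To see why storing weights in 4 bits matters, here is a back-of-the-envelope estimate of the memory needed for the weights alone. It assumes a nominal 13 billion parameters and ignores the per-group scales that quantized formats store alongside the weights, as well as activations and the attention cache, so real files and real memory use will be somewhat larger.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate storage for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 13e9  # nominal "13B"; the exact parameter count differs slightly

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # half-precision weights
int4_gb = weight_memory_gb(NUM_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f" 4-bit weights: ~{int4_gb:.1f} GB")
```

Under these assumptions the weights shrink by a factor of four, from about 26 GB to about 6.5 GB.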

Performance Evaluation

LongChat-13B has been tested on benchmarks that measure how well it handles long dialogues. This article focuses on one of them:

LongEval: LongEval is a new benchmark created by LMSys that measures how well chatbots can handle long context in dialogues. Long context means remembering and using information from previous turns and topics in the conversation. LongEval tests the chatbot’s ability to retrieve and associate relevant information from long sequences of text.

In the finer-grained line retrieval test, the MPT-7B-StoryWriter model saw a substantial drop in performance, falling to less than 50% of its usual level. The ChatGLM2-6B model did not fare well either. The LongChat-13B-16K model, by contrast, remained reliable, performing almost on par with GPT-3.5 and Anthropic's Claude within a context length of 12K.
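
A line retrieval test of this kind can be sketched as follows: the prompt lists many numbered records, the model is asked for the value stored on one of them, and accuracy is the fraction of prompts answered correctly. This is a simplified illustration, not the LongEval code. The record wording, the function names, and `oracle_answer` (a perfect stand-in used only to exercise the scoring) are all assumptions.

```python
import random
from typing import Callable, Dict, Tuple


def build_line_retrieval_prompt(num_lines: int, rng: random.Random) -> Tuple[str, int]:
    """Return a prompt of numbered records plus the value the model should retrieve."""
    values: Dict[int, int] = {i: rng.randint(10000, 99999) for i in range(1, num_lines + 1)}
    lines = [f"line {i}: REGISTER_CONTENT is <{v}>" for i, v in values.items()]
    target = rng.randint(1, num_lines)
    question = f"What is the REGISTER_CONTENT in line {target}?"
    return "\n".join(lines + [question]), values[target]


def oracle_answer(prompt: str) -> str:
    """Perfect stand-in for a model: looks up the requested line directly."""
    *records, question = prompt.split("\n")
    target = question.rstrip("?").split()[-1]
    for record in records:
        if record.startswith(f"line {target}:"):
            return record
    return ""


def retrieval_accuracy(answer_fn: Callable[[str], str], num_lines: int,
                       trials: int, seed: int = 0) -> float:
    """Fraction of prompts for which the answer contains the expected value."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        prompt, expected = build_line_retrieval_prompt(num_lines, rng)
        if str(expected) in answer_fn(prompt):
            correct += 1
    return correct / trials
```

To evaluate a real model, `answer_fn` would wrap a call to that model. Increasing `num_lines` lengthens the prompt, which is how the test probes progressively longer contexts.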

LongChat-13B - Performance Evaluation on LongEval benchmark
source - https://lmsys.org/blog/2023-06-29-longchat/

The researchers concluded that many open-source models advertised with a large context window do not actually perform well at the context length they claim, whereas LongChat-13B, trained with its specialized method, does.

For a more detailed look at the benchmarks and their results, please see their blog post. The blog post includes information about the model's training process, its performance on various benchmarks, and more.

How to access and use this model?

LongChat-13B is open-source but not commercially usable, because it is built on LLaMA, whose license restricts it to non-commercial use. You can find its code and documentation on GitHub, and you can download the model and dataset from Hugging Face. You can use LongChat-13B for your own projects or applications, as long as you follow its license and citation requirements.

If you are interested in learning more about the LongChat-13B model, all relevant links are provided in the 'source' section at the end of this article.

Limitations

LongChat-13B is an impressive conversational model, but it has some limitations that need improvement. Some of these limitations are:

  • Safety: LongChat-13B is trained on texts from the web, which may contain harmful or offensive content, such as hate speech, profanity, or misinformation. As a result, LongChat-13B may generate responses that are inappropriate or harmful for some users or situations. It therefore needs mechanisms to filter or flag such content.
  • Evaluation: LongChat-13B is evaluated on metrics and benchmarks that measure its quality and performance in dialogue generation. But these metrics and benchmarks may not measure all aspects of dialogue quality, such as user satisfaction, engagement, or trust. So LongChat-13B needs to have more complete and strong evaluation methods that can show its real-world impact and value. 
  • Generalization: LongChat-13B is fine-tuned on a dataset of open-domain dialogues, which may limit its ability to handle other domains or tasks that need different skills or knowledge. So LongChat-13B needs to have more flexible and adaptive methods that can let it learn from new data sources or domains without forgetting its previous knowledge or skills.

Conclusion

LongChat-13B is a new conversational model that can generate long and engaging dialogues on any topic. It aims to provide a human-like conversational experience for users, and it is a significant step forward for open-source natural language understanding and generation.


source
blog post - https://lmsys.org/blog/2023-06-29-longchat/
github repo - https://github.com/DachengLi1/LongChat
Model details - https://huggingface.co/lmsys/longchat-13b-16k
GPTQ Model - https://huggingface.co/TheBloke/LongChat-13B-GPTQ
GGML Model - https://huggingface.co/TheBloke/LongChat-13B-GGML
