
Monday 21 August 2023

WizardMath: A Novel AI Model that Solves Complex Math Problems

The AI Model that Solves Math Problems - symbolic image

Introduction

Mathematics is a universal language that can express complex concepts and phenomena in a concise and precise way. However, most natural language processing (NLP) models struggle to understand and generate mathematical expressions, especially when they involve symbolic reasoning and manipulation. This is a major limitation for many NLP applications that require mathematical skills, such as question answering, education, science, and engineering.

To address this challenge, a team of researchers from Microsoft, the Shenzhen Institute of Advanced Technology, and the Chinese Academy of Sciences has developed a novel model called WizardMath. WizardMath is a large pre-trained language model that can perform mathematical reasoning and generation tasks using a novel technique called Reinforced Evol-Instruct. The motivation behind developing this model is to empower NLP models with mathematical abilities and enable them to solve complex problems that involve mathematics.

What is WizardMath?

WizardMath is a model that enhances the mathematical reasoning abilities of Llama-2, a large language model (LLM), by applying a proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math.

Key Features of WizardMath

WizardMath has several key features that make it unique and powerful among existing NLP models. Some of these features are:

  • It enhances the mathematical reasoning abilities of Llama-2, a large language model that can handle both natural language and mathematical expressions.
  • WizardMath uses a unique method called Reinforcement Learning from Evol-Instruct Feedback (RLEIF), which allows it to learn from its own mistakes and improve its performance by generating and evaluating multiple candidate solutions for a given problem.
  • WizardMath has been tested on two mathematical reasoning benchmarks, namely GSM8k and MATH, which consist of natural language questions that require different types of mathematical skills to answer.
  • WizardMath shows extraordinary capabilities on these benchmarks, surpassing all other open-source large language models by a substantial margin.
  • WizardMath also outperforms several closed-source large language models on these benchmarks: on GSM8k it surpasses ChatGPT-3.5, Claude Instant-1, PaLM-2, and Minerva, while on MATH it surpasses Text-davinci-002, PaLM-1, and GPT-3. This demonstrates the strength of WizardMath in mathematical reasoning and generation tasks.

Capabilities/Use Case of WizardMath

WizardMath has many potential applications and use cases in various domains that require mathematical skills. Some examples are:

  • Question answering: WizardMath can answer questions that involve mathematics from different sources, such as textbooks, exams, online platforms, or user queries. For example, it can answer questions like “What is the derivative of sin(x)?” or “How many ways are there to arrange 5 books on a shelf?”.
  • Education: WizardMath can be used as an educational tool or assistant for students and teachers who want to learn or teach mathematics. It can provide step-by-step solutions and explanations for mathematical problems, as well as generate new problems and exercises for practice and assessment.
  • Science and engineering: WizardMath can be used as a scientific or engineering tool or assistant for researchers and practitioners who want to perform mathematical analysis or modeling for their projects. It can handle complex mathematical expressions and operations, such as integration, differentiation, optimization, linear algebra, etc.
  • Creativity and entertainment: WizardMath can be used as a creative or entertaining tool or assistant for anyone who wants to explore or enjoy mathematics. It can generate interesting or challenging mathematical expressions or concepts, such as puzzles, games, art, music, etc.

How does WizardMath work?

WizardMath works by applying its proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. This method enhances the mathematical reasoning abilities of Llama-2, a large language model (LLM). The RLEIF method consists of three steps, as illustrated in the figure below: (1) supervised fine-tuning (SFT), (2) Instruction Reward Model (IRM) training and Process-supervised Reward Model (PRM) training, and (3) Active Evol-Instruct and reinforcement learning via proximal policy optimization (PPO).

Three steps of our Reinforcement Learning from Evol-Instruct Feedback (RLEIF)
source - https://arxiv.org/pdf/2308.09583.pdf

The RLEIF method generates diverse math instruction data by using a math-specific Evol-Instruct method. It then trains an instruction reward model (IRM) and a process-supervised reward model (PRM). The IRM scores the quality of an evolved instruction, while the PRM provides feedback for each step of a solution. The Evol-Instruct method includes two evolution directions: downward evolution, which produces easier grade-school-level problems, and upward evolution, which produces more challenging problems. Initially, the original math instruction data from GSM8k and MATH is re-generated and filtered, and the model is fine-tuned on it. The Llama-2 models are then trained to obtain the reward models and WizardMath.
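To make the downward/upward evolution idea concrete, here is a minimal sketch of such a loop. The prompt wording and the ask_llm() helper are illustrative assumptions, not the exact templates or tooling used in the WizardMath paper.

```python
# A minimal sketch of a math-specific Evol-Instruct loop with downward and
# upward evolution. Prompt templates and ask_llm() are illustrative assumptions.

DOWNWARD_PROMPT = (
    "Rewrite the following math problem into an easier, grade-school-level "
    "problem that tests a related concept:\n\n{instruction}"
)

UPWARD_PROMPT = (
    "Rewrite the following math problem into a more challenging problem "
    "that requires deeper reasoning:\n\n{instruction}"
)


def ask_llm(prompt: str) -> str:
    """Hypothetical helper that sends a prompt to an instruction-following LLM
    (e.g. ChatGPT or Wizard-E in the paper) and returns its reply."""
    raise NotImplementedError("plug in your preferred LLM API here")


def evolve_instructions(seed_instructions, rounds=2):
    """Grow the seed GSM8k/MATH instructions by evolving each one downward and upward."""
    evolved = list(seed_instructions)
    for _ in range(rounds):
        new_batch = []
        for instruction in evolved:
            new_batch.append(ask_llm(DOWNWARD_PROMPT.format(instruction=instruction)))
            new_batch.append(ask_llm(UPWARD_PROMPT.format(instruction=instruction)))
        evolved.extend(new_batch)
    return evolved
```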

The IRM judges the quality of the evolved instructions on three aspects: definition, precision, and integrity. To produce the ranking-list training data for the IRM, ChatGPT and Wizard-E are used to generate several evolved instructions for each instruction, and Wizard-E then ranks the quality of those instructions. The PRM assesses the correctness of each step in the solutions generated by WizardMath. The final reward is calculated as the product of the instruction reward and the answer reward.
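The sketch below shows one way the two reward signals could be combined into the scalar reward used for PPO, following the "product of instruction reward and answer reward" description above. The reward-model interfaces (score_instruction, score_step) and the way per-step scores are aggregated are assumptions for illustration, not the paper's exact implementation.

```python
# A minimal sketch of combining the IRM and PRM signals into a single PPO reward.
# The score_instruction()/score_step() interfaces are assumed for illustration.

from math import prod


def instruction_reward(irm, instruction: str) -> float:
    """IRM scores the evolved instruction (definition, precision, integrity)."""
    return irm.score_instruction(instruction)  # assumed to return a value in [0, 1]


def answer_reward(prm, instruction: str, solution_steps: list[str]) -> float:
    """PRM scores every step of the generated solution; aggregate the step scores."""
    step_scores = [prm.score_step(instruction, step) for step in solution_steps]
    return prod(step_scores)  # one possible aggregation of per-step feedback


def final_reward(irm, prm, instruction: str, solution_steps: list[str]) -> float:
    """Final PPO reward = instruction reward x answer reward, as described above."""
    return instruction_reward(irm, instruction) * answer_reward(prm, instruction, solution_steps)
```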

Performance evaluation with other Models

WizardMath has been extensively tested on two mathematical reasoning benchmarks, namely GSM8k and MATH. The results of these tests, as shown in the figure and table below, demonstrate that WizardMath surpasses all other open-source LLMs by a substantial margin. Furthermore, it even outperforms some closed-source models such as ChatGPT-3.5, Claude Instant-1, PaLM-2 and Minerva on GSM8k, and simultaneously surpasses Text-davinci-002, PaLM-1 and GPT-3 on MATH.

Figure shows pass@1 performance of main LLM models on the GSM8k benchmark.

source - https://github.com/nlpxucan/WizardLM/tree/main/WizardMath

The GSM8k dataset contains approximately 7,500 training examples and 1,319 test examples, consisting mainly of grade-school-level math problems. The MATH dataset collects problems from prestigious math competitions such as AMC 10, AMC 12, and AIME; it contains 7,500 training examples and 5,000 challenging test examples across seven subject areas.
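For readers who want to reproduce a pass@1-style number, here is a minimal evaluation sketch, assuming GSM8k is loaded from the Hugging Face Hub and that generate_answer() wraps whatever model is being tested (for example, a WizardMath checkpoint). The answer-extraction rule (take the last number in the text) is a simplifying assumption.

```python
# A minimal sketch of a pass@1-style evaluation on the GSM8k test split.
# generate_answer() is a user-supplied function that returns the model's solution text.

import re
from datasets import load_dataset


def extract_final_number(text: str):
    """Return the last number that appears in a solution string (simplifying assumption)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return numbers[-1] if numbers else None


def pass_at_1(generate_answer) -> float:
    """Score one greedy generation per test question against the reference answer."""
    test_set = load_dataset("gsm8k", "main", split="test")  # 1,319 questions
    correct = 0
    for example in test_set:
        prediction = generate_answer(example["question"])
        if extract_final_number(prediction) == extract_final_number(example["answer"]):
            correct += 1
    return correct / len(test_set)
```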

Table shows results of pass@1 (%) on GSM8k and MATH
source - https://arxiv.org/pdf/2308.09583.pdf

In comparison with closed-source models, WizardMath 70B slightly outperforms some closed-source LLMs on GSM8k, including ChatGPT, Claude Instant and PaLM 2 540B, and at the time of writing ranks in the top five among all models. Simultaneously, WizardMath 70B also surpasses Text-davinci-002 on MATH.

In comparison with open-source models, WizardMath 70B shows a substantial performance advantage over all open-source models across both the GSM8k and MATH benchmarks.

How to access and use this model?

The details and model weights for WizardMath are publicly available on its GitHub repository and on Hugging Face. The code, data, and weight licenses are also available on the GitHub repository and can be accessed via the respective links. By exploring these resources, you can learn more about how to use WizardMath and incorporate it into your own projects.
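As a starting point, here is a minimal sketch of loading a WizardMath checkpoint with the transformers library. The model ID and the Alpaca-style chain-of-thought prompt below are assumptions based on the project's Hugging Face and GitHub pages; check the repository for the exact checkpoint names and recommended prompt before relying on them.

```python
# A minimal sketch of querying a WizardMath checkpoint from the Hugging Face Hub.
# MODEL_ID and the prompt template are assumptions; consult the repository.

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "WizardLM/WizardMath-7B-V1.0"  # assumed checkpoint name; see the repo

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{question}\n\n"
    "### Response: Let's think step by step."
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

question = "How many ways are there to arrange 5 books on a shelf?"
inputs = tokenizer(PROMPT_TEMPLATE.format(question=question), return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```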

If you are interested in learning more about the WizardMath model, all relevant links are provided under the 'source' section at the end of this article.

Future Work

WizardMath has shown remarkable mathematical performance, as illustrated in the figures above, but it still has room for improvement compared to state-of-the-art LLMs such as GPT-4 and Claude-2. The authors therefore plan to explore new ways to enhance RLEIF, or better methods, to further boost the model's performance.

Conclusion

WizardMath is not only a novel contribution to the field of natural language processing (NLP), but also a significant step forward in the journey of artificial intelligence (AI). WizardMath demonstrates that AI models can learn and improve their mathematical skills by combining reinforcement learning with evolved instructions (Evol-Instruct). WizardMath also shows that AI models can solve complex problems that involve mathematics, a universal language that can express many concepts and phenomena in a concise and precise way. WizardMath opens up new possibilities and challenges for AI research and applications, as well as for human-AI collaboration and communication.


Source
research paper - https://arxiv.org/abs/2308.09583
GitHub Repo - https://github.com/nlpxucan/WizardLM/tree/main/WizardMath
weights - https://huggingface.co/WizardLM
Code License - https://github.com/nlpxucan/WizardLM/blob/main/WizardMath/CODE_LICENSE
Data License - https://github.com/nlpxucan/WizardLM/blob/main/WizardMath/DATA_LICENSE
Weight License - https://github.com/nlpxucan/WizardLM/blob/main/WizardMath/WizardMath/LICENSE
