Why ChatGPT struggles with math

Have you ever tried using an AI tool like ChatGPT for calculations and found that it doesn’t always match up? It turns out there’s a reason for that.


As large language models (LLMs) like OpenAI’s ChatGPT become increasingly widespread, people are relying on them more for work and research assistance. Yuntian Deng, an assistant professor at the David R. Cheriton School of Computer Science, discusses some of the challenges related to the reasoning capabilities of LLMs, particularly in mathematics, and explores the implications of using these models to facilitate problem-solving.

What flaw have you discovered in ChatGPT’s ability to do math?

As I explained in a recent post on X, the latest reasoning variant of ChatGPT, o1, struggles with large-digit multiplication, especially when multiplying numbers beyond nine digits. While this is a notable improvement over the previous ChatGPT-4o model, which struggled even with four-digit multiplication, it remains a significant flaw.

What implications does this have on the tool’s reasoning ability?

Large-digit multiplication is a useful test of reasoning because it requires a model to apply principles learned during training to new test cases. Humans can do this naturally. For example, if you teach a high school student how to multiply nine-digit numbers, they can easily extend this understanding to handle ten-digit multiplication, demonstrating an understanding of underlying principles rather than mere memorization.

In contrast, LLMs often struggle to generalize beyond the data they were trained on. For instance, if an LLM is trained on data involving multiplication of numbers up to nine digits, it generally cannot generalize to ten-digit multiplication.
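This kind of digit-generalization test is easy to set up, because Python's arbitrary-precision integers provide exact ground truth for any digit count. The sketch below is illustrative, not the authors' actual protocol: it generates random n-digit multiplication prompts and scores a list of model answers (however they were obtained) against the exact products.

```python
import random

def make_multiplication_case(num_digits, rng=random):
    """Generate one n-digit multiplication prompt plus its exact answer."""
    lo, hi = 10 ** (num_digits - 1), 10 ** num_digits - 1
    a, b = rng.randint(lo, hi), rng.randint(lo, hi)
    prompt = f"What is {a} * {b}?"
    return prompt, a * b  # exact ground truth via Python's big integers

def score(model_answers, cases):
    """Fraction of model answers that match the exact product."""
    correct = sum(1 for ans, (_, truth) in zip(model_answers, cases)
                  if ans == truth)
    return correct / len(cases)
```

Running the generator at 9 digits versus 10 digits and comparing accuracy is one simple way to probe whether a model has learned the multiplication algorithm or merely memorized patterns within its training range.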

As LLMs become more powerful, their impressive performance on challenging benchmarks may give the impression that they can "think" at advanced levels. It is tempting to rely on them to solve new problems or even make decisions. However, the fact that even o1 struggles to reliably solve large-digit multiplication problems indicates that LLMs still face challenges when asked to generalize to new tasks or unknown domains.

Why is it important to study how these LLMs "think"?

Companies like OpenAI have not fully disclosed the details of how their models are trained or the data they use. Understanding how these AI models work allows researchers to identify their strengths and limitations, which is essential for improving them. Moreover, knowing these limitations helps us understand which tasks are best suited for LLMs and where human expertise is still crucial.



University of Waterloo
