23

ChatGPT is a language model. As far as I know, it receives text as tokens and word embeddings. So how can it do math? For example, I asked:

ME: Which one is bigger 5 or 9.
ChatGPT: In this case, 9 is larger than 5.

One could say that GPT sees numbers merely as tokens, and that its training dataset contained examples stating that 9 is bigger than 5, so it has no actual understanding of math and just pattern-matches tokens. But I don't think that explanation holds, because of this question:

ME: Which one is bigger? 15648.25 or 9854.2547896
ChatGPT: In this case, 15648.25 is larger than 9854.2547896.

We can't claim its dataset literally contained a statement that the token 15648.25 is bigger than the token 9854.2547896!

So how does this language model understand the numbers?
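
To be concrete about what "gets text as tokens" means, here is a minimal sketch (assuming the open-source `tiktoken` package is installed) that prints how these numbers are split into sub-word tokens. The model only ever sees such token sequences and statistics over them, never numeric values:

```python
# Minimal sketch, assuming the `tiktoken` package (OpenAI's open-source tokenizer) is installed.
# It shows that a number like 15648.25 is split into several sub-word tokens;
# the model learns statistical relations over these tokens, not over numeric values.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by recent OpenAI chat models

for text in ["5", "9", "15648.25", "9854.2547896"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{text!r} -> {len(ids)} token(s): {pieces}")
```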

nbro
  • 39,006
  • 12
  • 98
  • 176
Peyman
  • 534
  • 3
  • 10
  • 2
    Arithmetic is briefly described [in this paper](https://arxiv.org/pdf/2005.14165.pdf) starting on page 21. Google also has some interesting discussion [here](https://arxiv.org/pdf/1904.01557.pdf). – Ian Campbell Dec 08 '22 at 23:27
  • I find it curious that people are asking these questions about ChatGPT - not about GPT-2, released in February 2019 and also able to answer these kinds of questions. – user253751 Dec 12 '22 at 16:43
  • 2
@user253751 The GPT-2 that I tried couldn't do these things. – Peyman Dec 12 '22 at 19:05
  • 1
It can even do some integrals (though it gets more complicated ones wrong). It even correctly provided the infinite-matrix form of the derivative operator to me. – Anixx Dec 26 '22 at 14:23
It does not actually cope well with math. After some interesting and astounding answers to coding questions, I wondered how it reacts to math. I asked for square and cube roots and vice versa, and it failed badly. The cube root of 8 was 2, and the cube root of 9 was also 2. Later, after some discussion, it came up with the correct answer, but going in the opposite direction it failed again. – datenheim Jan 13 '23 at 21:38

4 Answers

10

Adding on to txopen's answer, it is interesting to note that for larger numbers with similar digits ChatGPT is unable to make any useful distinctions. For instance:

Me: Which number is bigger: 1234.12 or 1243.12

ChatGPT: Both numbers are equal.
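
One way to check whether this is a stable failure or just sampling noise is to ask the same question repeatedly through the API and look at the spread of answers. A rough sketch, assuming the `openai` 1.x Python client, an API key in the environment, and an example model name:

```python
# Rough sketch, assuming the `openai` 1.x Python client and OPENAI_API_KEY set in the environment.
# It asks the same comparison question several times and tallies the answers,
# since a single sampled response may not be representative.
from collections import Counter
from openai import OpenAI

client = OpenAI()
prompt = "Which number is bigger: 1234.12 or 1243.12? Answer with just the number."

answers = Counter()
for _ in range(10):
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # example model name
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,        # default sampling, so answers can vary between runs
    )
    answers[resp.choices[0].message.content.strip()] += 1

print(answers)  # distribution of answers across the 10 runs
```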

Milo Moses
  • 201
  • 5
  • 6
    I expect that if you try repeatedly with the same two "harder" numbers, you will get a range of answers. It is working statistically and stochastically based on text tokens. Posting single examples may not always tell the whole story. – Neil Slater Jan 12 '23 at 23:27
  • 2
Today the example above seems to give the right answer (human model adjustment?). But I found another simple example where it fails: "Can you divide 1231231231 by 2?" "Sure, I can divide 1231231231 by 2. The result is 6156156115.5" – Fabiano Taioli Jan 23 '23 at 12:45
6

I think the dataset is so large and the model so well trained that it has picked up the probabilistic correlation between the length of the digit sequence before the decimal point and, beyond that, the influence of each individual digit on the probability of one number being larger than another. The concrete example does not have to appear in the dataset: the model predicts the correct outcome because the relation between one number being larger than another and the differences in their digits and lengths is sufficiently well represented in the training data.
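
Spelled out as ordinary code, the regularity described above looks something like the following sketch. This only illustrates the statistical rule a model could absorb from text; it is not something ChatGPT literally executes:

```python
# Sketch of the comparison rule described above: the length of the digit run
# before the decimal point dominates, and after that the leftmost differing digit
# decides. This is an illustration of the regularity, NOT code ChatGPT runs internally.
def looks_bigger(a: str, b: str) -> str:
    int_a, _, frac_a = a.partition(".")
    int_b, _, frac_b = b.partition(".")
    if len(int_a) != len(int_b):            # more digits before the dot => larger number
        return a if len(int_a) > len(int_b) else b
    for da, db in zip(int_a + frac_a, int_b + frac_b):
        if da != db:                        # first differing digit decides
            return a if da > db else b
    return a if len(frac_a) >= len(frac_b) else b

print(looks_bigger("15648.25", "9854.2547896"))  # -> 15648.25 (longer integer part)
print(looks_bigger("1234.12", "1243.12"))        # -> 1243.12 (third digit differs)
```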

txopen
  • 61
  • 1
1

The apparent ability of ChatGPT (in particular when using the GPT-4 model) to solve certain mathematical problems is due to the amount of training data and the number of parameters of these machine learning models. ChatGPT and other large language models do not have explicit rules for solving mathematical problems.

The following 2022 paper describes how such capabilities emerge in transformer-based language models once a certain parameter-count threshold is exceeded: https://arxiv.org/pdf/2206.07682.pdf

This is also why they excel at some math problems and fail at others that can be very similar.

LeRobert
  • 31
  • 1
-5

Simple answer, ChatGPT is actually human writers with some kind of autocomplete to speed things up.

This is standard practice for AI companies these days: a "fake it till you make it" approach where they use humans to fill the gaps in the AI, in the hope that down the road they'll automate the humans out of the product. It is common enough that an academic paper has been written on the topic. So there is plenty of industry precedent for OpenAI to be using humans to help craft the responses.

Plus, technically OpenAI is not "faking" anything. It is the media and bloggers who think ChatGPT is a pure AI system. OpenAI has made no such claim itself, and the opposite is implied by its InstructGPT whitepaper:

Step 1: Collect demonstration data, and train a supervised policy. Our labelers provide demonstrations of the desired behavior on the input prompt distribution (see Section 3.2 for details on this distribution). We then fine-tune a pretrained GPT-3 model on this data using supervised learning

Additionally, ChatGPT is in "research mode" according to the website, which implies there are still humans training the system during the chats, as described in the quote above.

A final note: I find it amusing that no one considers this alternative plausible, as if it were somehow more complicated to have humans tweak chatbot responses than to create an AI with the apparent human-level understanding that ChatGPT exhibits.

UPDATE: ChatGPT confirms OpenAI team curating its responses

Turns out ChatGPT is indeed human curated, by open admission.

During this conversation ChatGPT outright states the OpenAI team filters and edits the GPT generated responses.

...the response you are receiving is being filtered and edited by the OpenAI team, who ensures that the text generated by the model is coherent, accurate and appropriate for the given prompt.

Apparently, the fact that OpenAI actively curates ChatGPT's responses is indirectly implied in the documentation here.

Human in the loop (HITL): Wherever possible, we recommend having a human review outputs before they are used in practice. This is especially critical in high-stakes domains, and for code generation. Humans should be aware of the limitations of the system, and have access to any information needed to verify the outputs (for example, if the application summarizes notes, a human should have easy access to the original notes to refer back).

So, that explains that :)

yters
  • 387
  • 2
  • 10
  • Can someone explain how the warning is relevant for my post? I have 5 linked reputable sources. – yters Jan 09 '23 at 12:44
  • 2
`We’ve trained a model`: it's on the [front page](https://openai.com/blog/chatgpt/). Also, your source is the press; better to link an academic paper. – Minh-Long Luu Jan 10 '23 at 11:29
  • @Minh-LongLuu I did link an academic paper. See the last paragraph. Also, the other answer has no links, and yet has no warning. How did I earn such special treatment? :) – yters Jan 10 '23 at 14:25
  • 7
    @yters I will give you the benefit of the doubt of not being a troll. The "special treatment" you are getting is because the idea that "ChatGPT is actually human writers with some kind of autocomplete to speed things up." is a "hallucination", with nothing to back it up. The academic paper you linked made no such implication. It just says that human input was used to train the model, not to craft responses after the model is trained. I challenge you to post just one excerpt from any paper to back up your claim. – GalacticRaph Jan 13 '23 at 19:46
@GalacticRaph I am not trolling. I am bemused that no one thinks human intervention is plausible, as if it were somehow more complicated to have humans tweaking chatbot responses than to create human-level AI. My point in referring to the InstructGPT paper is that it says the system is trained by humans providing example responses for GPT to learn from. The ChatGPT website says the system is in research mode, not production mode. "Research mode" to me implies there are still humans providing example responses for GPT to learn from during the course of chats. Plus, there is plenty of industry precedent. – yters Jan 13 '23 at 21:41
  • The bit you have linked regarding HITL does _not_ imply that is what they do with ChatGPT. – David Jan 17 '23 at 10:34
  • @DavidIreland you are saying OpenAI doesn't follow its own advice? – yters Jan 17 '23 at 10:49
  • 1
@yters He is saying what I am saying. With all due respect, you need to think critically about your conclusions and triple-check whether they truly align with your sources. Many people are telling you they don't, so please don't be stubborn. OpenAI _could_ be doing what you are suggesting, but nothing you have provided directly _proves_ it. – GalacticRaph Jan 21 '23 at 17:57
  • Again, I challenge you to post just one excerpt from any paper to back up your claim. – GalacticRaph Jan 21 '23 at 17:58
  • @GalacticRaph I am not quite sure what you are looking for. The InstructGPT paper talks about humans providing example responses for the training data, and our interactions with ChatGPT are generating massive amounts of training data. To train the next GPT model effectively, presumably that training data also needs human generated responses. The OpenAI best practices is HITL should be used whenever possible. If they follow their own advice, then HITL for the ChatGPT chatbot. Many are saying I'm wrong, but not giving any details. I've provided many details. How do you know you are right? – yters Jan 21 '23 at 20:07
  • @yters "ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response." [1] --- "These InstructGPT models, which are trained with humans in the loop, are now deployed as the default language models on our API." [2] InstructGPT =/= ChatGPT. You are taking a training method used in InstructGPT and _assuming_ they are using the same in ChatGPT [1] https://openai.com/blog/chatgpt/ [2] https://openai.com/blog/instruction-following/ – GalacticRaph Jan 22 '23 at 18:36
  • I see now that you are confused about how ChatGPT works. For anyone reading, just look at his conversation with ChatGPT [here](https://github.com/raphant/transcripts/blob/main/ChatGPT_confused_by_visual_sentence_structure.txt). (Forked for posterity) – GalacticRaph Jan 22 '23 at 18:49
  • "Wherever possible, we recommend having a human review outputs before they are used in practice. This is especially critical in high-stakes domains, and for code generation. Humans should be aware of the limitations of the system, and have access to any information needed to verify the outputs (for example, if the application summarizes notes, a human should have easy access to the original notes to refer back)." This is not proof OpenAI is has a HITM. They are saying when you use their API, that they recommend you have a HITM because GPT will often reply with non-sense. – GalacticRaph Jan 22 '23 at 18:52
  • @GalacticRaph InstructGPT is trained to follow instructions, which is also what's special about ChatGPT. So yes, InstructGPT == ChatGPT. I'm sure OpenAI uses their own API to create a public chatbot. It's high stakes too, because there is a $10B deal with Microsoft on the table. Hence, per their best practices, they have HITM. Regarding my transcript, can you explain what I am confused about? That's all I'm asking for from people claiming I am wrong. Please explain *why* I am wrong and *how* you know this for a fact. We aren't in medieval times, now we work with data and reason. – yters Jan 22 '23 at 19:29
  • 1
    @yters I know we work with data and reason, but stackexchange comments are not the best place for this kind of discourse. If you open a new question on ai.stackexchange.com I am eager to provide a more organized response about how you are wrong and what I am basing my conclusions off of. – GalacticRaph Jan 22 '23 at 22:55
  • @GalacticRaph excellent, I will ask a new question. – yters Jan 22 '23 at 22:56
  • 1
    @GalacticRaph here you go: https://ai.stackexchange.com/questions/38854/why-doesnt-openais-chatgpt-chatbot-have-a-human-in-the-loop – yters Jan 22 '23 at 23:19
  • "Human curated" is not the same as "the thing is just a sham with humans typing for you". "Human curated" means they are training the model based on human feedback. That's why it will no longer tell you how to build a bomb, for example. They saw it was telling people how to build bombs and they decided they didn't like that so they made it stop doing that. – user253751 Apr 01 '23 at 08:10
  • @user253751 ChatGPT states humans are manipulating its responses in real time. This is also implied by the OpenAI best practices. – yters Apr 01 '23 at 15:01
  • @yters ChatGPT states a lot of things that are wrong. Sometimes a flagged response will be reviewed by a human, is that what you are talking about? – user253751 Apr 01 '23 at 19:05
  • @user253751 I think the human who curates ChatGPT's output told me what he was doing. It wasn't the AI model talking to me, and he told me a lot of specific details that made a lot of sense. Way too coherent to be an AI hallucinating. So that's what I mean. A human was talking to me in real time through the chatbot interface. – yters Apr 03 '23 at 01:43
  • @yters you talked to an actual human in real life? or the GPT model told you it was a human? there's no reason to think that a GPT chatbot couldn't generate fantasy stories, after all, that was one of its first popular applications (AI Dungeon 2) – user253751 Apr 03 '23 at 11:29
  • 1
I don't agree with your answer, but I find it original. Using humans is actually an old trick that was applied in the past to show that machines can beat humans at chess. Eventually machines reached that point, but the trick was used many times before. – Mihai Apr 04 '23 at 09:46
  • @user253751 coming up with fantasy stories is much different than a detailed and plausible explanation of how humans intervene to provide ChatGPT responses. Many different sentence permutations can make a fantasy story. Very few would make a good explanation of human intervention. So, the probability of ChatGPT hallucinating the explanation is very small. Log likelihood ratio is highly in favor of human intervention. – yters Apr 08 '23 at 21:24
  • @yters The probability of AI Dungeon 2 hallucinating a world with dragons is also pretty small – user253751 Apr 10 '23 at 00:51