To me it looks like GPT-4 is based on GPT-3.
On the other hand, there were rumors that GPT-3's training contained errors, but that retraining was impossible due to the cost.
GPT-4 is largely based on GPT-3. From the GPT-4 Technical Report:
GPT-4 is a Transformer-style model [39]
The Transformer architecture originates from the paper Attention Is All You Need, which laid the foundation for GPT, GPT-2, and GPT-3.
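For context, the core operation these models share is scaled dot-product self-attention from Attention Is All You Need. The sketch below is a minimal, illustrative NumPy version with a causal mask as used in GPT-style decoders; the shapes and names are made up for the example and are not GPT-4's actual implementation:

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Core Transformer operation (Attention Is All You Need).
        Q, K, V: arrays of shape (seq_len, d_k) -- illustrative sizes only."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)               # similarity between positions
        # causal mask: each token may only attend to itself and earlier tokens (GPT-style)
        mask = np.triu(np.ones_like(scores), k=1).astype(bool)
        scores = np.where(mask, -1e9, scores)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
        return weights @ V                              # weighted sum of the values

    # Toy example: 4 tokens, one 8-dimensional attention head
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))
    out = scaled_dot_product_attention(x, x, x)
    print(out.shape)  # (4, 8)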
However, there is one significant change: GPT-4 accepts images as input. Image understanding from natural-language supervision was previously explored by OpenAI in the paper Learning Transferable Visual Models From Natural Language Supervision (the CLIP paper). We can therefore predict that GPT-4 is based on GPT-3 + CLIP.
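If that guess were right, one simple way to combine the two would be to project CLIP-style image embeddings into the same embedding space as the text tokens and feed everything through the language model together. The sketch below only illustrates that speculation; none of the sizes, names, or the projection itself come from OpenAI:

    import numpy as np

    rng = np.random.default_rng(0)

    d_image = 768    # hypothetical CLIP image-embedding size
    d_model = 1024   # hypothetical language-model embedding size

    image_embedding = rng.normal(size=(d_image,))        # output of an image encoder
    projection = rng.normal(size=(d_image, d_model))     # learned projection (speculative)
    image_token = image_embedding @ projection           # now shaped like a text token

    text_tokens = rng.normal(size=(16, d_model))         # embedded text prompt
    sequence = np.vstack([image_token, text_tokens])     # fed to the transformer as one sequence
    print(sequence.shape)  # (17, 1024)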
I say predict because the authors decided not to publish the architecture or any other implementation details:
Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.
However, we have no details on whether it was trained from scratch or which techniques were used.
GPT-4 is a transformer like GPT-3 and every other GPT. The training is certainly new, because the model has a different size; you cannot simply transfer GPT-3's weights into GPT-4 and continue training.
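To illustrate why the weights cannot simply be carried over: a weight matrix trained for one hidden size does not fit a layer of a different size. The hidden sizes below are made up (and kept small) purely for the example; the real values for GPT-3 and GPT-4 are not both public:

    import numpy as np

    d_model_old = 128   # hypothetical hidden size of the old model
    d_model_new = 256   # hypothetical, larger hidden size of the new model

    w_old = np.zeros((d_model_old, d_model_old))   # a "trained" projection matrix
    w_new = np.zeros((d_model_new, d_model_new))   # the corresponding layer in the new model

    try:
        w_new[:] = w_old                           # shapes do not match
    except ValueError as err:
        print("cannot reuse the old weights directly:", err)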
The details of the implementation are currently not known. The published report is not a scientific paper about how GPT-4 works; it essentially states "we do not say how it works", see below.
The sizes of various components certainly differ, at least the number of weights in some parts, but the model could also be larger everywhere. The prompt length (context size) and the maximum output size increased a lot, from about 4000 tokens to 8000 or even 32000 tokens. I would expect there are some other minor differences as well.
Other major differences may be the amount of training data and the compute used for training. My personal speculation is that GPT-4 used far more compute for training, possibly with a similar amount of training data to GPT-3.
From the "GPT-4 Technical Report", section 2:
Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.