Next Token Prediction Is Enough

September 22, 2024

Ilya: I challenge the claim that next token prediction cannot surpass human performance

On the surface it looks like it cannot. It looks like if you just learn to imitate, to predict what people do, you can only copy people. But here is a counterargument for why that might not be quite so: if your [base?] neural net is smart enough, you just ask it, like, "What would a person with great insight and wisdom and capability do?" Maybe such a person doesn't exist, but there's a pretty good chance that the neural net will be able to extrapolate how such a person should behave. Do you see what I mean?

Presenter: Yes, although where would it get the sort of insight about what that person would do if not from...

Ilya: From the data of regular people. Because, like, if you think about it: what does it mean to predict the next token well enough? What does it actually mean? It's a much deeper question than it seems. Predicting the next token well means that you understand the underlying reality that led to the creation of that token.

It's not statistics. Like, it is statistics, but what is statistics?

In order to understand those statistics, to compress them, you need to understand what it is about the world that creates those statistics. And so then you say: okay, well, I have all those people, what is it about people that creates their behaviors? Well, you know, they have thoughts and they have feelings and they have ideas and they do things in certain ways. All of those could be deduced from next token prediction.
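
The link between predicting and compressing mentioned above can be made concrete with a small sketch. This is an illustration added here, not something from the interview. It relies on the standard fact that a symbol predicted with probability p can be encoded in about -log2(p) bits, so a model that predicts the next character better compresses the same text into fewer bits. The two toy models, their function names, and the sample text are all hypothetical, and both models are fit and scored on the same text purely for simplicity.

```python
import math
from collections import Counter, defaultdict

def unigram_bits(text):
    """Average bits per character when each character is predicted
    from its overall frequency, ignoring all context."""
    counts = Counter(text)
    total = len(text)
    return -sum(math.log2(counts[ch] / total) for ch in text) / total

def bigram_bits(text):
    """Average bits per character when each character is predicted
    from the single character that precedes it."""
    following = defaultdict(Counter)
    for prev, ch in zip(text, text[1:]):
        following[prev][ch] += 1
    bits = 0.0
    for prev, ch in zip(text, text[1:]):
        p = following[prev][ch] / sum(following[prev].values())
        bits -= math.log2(p)
    return bits / (len(text) - 1)

# Hypothetical sample text; any repetitive English-like string works.
text = "the cat sat on the mat and the cat ate the rat " * 20

print(f"no context:       {unigram_bits(text):.3f} bits/char")
print(f"one char context: {bigram_bits(text):.3f} bits/char")
```

On this text the model that looks at one character of context needs fewer bits per character than the model that looks at none, because it has captured a little of the structure that generated the text. The argument in the transcript is that pushing this much further requires capturing much more of what produced the text.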