I went to college for filmmaking. Since it’s a field that revolves around communication, we ended up learning some fundamentals of communication theory. By definition, communication happens when the “listener” perceives a message - whether that message was intended or not. No matter how much work and thought you put into your message, if you shout it into the void, it isn’t communication. Likewise, you can say nothing in a room full of people and still unintentionally communicate just by being seen.
The perceived intelligence of LLMs feels very much like a result of this. There is no intelligence in the machine; we’re just perceiving communication from a probability machine and concluding that it’s intelligent and trying to communicate with us.
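To make the “probability machine” point concrete, here’s a minimal sketch of what autoregressive generation boils down to: map the context to a distribution over next tokens, sample, repeat. The toy vocabulary and hand-written probabilities below are purely illustrative stand-ins for what a real LLM computes with a neural network over tens of thousands of tokens.

```python
import random

# Hypothetical next-token distributions keyed by the preceding word.
# In a real LLM these come from a learned network, not a lookup table.
NEXT_TOKEN_PROBS = {
    "the": {"cat": 0.5, "dog": 0.3, "idea": 0.2},
    "cat": {"sat": 0.6, "ran": 0.4},
    "dog": {"sat": 0.3, "ran": 0.7},
    "idea": {"sat": 0.1, "ran": 0.9},
    "sat": {".": 1.0},
    "ran": {".": 1.0},
}

def generate(start: str, max_tokens: int = 10) -> str:
    """Autoregressive generation: repeatedly sample the next token."""
    tokens = [start]
    for _ in range(max_tokens):
        dist = NEXT_TOKEN_PROBS.get(tokens[-1])
        if dist is None:
            break
        choices, weights = zip(*dist.items())
        tokens.append(random.choices(choices, weights=weights)[0])
        if tokens[-1] == ".":
            break
    return " ".join(tokens)

print(generate("the"))  # e.g. "the cat sat ."
```

There is no “intent” anywhere in that loop; the appearance of meaning is supplied entirely by the reader, which is exactly the communication-theory point above.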
It’s glorified autocorrect
Interesting approach, but why do they use the tiniest models ever? Also, while I understand why they do it, why focus on token-level or even finer-grained manipulation that will be heavily constrained by design?