Why does everybody want to make it count letters and stuff like that? Isn’t this what it’s worst at? I’m against AI, but this seems like it’s being used for what it’s worst at and not designed to do, like checking how good a bike is by trying to ride it underwater.
It’s a quick way to say that the emperor still has no clothes.
We’re boiling the oceans to train the models, and these are well-publicised failure modes. If they haven’t fixed it, it seems to suggest they CAN’T fix it with the tools and architecture they have. So what other problems is it whiffing on that aren’t trivially checkable?
If marketing boxed in the product and said “it does these ten things well”, we might be willing to forgive limitations when we leave its wheelhouse. Nobody kvetches that Microsoft Word is an awful IDE, after all. But that would require a retreat from a public that’s been promised Lt. Cmdr. Data in your pocket, and investors that have priced it as such.
OpenAI, because GPT-5 has only improved at coding, and only enough not to fall behind Anthropic. To its credit, it does not consume more resources than its previous models.
In other respects there has been no improvement, a slight decline, or at best a minimal gain. Does this mean that OpenAI and the sector are going to collapse?
Probably not; the problems may just have been with the launch of this model. In the coming months they will launch another model that solves this, although it may not.
I wouldn’t be surprised if they just wrote a script for it to use in the background, specifically for this sort of question, and called it a day. Just to shut up the critics.
They seem to already be doing that as examples come up like glue on pizza, but there are a nearly infinite number of similar issues that are going to need scripting.
Why does everybody want to make it count letters and stuff like that?
Dunno about the others; I do it because it shows clearly that those models are unable to understand and follow simple procedures, such as the ones needed to count letters, multiply numbers (including large ones; the procedure is the same), check if a sequence of words is a valid SATOR square, etc. (A sketch after the list below shows how trivial those procedures are.)
And by showing this, a few things become evident:
That anyone claiming we’re a step away from AGI is a goddamn liar, if not worse (a gullible pile of rubbish).
That all talk about “hallucinations” is a red herring analogy.
That the output of those models cannot be used in any situation where reliability is essential.
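For contrast, here is a minimal sketch of how trivial the deterministic versions of two of those procedures are; plain Python, and the word and the square are just the usual examples, nothing from this thread:

```python
def count_letter(word: str, letter: str) -> int:
    """Count occurrences of a letter in a word, case-insensitively."""
    return word.lower().count(letter.lower())


def is_sator_square(rows: list[str]) -> bool:
    """Check the two properties of a SATOR-style word square:
    it is symmetric (row i equals column i) and it reads the same
    when the whole grid is rotated 180 degrees."""
    n = len(rows)
    if n == 0 or any(len(row) != n for row in rows):
        return False
    grid = [row.lower() for row in rows]
    symmetric = all(grid[r][c] == grid[c][r]
                    for r in range(n) for c in range(n))
    flat = "".join(grid)
    return symmetric and flat == flat[::-1]


print(count_letter("strawberry", "r"))  # 3
print(is_sator_square(["SATOR", "AREPO", "TENET", "OPERA", "ROTAS"]))  # True
```

The point is not that an LLM should run this internally; it is that the procedure is fully specified and mechanical, and the models still get it wrong when asked to follow it step by step.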
It’s a chatbot that can see and draw, but rich idiots keep pushing it as an oracle. As if “a chatbot that can see and draw” isn’t impressive enough.
Three years ago, ‘label a tandem bicycle’ would’ve produced a tricycle covered in squiggles. Four years ago it was impossible. I don’t mean ‘really really hard.’ I mean we had no fucking idea how to make that program. People have been trying since code came on punchcards.
LLMs can almost-sorta-kinda do it, despite being completely the wrong approach. It’s shocking that ‘guess the next word’ works this well. I’m confused by the lack of experimentation in, just… asking a different question. Diffusion’s doing miracles with ‘estimate the noise.’ Video generators can do photorealism faster and cheaper than an actual camera.
The problem is, rich idiots claim this makes it an actual camera. In that context, it’s fair to point out when a video shows the Eiffel Tower in Berlin. It’s deeply impressive that computers can do that, now. But we can’t let it guide people’s vacation plans.
This particular model was advertised as being able to use tools, like calculators. And it includes a calculator tool in the package.
So, in theory, it should not have the probabilistic limitations of its native algorithms.
However, there are still the limitations created by multi-message conversations, where the model will change its answer in response to previous user inputs. This may have become even worse in GPT-5, because it supposedly remembers previous conversations.
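For what it’s worth, the dispatch side of tool use is simple; here is a minimal sketch with made-up names (this is not OpenAI’s actual tool-calling API, just the shape of the technique): the model emits a structured call, the host runs real code, and the exact result is fed back into the context instead of letting the model guess digits token by token.

```python
import ast
import operator

# Whitelisted operations for a small, safe arithmetic evaluator.
OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}


def calculator(expression: str) -> int | float:
    """Evaluate an arithmetic expression deterministically."""
    def walk(node: ast.AST):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")
    return walk(ast.parse(expression, mode="eval").body)


# Hypothetical structured call, as a model might emit it:
tool_call = {"name": "calculator",
             "arguments": {"expression": "123456789 * 987654321"}}
if tool_call["name"] == "calculator":
    result = calculator(tool_call["arguments"]["expression"])
    print(result)  # exact answer, appended to the model's context
```

Whether the model reliably decides to call the tool, and leaves the result alone afterwards, is exactly the part being questioned above.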
it does not consume more resources than its previous models
It may not consume more resources, but the older models were already consuming way too many resources to pump out their bullshit.
You shouldn’t confront a bullshitter when you catch it in the act!
AI seems to know everything, until it’s a topic where you have firsthand knowledge.
Why does everybody want to make it count letters and stuff like that?
Because they are promoted as being able to do anything when jammed into everything.
We do not applaud the tenor for clearing his throat, as they say; fair enough. But we also do not applaud the tenor who can’t even do that.