If you run a program that's deterministic, then by definition, your experiences with that program will be typical of its expected behavior. There's some important nuance there about making sure that the input you provide is the input you mean to provide, hence reproducible builds, containerization, etc.
But still, that's by contrast with programs whose behavior is inherently stochastic. There, it's easy to fool yourself into believing the program "works" when you just got lucky.
And in some sense, it should be easy to fool yourself that way! After all, it *did* work; that's a correct observation of the stochastic program's behavior! But it may not be typical, expected, or even anything other than a minor miracle.
Testing the performance of stochastic programs means not only fixing the input and initial state, but also being very disciplined about how you gather data about that program. I used to do that, did it for a decade or so, and that shit is *hard*.
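To make the "fix the state, then gather data carefully" point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `flaky_task` is a hypothetical stand-in for a stochastic program with a made-up 30% success rate, and the Wilson score interval is just one reasonable way to put error bars on a success rate, not the only one.

```python
import math
import random


def flaky_task(rng: random.Random, success_prob: float = 0.3) -> bool:
    """Hypothetical stochastic program: succeeds with a fixed probability."""
    return rng.random() < success_prob


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("need at least one trial")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


def estimate_success_rate(trials: int, seed: int, success_prob: float = 0.3):
    """Run a fixed number of trials from a fixed seed and report the evidence."""
    rng = random.Random(seed)  # fixed seed, so the whole experiment is reproducible
    successes = sum(flaky_task(rng, success_prob) for _ in range(trials))
    return successes, wilson_interval(successes, trials)


if __name__ == "__main__":
    # "I tried it once and it worked": one success in one trial.
    print("1 of 1:", wilson_interval(1, 1))
    # A pre-planned batch of trials narrows the interval considerably.
    print("1000 trials:", estimate_success_rate(trials=1000, seed=12345))
```

Under these assumptions, a single success out of a single trial is still consistent with a true success rate of only about 21%, which is the "you just got lucky" case. Note that the number of trials is decided up front; stopping as soon as the results look good is one of the easy ways to lose the discipline.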
That's part of the problem with "I tried an LLM and it worked, therefore LLMs are useful." It takes a lot more than trying things out to reach that conclusion.
I've used the analogy of a Vegas slot machine before; if all you see are people getting good results out of LLMs, it's easy to believe that's the typical mode of operation.
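The slot machine effect can be simulated too. This is a toy sketch, not a model of anything real: `visible_success_rate` and all of its parameters are hypothetical, and the probabilities are made up. It assumes each person gets a good result with some true probability and then shares the outcome with a probability that depends on whether it went well.

```python
import random


def visible_success_rate(
    n_users: int,
    true_success_prob: float,
    share_prob_success: float,
    share_prob_failure: float,
    seed: int,
) -> float:
    """Fraction of *shared* outcomes that are successes (hypothetical toy model)."""
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    shared_successes = 0
    shared_failures = 0
    for _ in range(n_users):
        succeeded = rng.random() < true_success_prob
        share_prob = share_prob_success if succeeded else share_prob_failure
        if rng.random() < share_prob:
            if succeeded:
                shared_successes += 1
            else:
                shared_failures += 1
    total = shared_successes + shared_failures
    return shared_successes / total if total else float("nan")


if __name__ == "__main__":
    # True success rate is 20%, but only winners talk about it.
    print(visible_success_rate(10_000, 0.2, 0.5, 0.0, seed=1))
    # Same true rate, but everyone shares equally often.
    print(visible_success_rate(10_000, 0.2, 0.5, 0.5, seed=1))
```

When failures are never shared, every outcome you see is a success, no matter how low the true rate is. When both kinds of outcome are shared equally often, the visible rate lands near the true one.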
@xgranade Relevant article: LLMs use a lot of the same techniques as psychics' "cold reading" to convince people they know more than they do.
@robinadams @gerrymcgovern @xgranade For years people were convinced they could see the future in tea leaves and sheep knuckles. Heck, plenty still are. What I’m saying is don’t expect LLMs to go away quietly even if their utility is disproven.
@robinadams @xgranade what that just means is that psychics are real ofc! Brb, gonna go give a psychic all my money
@robinadams @xgranade fascinating comparison.