ChatGPT related, part three.
I really haven’t been able to keep up with much of the technical mechanics of all the different “large language models” everyone is so rightly interested in these days. So my opinions here might not be fair to the widespread idea that the interesting recent “AI” model outputs we’re seeing arise from “unexpected,” emergent phenomena once these models are scaled up to tens of billions of tunable parameters. The story, as I have it, goes like this: harvest all the text on the public internet, along with various other sources of text, and use that data to train models of word prediction—i.e. predicting the most likely next word, or next few words, given what came before. Then scale those models up in size, which selective attention mechanisms make affordable, and you get significantly improved word prediction accuracy, with the models predicting further ahead into a sentence or paragraph. The important scientific claim is that these models “somehow” jump from “just” predicting words to intelligently applying knowledge learned from predicting words to shape functional roles for those words, effectively bootstrapping a recurrent process of incremental “semi-supervised” improvement using feedback from the iterated predictions. That’s roughly what I think the dominant narrative is about what is happening with these models.
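To make the “next word prediction” objective concrete, here is a toy sketch of the idea using simple bigram counts. The corpus and everything else here are made up for illustration; a real LLM replaces the counting with an enormous attention-based neural network, but the training target is the same kind of thing:

```python
from collections import Counter, defaultdict

# A made-up miniature "corpus" standing in for internet-scale text.
corpus = "the cat sat on the mat and the cat slept".split()

# Count, for each word, how often each following word appears.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word, k=3):
    """Return up to k most likely next words after `word`."""
    return [w for w, _ in bigrams[word].most_common(k)]

print(predict_next("the"))  # "cat" ranks first, since it follows "the" twice
```

The point of the sketch is only that “predicting the next word” is a perfectly ordinary statistical task; the open question is what, if anything, emerges when the predictor is scaled up enormously.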
I think that’s probably true to some extent. But there’s a less exciting interpretation, or at least an alternative hypothesis, that might be generously garnered from Noam Chomsky’s comments on this topic. I admit I’m conjecturing all this from ridiculously truncated soundbites that I vaguely remember from a couple of months ago, an ignorance excusably facilitated by little obstacles like New York Times paywalls, for example. His stance appeared to me to fit a classic pattern: needlessly cantankerous and cryptic positions that are also very easy to misunderstand, and that have previously become honeypots for intellectuals who underestimate how well Chomsky usually covers his bases. I believe he referred to these “LLM” models as producing a kind of glorified plagiarism, or something like that. Since nobody has the time to look into the nuance of statements like these, which don’t seem to directly address the central issue of “emergent” language entities, probably nobody has looked into the matter further, and has just moved on. “Chomsky is getting old, and last I came across him he seemed late in understanding the Ukraine situation, so he must just be babbling due to age; I’ll go see what’s up on TikTok again,” someone might say upon seeing the issue come up derisively on r/machinelearning or Scott Aaronson’s blog. As I say, though, it is my experience that Chomsky often has a safe, wise conclusion in mind when he says something easily perceived as provocative. Even with his Ukraine opinions, one could say he had a responsibility to be last on board with anything, so he wasn’t wrong at all to be very skeptical about what was happening there.
In the case of recent LLM performance, I think one could connect his seeming dismissiveness to some of his previous “infuriating” positions on the topic of our “cognitive limits.” Nobody likes that topic, I find, because it doesn’t seem like it could possibly be mentally stimulating to study something that is by definition impossible to study. But what if GPT, scaled up an order of magnitude, really is just exceptionally good at plagiarism, as he’s saying? We as individuals don’t know all the different writings there have been on millions of topics, so we just can’t see the plagiarism, except on a topic we’re intimately familiar with. Our individual cognitive limit is exceeded in this case, so we’re inclined to see magic, but it’s just plagiarism. Thus the “babbling” about cognitive limits matters.
Probably both of these hypotheses have some truth to them. But who knows, because there’s hardly time to read all the expert thoughts on the matter, especially when important models aren’t even public, and the experts are disagreeing, fighting for attention, or confused. I think the super-plagiarism metaphor should serve as a kind of null hypothesis we can run some comparisons against later on; in the meantime, we should keep the focus on what it means to coexist in a society with entities that can only be understood if we collectively share our expertise. That’s what I’m getting at here.
If my ignorance is showing, do enlighten me. I freely admit that I’m conjecturing about things I could look at more closely. I just don’t really have the time, as the pace of things can take on the look of a hopeless shell game.
Noam Chomsky reference remarks about ChatGPT:
Chomsky reference passage on cognitive limits, with bonus me dancing.
Other videos I’ve been kind of watching:
Papers recently read:
None on this topic in months, but some of the ML playlist videos focus visually on relevant PDFs, and I am surprised by some of what I have seen in them. For example, simply having GPT-4 reflect on what it has just said makes a really noticeable difference. Of course it would, but it was still a striking contrast with the claim that GPT-3 didn’t really improve using this reflection technique. The comparison graph is in one of the recent papers about ChatGPT and reflection, and is discussed in one of the recent ML playlist videos linked above. No time to serve it to you on a silver platter tonight; I’ll edit this later if nuance is needed.
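For what it’s worth, the reflection technique being described is simple enough to sketch in a few lines. `ask_model` below is a hypothetical placeholder for whatever chat-completion call you’d actually use, not any particular API; the structure of the loop (answer, self-critique, revise) is the whole idea:

```python
def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call."""
    return f"[model response to: {prompt[:40]}...]"

def answer_with_reflection(question: str, rounds: int = 1) -> str:
    """Answer, then have the model critique and revise its own answer."""
    answer = ask_model(question)
    for _ in range(rounds):
        # Ask the model to reflect on what it just said.
        critique = ask_model(
            f"Question: {question}\nAnswer: {answer}\n"
            "Point out any errors or omissions in the answer."
        )
        # Ask it to revise in light of its own critique.
        answer = ask_model(
            f"Question: {question}\nDraft answer: {answer}\n"
            f"Critique: {critique}\nWrite an improved answer."
        )
    return answer
```

The papers I’m alluding to suggest this loop helps GPT-4 noticeably while doing little for GPT-3, which is exactly the kind of comparison the graph I mentioned shows.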