Large language models = intersubjectivity?
When old-school quant people criticise qualitative methods as being merely subjective rather than objective, our reply (up to 2023) has always been something like this:
- We don’t aim for objectivity, which is an illusion; we aim for intersubjective verifiability. In fact, scientific objectivity is only (sometimes) possible because of intersubjectivity: for example, because people agree on how to use measuring instruments.
At its most basic, intersubjective verifiability simply means that we aim to make statements which most relevant stakeholders would agree with; where necessary, we also provide the steps needed to verify them. Sometimes this means breaking down more ambitious claims into smaller steps. A good example is QDA (qualitative data analysis). No-one would claim that a qualitative report written as a summary of, say, a set of interviews is reproducible or objective or even intersubjectively valid; it’s always going to be, at the very least, flavoured by the analyst’s own positionality. We can make the process a bit more intersubjectively verifiable by breaking the task down and providing more detailed instructions for how to do the analysis (though this may also limit the creativity needed to arrive at fundamentally new insights). We might not be able to aim for reproducibility (another analyst given the same instructions would arrive at the same results), but we can aim for Nachvollziehbarkeit, or retraceability: in retrospect, someone else would at least be able to retrace the steps and agree that the results are one plausible answer to the question.
Now it's 2023 and we’re all drowning under dozens of recent posts and articles about using Large Language Models (LLMs) to summarise or answer questions about one or more texts (documents, interviews, etc.). Whatever the pros and cons, these possibilities are revolutionising social science because they can suddenly level up qualitative research by making it almost reproducible. This isn’t just reproducibility in the sense that somebody’s computer program will always produce the same results on the same text (but who knows how the code works, and will it even work next year?). General LLMs are different because the prompt is the program: ideally, a good prompt is just the same literal natural-language instructions you would write for your postgrad assistant. It shouldn’t matter that no-one really knows how ChatGPT works, any more than it matters that you don’t know how your postgrad assistant’s brain works. Ideally it should be irrelevant which assistant (postgrad, OpenAI, Bard, etc.) follows the instructions, and we might expect to see the results of different platforms gradually converge. (This assumes we set the model’s “temperature” to 0 to discourage creativity, which has the downside of reducing the possibility of producing fundamentally new insights.)
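To make “the prompt is the program” concrete, here is a minimal sketch of what running the same written instructions over an interview transcript might look like. It assumes the OpenAI Python client and an API key in the environment; the model name, instructions and filename are illustrative, not a recommendation, and any other platform’s API could stand in.

```python
# A minimal sketch of "the prompt is the program": the same natural-language
# instructions you might hand a postgrad assistant, applied to a transcript
# with temperature=0 so repeated runs stay as close to deterministic as the
# platform allows. Assumes the OpenAI Python client (pip install openai) and
# an OPENAI_API_KEY environment variable; model name and instructions are
# illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CODING_INSTRUCTIONS = """You are assisting with qualitative data analysis.
Read the interview transcript below and:
1. Summarise the main themes in no more than five bullet points.
2. For each theme, quote one short supporting passage verbatim.
Do not add interpretations that are not grounded in the text."""

def analyse_transcript(transcript: str) -> str:
    """Apply the same written instructions to a transcript, favouring
    retraceability over creativity (temperature=0)."""
    response = client.chat.completions.create(
        model="gpt-4",      # illustrative; any capable model could be used
        temperature=0,      # discourage 'creative' variation between runs
        messages=[
            {"role": "system", "content": CODING_INSTRUCTIONS},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    with open("interview_01.txt") as f:  # hypothetical transcript file
        print(analyse_transcript(f.read()))
```

Even with temperature set to 0, outputs are not guaranteed to be bit-identical across platforms or model versions, which is exactly why “almost reproducible” is the honest claim here: the instructions themselves are what another researcher can read, critique and re-run.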
Wait, you say, but aren’t these LLMs created by skim-reading the whole internet, so that they basically answer the question “what would the whole of the internet say about this?” (with a bit of politeness added by various sets of guardrails)? And the internet is, as we know, a reflection of our own flawed species, and a very unequal reflection at that. You’ll probably find more text on the internet about the Barbie movie than about the climate disasters of the last few weeks. When applying an LLM to analysing a report or interview, you’re basically entrusting your work to the internet’s worldview.
Yet the results are mostly amazing! (I’m not talking here about asking the LLM to give factually accurate references; I’m talking about the much more interesting task of asking it to follow a set of instructions.) I think our humanity shines back at us. Wittgenstein might have agreed that LLMs can, like us, do that amazing thing: follow a rule, and even the spirit of the rule, in the sort of way that most of us would agree is right (just hundreds of times faster and without getting tired).
It’s as if the internet as hoovered up by LLMs is an embodiment of intersubjectivity. And perhaps it is, both in the epistemological sense (how do we know what is true?) and in the social or even metaphysical sense of Husserl and co.: a collective creation, a life-world. Chatbots, to an extent, share in our language games.
Applying ChatGPT to a research question, when done well, can be like saying: let’s do this in a shared way that everyone can agree on.
🔥 Yes, there are hundreds of caveats. Who puts what on the internet is just a reflection of our species: mostly colonial, patriarchal, greedy and exploitative. But what are you going to do except engage with these developments? Which other species are you rooting for, seriously?