Insta-Personas & Synthetic Users

Indi Young
Published in Inclusive Software · Apr 13, 2023

The ways you cannot use ChatGPT (etc) for Design Research

I thought I could avoid the fray about ChatGPT, but then it crept into Design Research.

  1. I saw a Medium article about ways to use ChatGPT for user research, one of which was “Let’s ask ChatGPT to create a few user personas for our product.”
  2. Then there was an ad for “user research without the headaches” (headaches?) by asking synthetic users.
  3. I was asked by a pair of entrepreneurs to test out their interviewer ‘bot, which works behind the scenes to help an interviewer avoid leading questions, stepping on silences, or asking questions that are too complicated.
  4. A technologist I know emailed me, “Give me a call.” I have always enjoyed conversing with him, so I called. He was excited and gushed that Large Language Models (LLMs) are as important to our future as HTML was back in the ’90s.

🤔

Okay, as a qualitative data scientist, I have an opinion about all this, with respect to design research. First let me go through some references about what LLMs and ChatGPT are.

Sections in this essay:

  • What Are These Things?
  • The Models Represent A Tiny Slice of Humanity, and Collected Data Without Permission
  • Why You Cannot Use LLMs for (the insights part of) Design Research
  • What You Can Use The Models For in Design Research
  • Indi’s Interior Cognition About The Four Points at The Start
  • Indi’s Opinions About Speed & Growth without Support
  • What Indi Is Exploring with These NLP/LLM Engines
  • How to Do Human Design Research with Meaning

What Are These Things?

ChatGPT is a form of Natural Language Processing (NLP) that works like the predictive text on your phone’s keyboard to produce large sets of text based on an initial phrase. NLP is statistical language modeling (now scaled up into Large Language Models, LLMs) that predicts relationships between words in English. The model often works like a tree, with branches (or stemming). The values used to navigate the tree are made up of tokens — little bits of letters that occur together with a certain frequency.

NLP is not symbolic language processing. NLP statistical models do not learn grammar. NLP statistical models don’t have an understanding of real-world things or relationships. (Cats chase mice. A barn shelters livestock or holds food for livestock. Fresh cut onions make a lot of people cry.) (Some of the engines are so large that they have replicated “hierarchical structure,” but not much “abstraction.” See Christopher Roosen’s write-up using these concepts from a paper by Kyle Mahowald and Anna A. Ivanova, et al.) NLP does not understand why the letters go together. NLP only maps out the statistical probability of certain letters following other letters.

NLP/LLMs do not understand meaning.

ChatGPT does not understand the meaning of the text it generates. It just looks like it does.

🤩 Here is an understandable explanation of NLP, with excellent animation and an interview with a director at OpenAI, by linguist Erika Brozovsky, PhD. The subtitle is a spoiler: “Is It Really Language, or Just a Digital Illusion?”

The Models Represent A Tiny Slice of Humanity, and Collected Data Without Permission

Worse, most LLMs are made by scraping, without permission, a lot of text from a lot of places on the English-speaking internet. As Lisa Dance points out in her series of posts about the harms, Beware of AI Hype & Harm, there was violence in the scraped data, so a set of humans were subjected to the traumatic task of viewing it all and deciding whether certain pieces of it got deleted or stayed. Even so, the remaining text is full of discrimination, injustice, and concepts derived from white supremacist and patriarchal thinking.

🤩 Here’s Lisa Dance’s full set of posts:

1. Big Money Deprioritizes Safety

2. Historical Data = Historical Problems on Blast

3. Are We All Unpaid Workers in the AI Value Chain?

4. Who Will Be Ultimately Responsible?

5. New Mindset for AI (Always Investigate)

NLP is not “intelligence.” (In the digital & service design research world, if we can all stop ourselves from repeating the hype-phrase “artificial intelligence,” that will be a good first step.)

🤩 Sasha Costanza-Chock also wrote an awareness-raising thread: Now class, can anyone tell me why this might be a bad idea?

Why You Cannot Use LLMs for (the insights part of) Design Research

ChatGPT does not understand meaning. Therefore it cannot be used to create patterns of meaning that “synthetic users” communicated to you in a study. (Yeah, the “synthetic users” app is as bad as it sounds. Niloufar Salehi gave it a whirl using the same study framing she used for helping immigrant parents navigate which Oakland school to send their kids to.)

The point of design research is that you want to understand the various approaches people have to a defined purpose, so you can see more broadly than the solution your org thought of. And so you can identify the levels of harm your solution is causing to the people your org wanted to ignore, and fix those harms.

Left-right horizontal arrow with four labels on it denoting four “levels” of harm: mild, serious, lasting, systemic. This diagram is in flux, and I’d love feedback on the wording and labels. (For example, “mild harm” contains harms like annoyance, confusion, being pestered, and frustration. Maybe that’s not “mild.”)

My research partner Kunyi Mangalam calls this ChatGPT-generated bunch of words a “word salad.” It may look appealing, and we might even infer that the words in the salad mean something. But that’s just the problem — we infer the meaning. It’s subjective. Not everyone will infer the same meaning from the same word salad. From the point of view of design research, the mental act of inferring involves our own experiences and frame of reference. We make assumptions. We easily fall into confirmation bias. We completely forget that other people have different perspectives and experiences.

Yeah, confirmation bias happens when reading human-created text. But since the human creating the text intended to convey a certain meaning, you can resist confirmation bias by asking the human for the deeper meaning. (In my case, I teach listeners to notice and ask about deeper interior cognition, specifically past inner thinking, emotional reactions, and personal rules.)

What the person means is the crux of design research. ChatGPT does not communicate meaning — we infer it. That’s the only place meaning arises, in the mind of the beholder.

What You Can Use The Models For in Design Research

ChatGPT might be useful as a new kind of search engine, aware of a certain set of words and word probabilities. It can point you to some of these word probabilities that you might not have thought of yourself.

Janet Standen, Co-Founder of Scoot Insights, reports success with using ChatGPT while writing a history of qualitative research. She used reference books, Google search, and 30 in-depth interviews for the piece. Then when ChatGPT came out, she used it as “a new source to fact check the information I was finding elsewhere.” (Indi notes: I think she was fact-checking using ChatGPT in a manner akin to searching.) What was surprisingly useful to her writing was that ChatGPT “occasionally threw up an additional person or fact for me to follow up and sometimes it proved to be valid for inclusion.”

These engines might be useful for top-down categorization of things you work with that are not qualitative data, such as a list of titles where keywords connote affinity. (In qualitative data synthesis, the same keyword does not mean the same thing from person to person, and so cannot imply affinity.)

Devika Ganapathy has been using a familiar language model, machine translation, to understand “smaller snippets, like in a diary study. Not an entire interview for which I prefer manually done transcripts.” Devika’s research includes people who speak many regional languages. “India has 22 ‘official’ languages!” Even when using Google Translate for small snippets, she restricts usage to languages that she has working knowledge of. Because machine translation is not always accurate, she needs to be able to gauge the meaning herself.

Devika encourages more thoughtful explorations with ChatGPT, rather than using it to replace or mimic integral research activities like interviews (e.g., Synthetic Users). “As long as one is aware of the limitations, I have found that ChatGPT can be a helpful tool. It can be used to get out of a writer’s block. I have used it to help rearticulate something that didn’t seem quite right, or to jog my memory about something I may not have included.”

Indi’s Interior Cognition About The Four Points at The Start

As for the four items in the list at the start of this essay, here’s what went through my mind. (There are some opinions, too.)

1. I saw a Medium article about ways to use ChatGPT for user research … For this first item in that list, well, only unaware practitioners use made-up personas in their design work. People who know that archetypes come from empirical qualitative data don’t invent personas. So maybe the ChatGPT versions of the made-up personas will perpetuate that bad habit long enough for people to see the futility of using made-up knowledge. Maybe it will all whirl down into the vortex. (My cynical self knows that many of these people are not exposed to qual data skills-building, and so I hope, cynically, that when the veil publicly comes off ChatGPT-generated text, such folks might feel a bit uneasy about their use of the tool. And we can help them understand that uneasiness.)

2. Then there was an ad for synthetic users … This is what went through my mind on this second one: The entrepreneurs behind the synthetic users crow, “It works!” It gives them a bowl of word salad that they interpret in a way that works for them. It allows them all the confirmation bias they want. A doubtful researcher, Jan Dittrich, writing about the overall use of “AI” in design research said, “It still might work as an ‘oracle,’ giving people claims to act with: because they believe that it has meaning.” Jan points out that research “results” are not just “an answer,” but symbolic of all the decisions, discussions, and people involved in a study. The point of doing a study is to create “a more nuanced and interesting view on the field.” (see also: What is the Effect of User Research on the Researcher, by Jan Dittrich)

3. I was asked to test out their interviewer ‘bot … On this third item, my immediate thought was: It’s not just about avoiding leading questions, but understanding the meaning of what the person is saying. Sensing if they are implying something. Helping them unfold their interior cognition from that moment in the past where they were addressing their purpose. It’s about building rapport with that person. Showing them that you respect their thinking. Making sure they feel safe communicating their interior cognition to you, and safe about what you or your org might do with it. That said, I’m game to give their interviewing-assistant app a whirl this month just to see how it feels to me. Maybe it is not terrible for a beginner, but real-time coaching is not a great way to learn. Reviewing the recording/transcript alongside a coach (or maybe alongside this ‘bot) is a better way to learn. Adding a real-time ‘bot in the background for the interviewer to pay attention to is a distraction. The interviewer should pay 110% attention to the person.

4. A technologist I know gushed that LLMs are as important as HTML … I gave a serious shot at pushing back on the technologist, challenging him to care about the harms that have already been done. I pointed him to the set of posts by Lisa Dance, who focuses on the harms that already come from using models to predict human behavior. I will check in with him again later this month to see what his mind has been processing. Probably nothing related to design research, since thankfully, he’s not interested in building tools for design research.

Indi’s Opinions About Speed & Growth without Support

Do we have a problem in design research? Yes. That problem is speed. That problem is the fact that so many of our orgs are either interested in profit (by slapping something up that will get attention) or are chained to the “growth every year” stake, where investors take priority over any people the org originally intended to support.

Another problem is that our orgs can’t think clearly about people’s purposes. (LOL, read a few mission statements — they tend to vaguely describe the groups of people they want to support, but not the people’s purposes they want to support.) We can help with that.

There are ways that our orgs can get free of the chains and focus on supporting people’s purposes. There is a future where we are sustainably making a variety of solutions for a variety of approaches to the purposes.

Erika Hall names this shift in thinking “Enough” … a mindset and understanding of what constitutes enough. Such as, “What constitutes enough profit for our org to make a deeply nuanced difference for a much wider variety of people’s approaches to the purposes?” It can lead to long, healthy lives for people, orgs, ecosystems, and the planet.

Here’s to hope and to humanity and Earth and to each other and all the variety of ways that we approach the purpose of making knowledge.

What Indi Is Exploring with These NLP/LLM Engines

I am looking into creating code to sift through transcripts as a first rough pass, to get us past the block where our minds go, “Yikes, this is a lot of data to synthesize.” This first rough pass might (or might not) produce rows of quotes from the transcripts where each row qualifies as one of these three concept types:

  • inner thinking
  • emotional reaction
  • guiding principle (personal rule)

The code will only work with the words in the transcripts. It will not generate new words. I’m not yet convinced any of the NLP/LLM engines out there are capable of recognizing the concept types in a transcript. But I’m going to see if it may one day function semi-helpfully.

How to Do Human Design Research with Meaning

The human work is high-touch. The transcripts are written by humans who add notes about tone of voice, references, and whether the person is being cynical, joking, etc. These transcripts are made from recording(s) of a listening session, where the listener respectfully helps the person unfold their interior cognition, which is those three bullet points above. See Time to Listen.

Inviting and absorbing new perspectives into your org takes work.

It’s valuable, powerful, central work for the org. If you want to shortcut it, that is the same as deciding not to breathe or eat. Your org will take risks with humans, cause harm, and hopefully collapse quickly. (Yeah, it’s wishful thinking. There are plenty of these orgs causing greater and greater harm to humans which have not collapsed yet, because of their extractive, rather than supportive, nature.)

There will be work for us humans to dig into. Not only will these rows of quotes:

  • include repeated concepts that haven’t been collected together
  • include tangled concepts that haven’t been untangled
  • be missing the implied concepts and the generalized concepts
  • accidentally include description or expression layer concepts

… but also these rows of quotes will need a human to see the meaning and summarize it using the person’s words, figuring out the verb and the key point, then deciding whether supporting details will be of help in the next stage as well as during the use of the models. I don’t want an engine writing a summary of each quote, because it will generate a summary that is not representative of that person’s meaning.

I teach a two-part course on empirical qualitative data synthesis, which has one prerequisite: Listening Deeply (or Time to Listen). It teaches you how to do this work.

This work provides the human perspective that is central to any org that wants to support people. (Is your org one of these?)

Indi Young

Qualitative data scientist, helping digital clients find opportunities to support diversity; Time to Listen — https://amzn.to/3HPlESb · www.indiyoung.com