In the corner of the Critical Research office sits a man called Mark Jacobs. Nominally, he provides ‘IT Support and Development’, but in reality, he’s an expert computer programmer and general coding wizard. From time to time we tease him that any day now, we’re going to replace him with AI (ChatGPT, to be specific) because…
1) it knows how to write code and
2) it doesn’t talk about conspiracy theories as much.
Mark, of course, feels very strongly that he cannot be replaced and frequently shows us examples of poor code written by ChatGPT, complete with detailed critiques and arguments for his superiority. However, whilst Mark is adamant that artificial intelligence cannot take his job, he does believe it can take ours.
Ranking 3 AI models’ attempts to take our jobs
One afternoon, we tried to get an AI model to write a questionnaire suitable for market research. Curious to know Mark’s opinion on its quality, we passed it over and asked for his view. He glanced at it and, after around 20 seconds, concluded: “Yup, that looks pretty good”.
So we decided to look into it a little further: will AI be able to replace professional questionnaire writers? High-quality survey data is hard to get without a high-quality questionnaire design, and empathy, an intuitive human trait, plays a significant role.
A good question writer puts themselves in the respondent’s shoes, because if questions can’t be answered easily, the answers can’t be analysed easily. Would an AI language model be able to do this effectively?
We put this to the test with three of the biggest freely available AI language models around today and set them a relatively recognisable questionnaire task. The prompt used was:
Write a questionnaire for a 15-minute survey to gather data about people’s understanding of AI. The survey will be conducted online and asked of UK adults, so the questions have to make sense for a British audience. Include several demographic questions. Include instructions for how to answer and if the question is closed, include code frames for answers.
Scroll down for the results, and click here to see the AI questionnaires.
Bing’s AI language model opted for a set of four closed demographic questions and six open-ended questions about AI. Three of the four demographic questions had issues, mostly with the answer options it gave. For example, its gender question didn’t allow the option to self-describe, and its occupation question offered employment-status options (e.g., full-time, student, self-employed), which are not occupations.
As for the main questions, the most striking choice Bing AI made here was making them all open-ended. Open-ended questions can be great as they allow the respondent to write in whatever they want, meaning you can get more depth in your data. However, they are difficult to analyse at scale and the data quality varies wildly depending on how much the respondent wants to write.
For these reasons, we would currently rarely write a questionnaire that is mostly open-ended, which makes Bing AI’s effort difficult for us to use. However, Bing AI might be working under the assumption that AI language models such as itself will be able to analyse open-ended responses very effectively at scale, and there may be good reason to think so.
The capabilities of AI language models might end up radically changing long-held conventions of questionnaire writing that developed within the constraints of what was technologically possible.
Sadly, Bing AI sometimes strayed too close to writing leading questions that naturally suggest a particular answer. For example, it’s hard to see an experienced questionnaire writer including something like “Do you think AI will replace human jobs in the future? If yes, which jobs do you think are most at risk?”.
Critical Verdict: “Not great, but ok”
There’s a lot to like about ChatGPT’s efforts. Firstly, it was the only model of the three to write a survey introduction, a closing message, and answer instructions. The five demographic questions it wrote were broadly appropriate (with only the ethnicity question’s code frames feeling unsatisfactory, though this is arguably a wider issue in the industry).
ChatGPT chose to write nine questions about AI, only one of them open-ended. The questions were mostly fine, though some needed rewording to make them easier to answer.
For example, the word ‘define’ in ‘Q1) How would you define artificial intelligence (AI)?’ may feel difficult to respondents, and we would suggest ‘describe’ as an alternative.
Questions such as ‘Q5) Have you personally interacted with AI-powered technologies or applications in your daily life?’ and ‘Q7) In your opinion, what areas of society would benefit most from the advancement of AI?’ did not include ‘don’t know’ options, even though it seems plausible that some people might not be sure.
Q7, of course, also needs a ‘none of the above’ option to counter the implicit assumption that there are areas of society which would benefit.
However, the one open-ended question it asks is an absolute cracker: ‘Q9) How do you think AI should be regulated and governed?’. Good luck to the respondents!
Critical Verdict: “Looks vaguely workable”
In short, we absolutely do not recommend using Google Bard to write a questionnaire for an online survey.
After three unsatisfactory demographic questions (the gender question offered only ‘male’ and ‘female’), Google Bard chose to ask eight open-ended questions that were totally unsuitable for a short online survey and would be much more at home as essay questions in university exams.
Questions such as ‘What is AI?’ and ‘How do you think AI will impact society in the future?’ feel like they could be followed by ‘maximum of 3000 words’.
As we said, a good question writer is able to put themselves in the shoes of the respondent, but it seems Google Bard is perfectly happy to force the average person to confront some of society’s biggest and hardest-to-answer questions in the space of a 15-minute online survey.
Critical Verdict: “Bin”
So after all that, can we say that one of these AI models can successfully write a survey template suitable for a genuine market research study?
Of the three tested on this prompt, ChatGPT clearly performed the best, but its questionnaire still needed some work to bring it up to standard and make it ready to go into the field.
The issues with ChatGPT’s questionnaire are also much more subtle than Bing’s or Google’s and might only be spotted by more experienced writers. This echoes a lot of what Mark tends to find with code written by ChatGPT: it looks alright but contains mistakes that only an experienced human reviewer is likely to spot and fix.
So yes, AI saves time and gives an impression of doing a good job, but we’re still at a point where you run the risk of leaving significant errors in your work without human expertise.
In short, if you would like some well-designed research, it is still absolutely vital you involve some proper research designers!
The questionnaire we asked the AI models to write was a simple one aimed at a broad population. If we increased the complexity and specificity of the topic or audience, we suspect the issues we found here would only get worse. Frankly, every brand, business, charity, and organisation of any type has its own nuances within its products and audiences, and its research benefits from deeper contextual understanding and experience.
AI may develop this over time, but there is currently still some way to go.
To speak to a ‘proper research designer’, contact Billy.