A pair of scientists has produced a research paper in less than an hour with the help of ChatGPT — a tool driven by artificial intelligence (AI) that can understand and generate human-like text. The article was fluent, insightful and presented in the expected structure for a scientific paper, but researchers say that there are many hurdles to overcome before the tool can be truly helpful.
The goal was to explore ChatGPT’s capabilities as a research ‘co-pilot’ and spark debate about its advantages and pitfalls, says Roy Kishony, a biologist and data scientist at the Technion — Israel Institute of Technology in Haifa. “We need a discussion on how we can get the benefits with less of the downsides,” he says.
Kishony and his student Tal Ifargan, a data scientist also based at Technion, downloaded a publicly available data set from the US Centers for Disease Control and Prevention’s Behavioral Risk Factor Surveillance System, a database of health-related telephone surveys. The data set includes information collected from more than 250,000 people about their diabetes status, fruit and vegetable consumption, and physical activity.
The building blocks of a paper
The researchers asked ChatGPT to write code they could use to uncover patterns in the data that they could analyse further. On its first attempt, the chatbot generated code that was riddled with errors and didn’t work. But when the scientists relayed the error messages and asked it to correct the mistakes, it eventually produced code that could be used to explore the data set.
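The generate-run-relay-retry loop described above can be sketched in a few lines. This is purely illustrative, not the researchers' actual setup: `ask_chatbot` is a hypothetical stand-in for a real chatbot call, wired here to return a broken snippet on the first attempt and a corrected one once an error message appears in the prompt.

```python
# Hypothetical sketch of the feedback loop: run the chatbot's code, and if
# it fails, relay the error message and ask for a fix. `ask_chatbot` is a
# stand-in, NOT a real API call; it returns canned responses for the demo.

def ask_chatbot(prompt: str) -> str:
    if "Error" in prompt:
        return "result = sum(range(10))"   # corrected code on retry
    return "result = sum(range(10)"        # first attempt: syntax error

def run_with_retries(task: str, max_attempts: int = 3):
    prompt = task
    for attempt in range(1, max_attempts + 1):
        code = ask_chatbot(prompt)
        namespace = {}
        try:
            exec(code, namespace)
            return attempt, namespace.get("result")
        except Exception as err:
            # Relay the error message back, as the researchers did.
            prompt = f"{task}\nYour code failed with: {err!r}. Please fix it."
    raise RuntimeError("chatbot never produced working code")

attempts, result = run_with_retries("Write code that sums 0..9")
print(attempts, result)  # succeeds on the second attempt
```

The key design point is that the loop treats the chatbot as fallible: nothing is trusted until it actually executes, and the raw error text is the feedback signal.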
With a more-structured data set in hand, Kishony and Ifargan then asked ChatGPT to help them to develop a study goal. The tool suggested they explore how physical activity and diet affect diabetes risk. After generating further analysis code on request, ChatGPT delivered the results: eating more fruit and vegetables and exercising is linked to a lower risk of diabetes. ChatGPT was then prompted to summarize the key findings in a table and write the whole results section. Step by step, they asked ChatGPT to write the abstract, introduction, methods and discussion sections of a manuscript. Finally, they asked ChatGPT to refine the text. “We composed [the paper] from the output of many prompts,” says Kishony. “Every step is building on the products of the previous steps.”
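The kind of analysis the paper reports can be sketched as a small, self-contained pipeline. This is not the study's code and does not use real BRFSS records: the respondents below are synthetic, and the effect sizes are assumptions chosen only so the toy pipeline runs end to end.

```python
import random

# Toy, clearly synthetic stand-in for a BRFSS-style survey analysis.
# Respondents and effect sizes are fabricated for illustration only.
rng = random.Random(42)

def simulate_respondent():
    """One synthetic respondent: (healthy_diet, active, has_diabetes)."""
    healthy_diet = rng.random() < 0.5   # eats fruit/vegetables regularly
    active = rng.random() < 0.5         # exercises regularly
    risk = 0.15                         # assumed baseline risk (illustrative)
    if healthy_diet:
        risk -= 0.03
    if active:
        risk -= 0.04
    return healthy_diet, active, rng.random() < risk

respondents = [simulate_respondent() for _ in range(50_000)]

def diabetes_rate(rows):
    return sum(has_diabetes for _, _, has_diabetes in rows) / len(rows)

both = [r for r in respondents if r[0] and r[1]]
neither = [r for r in respondents if not r[0] and not r[1]]
print(f"diabetes rate, healthy diet + active: {diabetes_rate(both):.3f}")
print(f"diabetes rate, neither:               {diabetes_rate(neither):.3f}")
```

Because the effect is baked into the simulation, the healthy-lifestyle group shows a lower prevalence; in the real study that comparison is an observed association in survey data, not a built-in parameter.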
Although ChatGPT generated a clearly written manuscript with solid data analysis, the paper was far from perfect, says Kishony. One problem the researchers encountered was ChatGPT’s tendency to fill in gaps by making things up, a phenomenon known as hallucination. In this case, it generated fake citations and inaccurate information. For instance, the paper states that the study “addresses a gap in the literature” — a phrase that is common in papers but inaccurate in this case, says Tom Hope, a computer scientist at the Hebrew University of Jerusalem. The finding is “not something that’s going to surprise any medical experts”, he says. “It’s not close to being novel.”
Benefits and concerns
Kishony also worries that such tools could make it easier for researchers to engage in dishonest practices such as P-hacking, in which scientists test several hypotheses on a data set but report only those that produce a significant result.
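Why P-hacking is so easy to do by accident, let alone on purpose, follows from basic probability: test enough true-null hypotheses at a 5% significance level and some will come out "significant" by chance. A minimal sketch, using pure-noise data and a permutation test (my choice of test, not anything from the study):

```python
import random

# Sketch: on pure noise, testing many hypotheses at alpha = 0.05 still
# yields occasional "significant" results -- the false positives that
# selective reporting (P-hacking) would publish.

ALPHA = 0.05
N_HYPOTHESES = 20

# Chance that at least one of 20 independent true-null tests is
# "significant" at the 5% level: 1 - 0.95**20, about 0.64.
p_at_least_one = 1 - (1 - ALPHA) ** N_HYPOTHESES

def permutation_p_value(a, b, rng, n_perm=500):
    """Two-sided permutation test on the difference of means."""
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        x, y = pooled[:len(a)], pooled[len(a):]
        if abs(sum(x) / len(x) - sum(y) / len(y)) >= observed:
            hits += 1
    return hits / n_perm

rng = random.Random(0)
significant = []
for h in range(N_HYPOTHESES):
    # Both groups drawn from the SAME distribution: every null is true.
    a = [rng.gauss(0, 1) for _ in range(30)]
    b = [rng.gauss(0, 1) for _ in range(30)]
    if permutation_p_value(a, b, rng) < ALPHA:
        significant.append(h)

print(f"P(>=1 false positive in {N_HYPOTHESES} tests) = {p_at_least_one:.2f}")
print(f"'Significant' hypotheses found in noise: {significant}")
```

Reporting only the contents of `significant`, with no mention of the other tests, is exactly the practice Kishony describes.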
Another concern is that the ease of producing papers with generative AI tools could result in journals being flooded with low-quality papers, he adds. He says his data-to-paper approach, with human oversight central to every step, could be one way to ensure researchers can easily understand, check and replicate the methods and findings.
Vitomir Kovanović, who develops AI technologies for education at the University of South Australia in Adelaide, says that there needs to be greater visibility of AI tools in research papers. Otherwise, it will be difficult to assess whether a study’s findings are correct, he says. “We will likely need to do more in the future if producing fake papers will be so easy.”
Generative AI tools have the potential to accelerate the research process by carrying out straightforward but time-consuming tasks — such as writing summaries and producing code — says Shantanu Singh, a computational biologist at the Broad Institute of MIT and Harvard in Cambridge, Massachusetts. They might be used for generating papers from data sets or for developing hypotheses, he says. But because hallucinations and biases are difficult for researchers to detect, Singh says, “I don’t think writing entire papers — at least in the foreseeable future — is going to be a particularly good use.”