Identifying Generative AI Text Through Key Words: A Guide

July 1, 2024

Researchers have developed a method to detect the use of large language models (LLMs) in scientific writing by tracking “excess words” that became markedly more common in abstracts in 2023 and 2024. The study, conducted by researchers from Germany’s University of Tübingen and Northwestern University, aimed to identify changes in vocabulary that could be attributed to the widespread use of LLMs.

By comparing the expected frequency of words based on pre-2023 trends to the actual frequency of words in abstracts post-2022, the researchers found a significant increase in the usage of certain words like “delves,” “showcasing,” and “underscores” after the introduction of LLMs. These style words, including verbs, adjectives, and adverbs, showed a sudden surge in scientific writing, indicating potential LLM usage.
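The comparison described above can be illustrated with a minimal Python sketch. It assumes a simple linear extrapolation from the two most recent baseline years, which is an illustrative choice and not necessarily the study's exact procedure. The function names and all frequency values are hypothetical.

```python
def expected_frequency(baseline, target_year):
    """Project a word's frequency to target_year by extending the straight
    line through the two most recent baseline years (an assumed model)."""
    years = sorted(baseline)
    if len(years) < 2:
        raise ValueError("need at least two baseline years")
    y0, y1 = years[-2], years[-1]
    slope = (baseline[y1] - baseline[y0]) / (y1 - y0)
    projected = baseline[y1] + slope * (target_year - y1)
    # A frequency cannot be negative, so clamp the projection at zero.
    return max(projected, 0.0)


def excess_usage(baseline, observed, target_year):
    """Return (gap, ratio) between observed and expected frequency.

    gap   = observed - expected  (absolute excess)
    ratio = observed / expected  (relative excess)
    """
    expected = expected_frequency(baseline, target_year)
    gap = observed - expected
    ratio = observed / expected if expected > 0 else float("inf")
    return gap, ratio


# Hypothetical numbers: share of abstracts containing a given word.
baseline = {2021: 0.0010, 2022: 0.0012}  # pre-2023 frequencies (made up)
observed_2024 = 0.0080                   # post-2022 frequency (made up)

gap, ratio = excess_usage(baseline, observed_2024, 2024)
print(f"excess gap: {gap:.4f}, excess ratio: {ratio:.1f}")
```

A word whose observed frequency sits far above its projection, in absolute or relative terms, is a candidate marker word. Reporting both measures is useful because a rare word can show a large ratio with a tiny gap, while a common word can show the reverse.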

The study also revealed that LLM-assisted abstracts were more common in papers from countries such as China, South Korea, and Taiwan, suggesting that non-native English speakers may be using LLMs to edit their texts. The gap has another possible explanation, however: native English speakers may be better at recognizing and removing the unnatural style words that LLMs generate, which would conceal their use.

The researchers emphasized the importance of detecting LLM use, as these models have been known to produce inaccurate information and false claims that can mislead readers. As awareness of LLMs’ marker words grows, human editors may become more adept at identifying and removing generated text that may contain misleading information.

Looking ahead, future large language models may be adapted to use marker words less often, making their output more human-like and harder to detect. As the technology advances, distinguishing generative AI text from human-written content may become increasingly difficult and may require sophisticated tools to verify the authenticity of written work. The evolution of language in the digital age continues to shape how we communicate and create content, highlighting the complex interplay between technology and human creativity.