Google’s Gemini models, 1.5 Pro and 1.5 Flash, have been promoted for their ability to process and analyze vast amounts of data across long contexts to accomplish complex tasks. However, recent research suggests these models may not be as effective as initially thought.
Studies conducted by researchers at UMass Amherst, the Allen Institute for AI, Princeton, and UC Santa Barbara have revealed that Gemini 1.5 Pro and 1.5 Flash struggle to accurately answer questions about extensive datasets. In document-based tests, the models answered questions correctly only 40-50% of the time.
While Gemini models can technically ingest long contexts, researchers observed that they may not truly understand the content they are analyzing. For instance, when asked to evaluate true/false statements about fiction books, Gemini 1.5 Pro and 1.5 Flash struggled to provide accurate answers, with success rates as low as 20%.
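To make the setup concrete, here is a minimal sketch of the kind of long-context true/false evaluation described above, written against Google's google-generativeai Python SDK. The book file, the claims, and their ground-truth labels are illustrative placeholders, not the researchers' actual benchmark data or scoring code.

```python
# Minimal sketch: ask Gemini 1.5 Pro true/false questions about a long text
# and measure accuracy. Claims and labels below are hypothetical examples.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumes a valid Gemini API key
model = genai.GenerativeModel("gemini-1.5-pro")

# Hypothetical long fiction text loaded into the prompt as context.
book_text = open("novel.txt", encoding="utf-8").read()

# Illustrative statements about the book, each with a ground-truth label.
claims = [
    ("The protagonist never leaves her hometown.", False),
    ("The story is narrated in the first person.", True),
]

correct = 0
for claim, label in claims:
    prompt = (
        f"{book_text}\n\n"
        "Based only on the text above, is the following statement true or false?\n"
        f"Statement: {claim}\n"
        "Answer with a single word: True or False."
    )
    response = model.generate_content(prompt)
    predicted_true = response.text.strip().lower().startswith("true")
    correct += int(predicted_true == label)

print(f"Accuracy: {correct / len(claims):.0%}")
```

In the published studies, evaluations of this general shape are what produced the low success rates reported above; the point is that simply fitting a book into the context window does not guarantee the model can verify claims about it.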
Additionally, when tested on its ability to reason over videos, Gemini 1.5 Flash did not perform well, particularly when asked to transcribe handwritten digits from a sequence of images. These findings raise questions about the effectiveness of Google’s Gemini models in handling complex tasks that require in-depth analysis and comprehension.
The research also highlights concerns about the overpromising of generative AI technology, with businesses and investors growing skeptical of the promised productivity gains and wary of the tools’ potential risks. As a result, interest in generative AI dealmaking has declined, indicating growing wariness of the technology’s limitations.
Moving forward, the researchers call for better benchmarks and independent third-party critique to provide a more accurate assessment of generative AI models’ capabilities. By addressing these challenges, the industry can work toward developing more reliable and efficient AI systems that meet the expectations of users and businesses alike.