news-05112024-173248

AI benchmarking has always been a challenge. Traditional benchmarks often involve questions that can be answered through memorization or cover topics that may not be relevant to real-world applications. To address this issue, some AI enthusiasts have turned to games as a way to test the problem-solving abilities of AI models.

One such enthusiast is Paul Calcraft, a freelance AI developer who built an app in which two AI models play a Pictionary-like game: one model doodles while the other tries to guess what the doodle represents. Calcraft was inspired by a project from British programmer Simon Willison, who tasked models with rendering a vector drawing of a pelican riding a bicycle. The goal of both projects is a benchmark that models cannot easily game through memorization or simple pattern-matching.

Similarly, 16-year-old Adonis Singh developed Mcbench, a tool that gives an AI model control of a Minecraft character and tests its ability to design structures. Singh believes Minecraft poses a distinctive challenge because its less restricted environment tests a model's resourcefulness and problem-solving more than traditional benchmarks do.

The idea of using games to benchmark AI is not new. Games like chess have long been considered a demanding test for intelligent software. What sets the current projects apart is that they evaluate large language models (LLMs), which can analyze text, images, and more. Each of these models has its own quirks and behaviors, which makes their performance difficult to quantify.

Matthew Guzdial, an AI researcher at the University of Alberta, says games provide a visual, intuitive way to compare how AI models perform. Because games demand a different kind of decision-making and problem-solving than static question sets, they let researchers test models across a variety of scenarios.

While some believe that games like Pictionary and Minecraft can offer valuable insight into how AI models reason and solve problems, others are more skeptical. Mike Cook, a research fellow specializing in AI, argues that Minecraft may not be as special a testbed as some enthusiasts believe. He points out that although such games present unpredictable challenges, success in them does not necessarily reflect real-world reasoning or problem-solving.

Despite these differing opinions, using games to benchmark AI models remains an active area of interest for researchers and developers. Whether it’s LLMs playing Pictionary or building castles in Minecraft, these projects shed light on the capabilities and limitations of AI models in game environments.