Nearly two dozen researchers from Tsinghua University, Ohio State University and the University of California at Berkeley collaborated to create a method for measuring the capabilities of large language models (LLMs) as real-world agents.
LLMs such as OpenAI’s ChatGPT and Anthropic’s Claude have taken the technology world by storm over the past year, as cutting-edge “chatbots” have proven useful at a variety of tasks, including coding, cryptocurrency trading and text generation.