Dr Ryan BurnellLeverhulme Centre for the Future of Intelligence, University of Cambridge, UK.
Prof. Anthony CohnSchool of Computing, University of Leeds and the Turing Institute, UK
- 18 May 2023
- Salón de Actos - ETSINF, UPV
Evaluating General-Purpose AI
Simposio (en inglés) sobre “Evaluación de IA de propósito general” con ponentes expertos:
- Profesor Anthony Cohn de la Escuela de Informática de la Universidad de Leeds y el Instituto Turing, Reino Unido
- Título: “Dialectical Evaluation of Spatial Reasoning in Large Language Models”
- Abstract: Many claims have been made about the abilities of Large Language models (LLMs). Here we investigate, through the use of a dialectical approach, to what extent they are able to reason correctly spatial information, in particular about commonsense spatial situations and about geo-spatial texts. Thus, rather than evaluating an LLM on a set of benchmark data and reporting aggregate statistics, we conduct a series of dialogues with such a system in order to find failures in reasoning and thus map the boundaries of the system. We conclude with some suggestions for future work both to improve the capabilities of language models and to systematise this kind of dialectical evaluation.
- Dr. Ryan Burnell del Centro Leverhulme para el Futuro de la Inteligencia de la Universidad de Cambridge, Reino Unido.
- Title: “A cognitive approach to the evaluation of AI systems”
- Abstract: The capabilities of AI systems are improving rapidly, and these systems are being deployed in increasingly complex and high-stakes contexts. As the importance of AI grows, so too does the need for robust evaluation. If we want to determine the extent to which systems are safe, effective, and unbiased, it is vital that we understand the cognitive capabilities of those systems. In this endeavour, psychological science has a lot to offer —scientists from cognitive, developmental, and comparative psychology have spent many decades developing theories and paradigms to understand the cognitive capabilities of adults, children, and animals. Drawing on these theories and paradigms, we are working to build a framework for evaluating the cognitive capabilities of AI systems that we hope can be used to better track and regulate AI progress. I will present an initial version of the framework and discuss the open questions and challenges of applying cognitive science to AI evaluation.
Obtén valiosa información sobre las últimas investigaciones sobre la evaluación de la eficacia y las limitaciones de la IA de propósito general. No te pierdas esta oportunidad de aprender de expertos en el campo y participar en la discusión.