José Hernández-Orallo

Professor, Valencian Research Institute for Artificial Intelligence
Valencian Graduate School and Research Network of AI
Universitat Politècnica de València
Phone: +34963877007 (Ext:73585), Office: DSIC (1F): 236
Address: Camí de Vera 14, E-46022 València, EU. Email:
Senior Research Fellow, Leverhulme Centre for the Future of Intelligence
Research Affiliate, Centre for the Study of Existential Risk
Address: 16 Mill Lane, Cambridge, UK. Email:
Fellow, European Association for AI


  • Still under the effects of my book on the Evaluation of Natural and Artificial Intelligence, Cambridge University Press 2017, PROSE Award 2018 presented by the Association of American Publishers.
  • Working on and having fun with this project "Robust Evaluation of Cognitive Capabilities and Generality in Artificial Intelligence (RECOG-AI)", co-led with Lucy Cheke at CFI, funded by DARPA.
  • Coalescing efforts after this workshop and the related initiative on "Predictable AI" on March 8th, 2023, supported by the FLI, as a member of their AI x-safety community.
  • Still exploring many ideas in this other project "Paradigms of Artificial General Intelligence and Their Associated Risks", co-led with Seán Ó hÉigeartaigh at CSER, funded by Future of Life's AGI safety grants.
  • Working on and enjoying this project "MT4XAI: Machine Teaching for Explainable AI", Norwegian Research Council, with J.A. Telle, C. Ferri and P. Parviainen.
Recent Highlights

  • Our RECOG-AI project and our work on AI evaluation featured in Nature ("A test of artificial intelligence"), as part of this Nature Outlook on AI.
  • Honoured to give the 2023 UPV Inaugural Lecture with the title "Artificial and natural intelligence: from diversity to generality" (Text (in Valencian) and Slides (in English)).
  • "Rethink reporting of evaluation results in AI: Aggregate metrics and lack of access to results limit understanding" published in Science, 2023. Preprint here.
  • "Your Prompt is My Command: On Assessing the Human-Centred Generality of Multimodal Models" published in Journal of Artificial Intelligence Research, 2023.
  • Gave a talk at Bell Labs in Cambridge titled "Capability-oriented AI Evaluation: From Measurement Layouts to Validity Predictors", July 2023.
  • "Trends in AI inference energy consumption: Beyond the performance-vs-parameter laws of deep learning" published in Sustainable Computing: Informatics and Systems, 2023.
  • Many, many interviews, like "this one on El País", because I signed a few letters ("this" and "this").
  • "Heuristic search of optimal machine teaching curricula" published in Machine Learning, 2023.
  • Gave a talk at the Responsible Artificial Intelligence in the age of big models: Understanding and Evaluating Big Models for Human Intelligence and Learning. Microsoft Research, April 2023
  • Our work red-teaming GPT-4 covered by the Financial Times.
  • One of the zillion co-authors of BIG-bench "Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models". But we do not measure capabilities (only performance) there despite the title! Now accepted for TMLR 2023.
  • Gave a talk at the 18th Annual Conference of the Italian Association of Cognitive Sciences on Dec 16th, Rovereto, Italy.
  • Our RECOG-AI project covered in Communications of the ACM.
  • Gave a talk at the School of Computing Colloquium (University of Leeds) on Dec 2nd, 2022, with the title "Performance and Explainability Are Not Enough: Predicting AI Validity".
  • I gave the talk "Don't Trust Your AI System: Model Its Validity Instead", in the Series of talks on "Trustworthy AI" for the "AI for Good Global Summit", United Nations ITU (International Telecommunication Union). 14 Nov 2022.
  • Steering the SafeAI Workshops, next one at AAAI2023, following the previous editions in 2019, 2020, 2021 and 2022.
  • G. Jaimovitch, J. H. Orallo, M.J. Ramirez, C. Ferri "Can language models automate data wrangling?" has been accepted for publication in the Machine Learning Journal.
  • Keynote speaker: "Instructing prior-aligned machines: programs, examples and prompts", The 2nd International Joint Conference on Learning & Reasoning (IJCLR) Cumberland Lodge, Windsor Great Park, United Kingdom, 28-30 September 2022.
  • Three papers accepted for IJCAI-ECAI2022: "Not a Number: Identifying Instance Features for Capability-Oriented Evaluation" with R Burnell, J Burden, D Rutar, K Voudouris, L Cheke, "Non-Cheating Teaching Revisited: A New Probabilistic Machine Teaching Model" with C Ferri and JA Telle, and "Measuring the occupational impact of AI: tasks, cognitive abilities and AI benchmarks" with S. Tolan, F. M-Plumed, A. Pesole, E. F-Macias and E. Gomez.
  • IJCAI2022 Survey Track co-Chair (with Peter Flach).
  • Co-organising the Evaluation Beyond Metrics workshop 2022 @ IJCAI2022.
  • Co-organising the AISafety Workshop 2022 @ IJCAI2022, following the previous AISafety Workshops 2019, 2020 and 2021.
  • Paper accepted for ECML/PKDD 2022: "Heterogeneity Breaks the Game: Evaluating Cooperation-Competition with Multisets of Agents" with Y. Zhao.
  • Three papers accepted for AAAI2022: "When AI Difficulty is Easy: The Explanatory Power of Predicting IRT Difficulty", with F. M.-Plumed, D. C-Falcon and C. Monserrat, "How General-Purpose Is a Language Model? Usefulness and Safety with Human Prompters in the Wild" with P A M Casares, B. S. Loe, J. Burden and S. Ó hÉigeartaigh, and a Senior Member Track Paper: "Training on the Test Set: Mapping the System-Problem Space in AI", with W. Schellaert and F. M.-Plumed (Blue Sky Idea Runner-Up Award).
  • Our paper on "Automating Data Science" with T. De Bie, L. De Raedt, H. H. Hoos, P. Smyth and C. K. I. Williams on the cover of the Communications of the ACM!
Less Recent Highlights

  • Hernandez-Orallo, J.; Loe, B.S.; Cheke, L.; Martinez-Plumed, F.; Ó hÉigeartaigh, S. "General Intelligence Disentangled: The Generality of Natural and Artificial Intelligence", Nature Sci Rep 2021.
  • New chapter: "Identifying artificial intelligence capabilities: What and how to test", in "AI and the Future of Skills, Volume 1: Capabilities and Assessments", OECD Publishing, Paris.
  • Co-organising the SafeAI Workshops at AAAI 2022 following the previous editions in 2019, 2020 and 2021.
  • Special Issue Editor on Automating Data Science for the Machine Learning Journal
  • New NeurIPS2021 paper: "Think Big, Teach Small: Do Language Models Distil Occam's Razor?" with G. Jaimovitch, D. C-Falcon and C. Ferri.
  • Chapter "Teaching and Explanation: Aligning Priors between Machines and Humans" with C. Ferri in Muggleton, S. and Chater, N. (Eds.) (2021) Human-Like Machine Intelligence. Oxford University Press.
  • ECML/PKDD 2021 paper "Optimal Teaching Curricula with Compositional Simplicity Priors" with Manuel Garcia-Piqueras.
  • Participated in a panel at the NIST AI Measurement and Evaluation Workshop in June 2021 and the AI Metrology series in September 2021.
  • Co-organised ECML/PKDD Workshop on Automating Data Science 2021.
  • New paper accepted for the Journal of Artificial Intelligence Research ("Measuring the occupational impact of AI: tasks, cognitive abilities and AI benchmarks" with S. Tolan, A. Pesole, F. Martinez-Plumed, E. Fernandez-Macias and E. Gomez), 2021.
  • Co-organised AISafety Workshop 2021 @ IJCAI2021, following the previous AISafety Workshops 2019 and 2020.
  • New papers accepted for Artificial Intelligence Journal ("Making sense of sensory input" with Richard Evans et al.), Machine Learning Journal ("AUTOMAT[R]IX: learning simple matrix pipelines"), J. of Intelligent Systems ("Missing the missing values: The ugly duckling of fairness in machine learning"), Nature Machine Intelligence ("Research Community Dynamics behind Popular AI Benchmarks") (see some coverage here) and Telematics and Informatics ("Futures of artificial intelligence through technology readiness levels"), 2021.
  • Invited for the Spanish Senate's Commission on Economic Affairs and Digital Transformation, March 2021.
  • Paper "Negative Side Effects and AI Agent Indicators: Experiments in SafeLife" at SafeAI Workshop at AAAI 2021.
  • Participated in the OECD Expert Meeting on Skills and Tests for Assessing AI and Robotics, with this presentation.
  • New papers accepted for Minds and Machines ("Twenty Years Beyond the Turing Test: Moving Beyond the Human Judges Too") and Expert Systems with Applications ("Learning alternative ways of performing a task").
  • Animal AI Olympics Paper: "The Animal-AI Testbed and Competition", Proceedings of Machine Learning Research, 2020.
  • Co-organised the 1st Workshop on Evaluating Progress in AI (EPAI2020) at ECAI 2020.
  • Four papers accepted for ECAI 2020: "Tracking AI: The Capability is (Not) Near" with F. Martínez-Plumed and E. Gómez, "AI Paradigms and AI Safety: Mapping Artefacts and Techniques to Safety Issues" with F. Martínez-Plumed, Shahar Avin, Jess Whittlestone and Seán Ó hÉigeartaigh, "Finite and Confident Teaching in Expectation: Sampling from Infinite Concept Classes" with J.A. Telle and "Family and Prejudice: A Behavioural Taxonomy of Machine Learning Techniques" with the DMIP team.
  • Read our paper: CRISP-DM Twenty Years Later: From Data Mining Processes to Data Science Trajectories, IEEE Transactions on Knowledge and Data Engineering journal, 2020.
  • Read our paper: "Does AI Qualify for the Job? A Bidirectional Model Mapping Labour and AI Intensities", AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, 2020.
  • Co-organised the Animal-AI Olympics, 2019.
  • Read: Journal of Artificial Intelligence Research: "AI Generality and Spearman's Law of Diminishing Returns", 2019.
  • Contributing to the AI Safety Landscape.
  • Gave a talk at the Cambridge Science Festival, 2019.
  • Measure for measure column: "Unbridled mental power", Nature Physics, vol. 15, 2019.
  • Read our paper "Item Response Theory in AI: Analysing Machine Learning Classifiers at the Instance Level", Artificial Intelligence Journal, 2019.

    One of the great scientific challenges of this century is to understand what intelligence is and how it can be recreated. My bit is, on the one hand, the evaluation and measurement of intelligent systems in general and machine learning in particular and, on the other hand, some more applied research on data science, data mining and inductive programming. However, I'm interested in many other things, and my publication profiles below can give a better account of what my research really looks like:

    Here you also have a selection of some recent (or future) tutorials and presentations:

    Apart from the recent one on the Evaluation of Natural and Artificial Intelligence, I've published several other books on various topics.

    I am collaborating on several national strategies for AI, serve on the editorial boards of the Springer journals Machine Learning and Data Mining and Knowledge Discovery, and have served as area chair or senior PC member for IJCAI, AAAI, ECAI, KDD, ECML and NeurIPS, and as PC member for many others (ICML, CogSci, AGI, ICDM, UAI, ICLR, etc.).


    Data Mining, Machine Intelligence and Inductive Programming (DMIP), part of the ELP group.
    Kinds of Intelligence Programme, at the Leverhulme Centre for the Future of Intelligence.



    We have had projects, collaborations and visits with several companies in different areas: health, retailing, software development, automotive, ...

    Recently, I've been managing two "Cátedras/Aulas de Empresa":


    José Hernández-Orallo is Professor at the Universitat Politècnica de València, Spain, and Senior Research Fellow at the Leverhulme Centre for the Future of Intelligence, University of Cambridge, UK. He received a B.Sc. and an M.Sc. in Computer Science from UPV, partly completed at the École Nationale Supérieure de l'Électronique et de ses Applications (France), and a Ph.D. in Logic and Philosophy of Science with a doctoral extraordinary prize from the University of Valencia. His academic and research activities have spanned several areas of artificial intelligence, machine learning, data science and intelligence measurement, with a focus on a more insightful analysis of the capabilities, generality, progress, impact and risks of artificial intelligence. He has published five books and more than two hundred journal articles and conference papers on these topics. His research in the area of machine intelligence evaluation has been covered by several popular outlets, such as The Economist, New Scientist and Nature. He keeps exploring a more integrated view of the evaluation of natural and artificial intelligence, as advocated in his book "The Measure of All Minds" (Cambridge University Press, 2017, PROSE Award 2018). He is a member of AAAI, CLAIRE and ELLIS, and a EurAI Fellow.

    IN THE MEDIA (and blogs)

    Don't take this too seriously: the anYnt project received extraordinary (and sometimes hilarious) media coverage.


    (Copyleft) José Hernández Orallo, 2023.