José Hernández-Orallo

Professor, Valencian Research Institute for Artificial Intelligence
           Valencian Graduate School and Research Network of AI
           Universitat Politècnica de València
Phone: +34963877007 (Ext:73585), Office: DSIC (1F): 236
Address: Camí de Vera 14, E-46022 València, EU. Email:
Senior Research Fellow, Leverhulme Centre for the Future of Intelligence
Research Affiliate, Centre for the Study of Existential Risk
Address: 16 Mill Lane, Cambridge, UK. Email:
Fellow, European Association for AI


  • Based in Europe. Intermittently between Valencia and Cambridge. In China in July 2024.
  • For strategy matters, you can also contact my team's executive assistant, Joe Castellano.
  • Be informed about everything (important) on AI Evaluation: join our AI Evaluation Digest!
  • Still under the effects of my book on the Evaluation of Natural and Artificial Intelligence, Cambridge University Press 2017, Prose Award 2018 presented by the Association of American Publishers.
  • Working on and having fun with this project "Robust Evaluation of Cognitive Capabilities and Generality in Artificial Intelligence (RECOG-AI)", co-led with Lucy Cheke at CFI, funded by DARPA.
  • Coalescing efforts after this workshop and the related initiative on "Predictable AI" on March 8th, 2023, supported by the FLI, as a member of their AI x-safety community. See the related paper "Predictable Artificial Intelligence".
  • Still exploring many ideas in this other project "Paradigms of Artificial General Intelligence and Their Associated Risks", co-led with Seán Ó hÉigeartaigh at CSER, funded by Future of Life's AGI safety grants.
  • Working on and enjoying this project "MT4XAI: Machine Teaching for Explainable AI", Norwegian Research Council, with J.A. Telle, C. Ferri and P. Parviainen.
  • Working with the OECD on their "AI and the Future of Skills" project and the European Commission (JRC) on the characterisation of Foundation/Frontier Models and GPAI.
  • Recent Highlights

  • Visiting China in July 2024: Prof Fang Luo at Beijing Normal University in the context of SICSS2024 and ICCPAE2024 and Xing Xie at Microsoft Research Asia in the context of an Accelerating Foundation Models Research Programme project.
  • 50th anniversary of the European Conference on Artificial Intelligence (ECAI): accepted papers (main track): B. Mehrbakhsh, F. Martinez-Plumed, J. Hernandez-Orallo "Distilling the Effects of Language Model Contamination" and Y. Moros-Daval, F. Martinez-Plumed, J. Hernandez-Orallo "Language Task Difficulty Prediction through LLM-Annotated Meta-Features". I'll be giving the invited Frontiers in AI Talk: "Caveats and Solutions for Characterising General-Purpose AI".
  • Joined the NIST AISI Consortium through VRAIN.
  • A few recent publications: B. Mehrbakhsh, D. Garigliotti, F. Martinez-Plumed, J. Hernandez-Orallo "Confounders in Instance Variation for the Analysis of Data Contamination", CONDA 2024; R. Fabra-Boluda, C. Ferri, J. Hernandez-Orallo, M.J. Ramirez-Quintana, F. Martínez-Plumed "Cracking black-box models: Revealing hidden machine learning techniques behind their predictions", Intelligent Data Analysis; Y. Zhao, J. Hernandez-Orallo "The impact of sociality regimes on heterogeneous cooperative-competitive multi-agent reinforcement learning: a study with the predator-prey game", Journal of Experimental & Theoretical Artificial Intelligence.
  • Massive challenges, massive paper: "Foundational Challenges in Assuring Alignment and Safety of Large Language Models".
  • Steering committee for AISafety Workshop 2024 @ IJCAI2024, following the previous AISafety Workshops 2019, 2020, 2021, 2022 and 2023.
  • Talk about "Predictable AI" at the IX Imperial College AI initiative, 30 January 2024.
  • Involved in two tutorials: AAAI2024 on "Measurement Layouts for Capability-oriented AI Evaluation" and EACL2024 on "Item Response Theory for Natural Language Processing".
  • Area chair for ECAI2024 and ECML2024. Action Editor for Machine Learning Journal. Consider submitting your best research to any of these venues!
  • Paper "Team Formation through an Assessor: Choosing MARL Agents in Pursuit-Evasion Games" at CAIS 2024.
  • I'm talking too much apparently about Predictable AI, Measurement Layouts, AI Validity and AI Evaluation: Cambridge Psychometrics Centre, CSIC's "The many challenges of artificial intelligence", Workshop on Verifiable and Robust AI, UNESCO IFAP "Artificial Intelligence for Information Accessibility" (AI4IA) Conference, Alan Turing Institute workshop on Evaluating Foundation/Frontier Models.
  • Our RECOG-AI project and our work on AI evaluation featured by Nature ("A test of artificial intelligence"), as part of this Nature Outlook on AI.
  • New preprints (warning: not all peer-reviewed yet!): "Revealing the structure of language model capabilities", "Evaluating General-Purpose AI with Psychometrics", "An International Consortium for Evaluations of Societal-Scale Risks from Advanced AI" (short version accepted for NeurIPS SOLAR), "Predictable Artificial Intelligence".
  • Honoured to give the 2023 UPV Inaugural Lecture with the title "Artificial and natural intelligence: from diversity to generality" (Slides (in English), Text (in Valencian) and Slides (in English)).
  • "Rethink reporting of evaluation results in AI: Aggregate metrics and lack of access to results limit understanding" published in Science, 2023. Preprint here.
  • "Your Prompt is My Command : On Assessing the Human-Centred Generality of Multimodal Models" published in Journal of Artificial Intelligence Research, 2023.
  • Gave a talk at Bell Labs in Cambridge, titled "Capability-oriented AI Evaluation: From Measurement Layouts to Validity Predictors", July 2023.
  • "Trends in AI inference energy consumption: Beyond the performance-vs-parameter laws of deep learning" published in Sustainable Computing: Informatics and Systems, 2023.
  • Many, many interviews, like this one in El País, because I signed a few letters (this and this).
  • "Heuristic search of optimal machine teaching curricula" published in Machine Learning, 2023.
  • Papers accepted at ECAI2023, B Mehrbakhsh, F Martinez-Plumed, J Hernandez-Orallo "Adversarial Benchmark Evaluation Rectified by Controlling for Difficulty" and ECML2023, BAT Havardstun, C Ferri, J Hernandez-Orallo, P Parviainen, JA Telle "XAI with Machine Teaching When Humans Are (Not) Informed About the Irrelevant Features".
  • Gave a talk at the Responsible Artificial Intelligence in the age of big models: Understanding and Evaluating Big Models for Human Intelligence and Learning. Microsoft Research, April 2023
  • ChatGPT, PISA and the Future of Education, OECD WIPS Conference, 28 March 2023.
  • Our work redteaming GPT-4 covered by the Financial Times.
  • One of the zillion co-authors of BigBench "Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models". But we do not measure capabilities (only performance) there, despite the title! Now accepted for TMLR 2023.
  • Gave a talk at the 18th Annual Conference of the Italian Association of Cognitive Sciences on Dec 16th, Rovereto, Italy.
  • Our RECOG-AI project covered on Communications of the ACM.
  • Gave a talk at the School of Computing Colloquium (University of Leeds) on Dec 2nd, 2022, with the title "Performance and Explainability Are Not Enough: Predicting AI Validity".
  • I gave the talk "Don't Trust Your AI System: Model Its Validity Instead", in the Series of talks on "Trustworthy AI" for the "AI for Good Global Summit", United Nations ITU (International Telecommunication Union). 14 Nov 2022.
  • Steering the SafeAI Workshops, next one at AAAI2023, following the previous editions in 2019, 2020, 2021 and 2022.
  • G. Jaimovitch, C. Ferri, J. H. Orallo, F. M-Plumed, M.J. Ramirez "Can language models automate data wrangling?" has been accepted for publication in the Machine Learning Journal.
  • Keynote speaker: "Instructing prior-aligned machines: programs, examples and prompts", The 2nd International Joint Conference on Learning & Reasoning (IJCLR) Cumberland Lodge, Windsor Great Park, United Kingdom, 28-30 September 2022.
  • Three papers accepted for IJCAI-ECAI2022: "Not a Number: Identifying Instance Features for Capability-Oriented Evaluation" with R Burnell, J Burden, D Rutar, K Voudouris, L Cheke, "Non-Cheating Teaching Revisited: A New Probabilistic Machine Teaching Model" with C Ferri and JA Telle, and "Measuring the occupational impact of AI: tasks, cognitive abilities and AI benchmarks" with S. Tolan, F. M-Plumed, A. Pesole, E. F-Macias and E. Gomez
  • IJCAI2022 Survey Track co-Chair (with Peter Flach).
  • Co-organising the Evaluation Beyond Metrics workshop 2022 @ IJCAI2022.
  • Co-organising the AISafety Workshop 2022 @ IJCAI2022, following the previous AISafety Workshops 2019, 2020 and 2021.
  • Paper accepted for ECML/PKDD2022 "Heterogeneity Breaks the Game: Evaluating Cooperation-Competition with Multisets of Agents" with Y. Zhao
  • Three papers accepted for AAAI2022: "When AI Difficulty is Easy: The Explanatory Power of Predicting IRT Difficulty", with F. M.-Plumed, D. C-Falcon and C. Monserrat, "How General-Purpose Is a Language Model? Usefulness and Safety with Human Prompters in the Wild" with P A M Casares, B. S. Loe, J. Burden and S. Ó hÉigeartaigh, and a Senior Member Track Paper: "Training on the Test Set: Mapping the System-Problem Space in AI", with W. Schellaert and F. M.-Plumed (Blue Sky Idea Runner-Up Award).
  • Our paper on "Automating Data Science" with T. De Bie, L. De Raedt, H. H. Hoos, P. Smyth and C. K. I. Williams on the cover of the Communications of the ACM!
  • Less Recent Highlights

  • Hernandez-Orallo, J.; Loe, B.S.; Cheke, L.; Martinez-Plumed, F.; Ó hÉigeartaigh, S. "General Intelligence Disentangled: The Generality of Natural and Artificial Intelligence", Nature Sci Rep 2021.
  • New chapter: "Identifying artificial intelligence capabilities: What and how to test", in "AI and the Future of Skills, Volume 1: Capabilities and Assessments", OECD Publishing, Paris.
  • Co-organising the SafeAI Workshops at AAAI 2022 following the previous editions in 2019, 2020 and 2021.
  • Special Issue Editor on Automating Data Science for the Machine Learning Journal
  • New NeurIPS2021 paper: "Think Big, Teach Small: Do Language Models Distil Occam's Razor?" with G. Jaimovitch, D. C. Falcón and C. Ferri.
  • Chapter "Teaching and Explanation: Aligning Priors between Machines and Humans" with C. Ferri in Muggleton, S. and Chater, N. (Eds.) (2021) Human-Like Machine Intelligence. Oxford University Press.
  • ECML/PKDD 2021 paper "Optimal Teaching Curricula with Compositional Simplicity Priors" with Manuel Garcia-Piqueras.
  • Participated in a panel at the NIST AI Measurement and Evaluation Workshop in June 2021 and the AI Metrology series in September 2021.
  • Co-organised ECML/PKDD Workshop on Automating Data Science 2021.
  • New paper accepted for the Journal of Artificial Intelligence Research ("Measuring the occupational impact of AI: tasks, cognitive abilities and AI benchmarks" with S. Tolan, A. Pesole, F. Martinez-Plumed, E. Fernandez-Macias and E. Gomez), 2021.
  • Co-organised AISafety Workshop 2021 @ IJCAI2021, following the previous AISafety Workshops 2019 and 2020.
  • New papers accepted for Artificial Intelligence Journal ("Making sense of sensory input" with Richard Evans et al.), Machine Learning Journal ("AUTOMAT[R]IX: learning simple matrix pipelines"), J. of Intelligent Systems ("Missing the missing values: The ugly duckling of fairness in machine learning"), Nature Mach Intell ("Research Community Dynamics behind Popular AI Benchmarks") (see some coverage here) and Telematics and Informatics ("Futures of artificial intelligence through technology readiness levels"), 2021.
  • Invited for the Spanish Senate's Commission on Economic Affairs and Digital Transformation, March 2021.
  • Paper "Negative Side Effects and AI Agent Indicators: Experiments in SafeLife" at SafeAI Workshop at AAAI 2021.
  • Participated in the OECD Expert Meeting on Skills and Tests for Assessing AI and Robotics, with this presentation.
  • New papers accepted for Minds and Machines ("Twenty Years Beyond the Turing Test: Moving Beyond the Human Judges Too") and Expert Systems with Applications ("Learning alternative ways of performing a task").
  • Animal AI Olympics Paper: "The Animal-AI Testbed and Competition", Proceedings of Machine Learning Research, 2020.
  • Co-organised the 1st Workshop on Evaluating Progress in AI (EPAI2020) at ECAI 2020.
  • Four papers accepted for ECAI 2020: "Tracking AI: The Capability is (Not) Near" with F. Martínez-Plumed and E. Gómez, "AI Paradigms and AI Safety: Mapping Artefacts and Techniques to Safety Issues" with F. Martínez-Plumed, Shahar Avin, Jess Whittlestone and Seán Ó hÉigeartaigh, "Finite and Confident Teaching in Expectation: Sampling from Infinite Concept Classes" with J.A. Telle and "Family and Prejudice: A Behavioural Taxonomy of Machine Learning Techniques" with the DMIP team.
  • Read our paper: "CRISP-DM Twenty Years Later: From Data Mining Processes to Data Science Trajectories", IEEE Transactions on Knowledge and Data Engineering journal, 2020.
  • Read our paper: "Does AI Qualify for the Job? A Bidirectional Model Mapping Labour and AI Intensities", AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, 2020.
  • Co-organised the Animal-AI Olympics, 2019.
  • Read: Journal of Artificial Intelligence Research: "AI Generality and Spearman's Law of Diminishing Returns", 2019.
  • Contributing to the AI Safety Landscape.
  • Gave a talk at the Cambridge Science Festival, 2019.
  • Measure for measure column: "Unbridled mental power", Nature Physics, vol. 15, 2019.
  • Read our paper: "Item Response Theory in AI: Analysing Machine Learning Classifiers at the Instance Level", Artificial Intelligence Journal, 2019.

    One of the great scientific challenges of this century is to understand what intelligence is and how it can be recreated. My bit is, on one hand, the evaluation and measurement of intelligent systems in general and machine learning in particular and, on the other hand, some more applied research on data science, data mining and inductive programming. However, I'm interested in many other things, and my publication profiles below can give a better account of what my research really looks like: Here you also have a selection of some tutorials and presentations: Apart from the recent one on the Evaluation of Natural and Artificial Intelligence, I've published several other books on various topics.

    I am collaborating on several national strategies for AI, I sit on the editorial boards of the Springer journals Machine Learning and Data Mining and Knowledge Discovery, and I have served as area chair or senior PC member for IJCAI, AAAI, ECAI, KDD, ECML and NeurIPS, and as PC member for many others (ICML, CogSci, AGI, ICDM, UAI, ICLR, etc.).


    Data Mining, Machine Intelligence and Inductive Programming (DMIP), part of the ELP group. Kinds of Intelligence Programme, at the Leverhulme Centre for the Future of Intelligence.



    We have had projects, collaborations and visits with several companies in different areas: health, retailing, software development, automotive, ...

    Recently, I've been managing two "Cátedras/Aulas de Empresa":


    José Hernández-Orallo is Professor at the Universitat Politècnica de València, Spain, and Senior Research Fellow at the Leverhulme Centre for the Future of Intelligence, University of Cambridge, UK. He received a B.Sc. and an M.Sc. in Computer Science from UPV, partly completed at the École Nationale Supérieure de l'Électronique et de ses Applications (France), and a Ph.D. in Logic and Philosophy of Science, with an extraordinary doctoral prize, from the University of Valencia. His academic and research activities have spanned several areas of artificial intelligence, machine learning, data science and intelligence measurement, with a focus on a more insightful analysis of the capabilities, generality, progress, impact and risks of artificial intelligence. He has published five books and more than two hundred journal articles and conference papers on these topics. His research in the area of machine intelligence evaluation has been covered by several popular outlets, such as The Economist, New Scientist and Nature. He keeps exploring a more integrated view of the evaluation of natural and artificial intelligence, as vindicated in his book "The Measure of All Minds" (Cambridge University Press, 2017, PROSE Award 2018). He is a member of AAAI, CLAIRE and ELLIS, and a EurAI Fellow.

    IN THE MEDIA (and blogs) (not up to date)

    Don't take this too seriously: The anYnt project had extraordinary (and sometimes hilarious) media coverage.

  • Signatory of the DORA declaration: better ways to evaluate research
  • Member of AI Existential Safety Community
  • I support ARKOSE
  • Signatory of Pause Giant AI Experiments
  • Signatory of AI Risk

  • (Copyleft) José Hernández Orallo, 2024.