Towards Responsible Development of Generative AI for Education: An Evaluation-Driven Approach

Irina Jurenka*,‡,1, Markus Kunesch*,†,1, Kevin McKee§,1, Daniel Gillick§,1, Shaojian Zhu†,1, Sara Wiltberger§,1, Shubham Milind Phal1, Katherine Hermann1, Daniel Kasenberg§,1, Avishkar Bhoopchand1, Ankit Anand1, Miruna Pîslar1, Stephanie Chan§,1, Lisa Wang§,1, Jennifer She1, Parsa Mahmoudieh1, Aliya Rysbek1, Wei-Jen Ko3, Andrea Huber1, Brett Wiltshire1, Gal Elidan‡,2, Roni Rabin2, Jasmin Rubinovitz†,4, Amit Pitaru4, Mac McAllister3, Julia Wilkowski3, David Choi8, Roee Engelberg2, Lidan Hackmon2, Adva Levin2, Rachel Griffin5, Michael Sears5, Filip Bar6, Mia Mesar3, Mana Jabbour3, Arslan Chaudhry1, James Cohan3, Sridhar Thiagarajan1, Nir Levine1, Ben Brown1, Dilan Gorur§,1, Svetlana Grant1, Rachel Hashimoshoni3, Laura Weidinger1, Jieru Hu1, Dawn Chen3, Kuba Dolecki3, Canfer Akbulut1, Maxwell Bileschi1, Laura Culp1, Wen-Xin Dong3, Nahema Marchal1, Kelsie Van Deman4, Hema Bajaj Misra3, Michael Duah5, Moran Ambar2, Avi Caciularu2, Sandra Lefdal1, Chris Summerfield7, James An1, Pierre-Alexandre Kamienny1, Abhinit Mohdi3, Theofilos Strinopoulous3, Annie Hale5, Wayne Anderson5, Luis C. Cobo1, Niv Efron†,2, Muktha Ananda3, Shakir Mohamed1, Maureen Heymans3, Zoubin Ghahramani1, Yossi Matias2, Ben Gomes3 and Lila Ibrahim1 *Equal contributions, †Technical lead, ‡Research lead, §Workstream lead, 1Google DeepMind, 2Google Research, 3Google, 4Google Creative Lab, 5Arizona State University, 6Lund University, 7University of Oxford, 8 Anthropic, work carried out while employed at Google DeepMind

A major challenge facing the world is the provision of equitable and universal access to quality education. Recent advances in generative AI (gen AI) have created excitement about the potential of new technologies to offer a personal tutor for every learner and a teaching assistant for every teacher. The full extent of this dream, however, has not yet materialised. We argue that this is primarily due to the difficulties with verbalising pedagogical intuitions into gen AI prompts and the lack of good evaluation practices, reinforced by the challenges in defining excellent pedagogy. Here we present our work collaborating with learners and educators to translate high-level principles from learning science into a pragmatic set of seven diverse educational benchmarks, spanning quantitative, qualitative, automatic and human evaluations; and to develop a new set of fine-tuning datasets to improve the pedagogical capabilities of Gemini, introducing LearnLM-Tutor. Our evaluations show that LearnLM-Tutor is consistently preferred over a prompt-tuned Gemini by educators and learners on a number of pedagogical dimensions. We hope that this work can serve as a first step towards developing a comprehensive educational evaluation framework, and that this can enable rapid progress within the AI and EdTech communities towards maximising the positive impact of gen AI in education.


Posted on: June 9, 2024, 6:48 am Category: Uncategorized
