The old tests are getting too easy. Tejal Patwardhan leads OpenAI’s frontier evals team, which is finding new ways to measure and forecast progress as models become more capable. She and host Andrew Mayne discuss why evals matter for research, how benchmarks can break or get gamed, and what models need to be judged on next.
Chapters
00:00:24 Growing up at OpenAI
00:03:10 Why reasoning changed everything
00:06:28 What made o1 surprising
00:11:20 Why old benchmarks stopped working
00:14:45 What makes a good benchmark
00:17:35 Why evals are getting harder
00:22:09 Measuring voice and vision models
00:24:48 Testing models on real science
00:33:23 How OpenAI tracks frontier progress
00:40:47 What AI means for work
#AI #OpenAI