Why Tejal Patwardhan stopped underestimating the models – Episode 21

June 16, 2026


The old tests are getting too easy. Tejal Patwardhan leads OpenAI’s frontier evals team, which is finding new ways to measure and forecast progress as models become more capable. She and host Andrew Mayne discuss why evals matter for research, how benchmarks can break or get gamed, and what models need to be judged on next.

Chapters

00:00:24 Growing up at OpenAI
00:03:10 Why reasoning changed everything
00:06:28 What made o1 surprising
00:11:20 Why old benchmarks stopped working
00:14:45 What makes a good benchmark
00:17:35 Why evals are getting harder
00:22:09 Measuring voice and vision models
00:24:48 Testing models on real science
00:33:23 How OpenAI tracks frontier progress
00:40:47 What AI means for work

