The Word That Broke AI Research

Or: Why I stopped saying "AGI" and started thinking differently

I've been obsessed with AI for years now. Not in the "I use ChatGPT to write emails" way, but in the "I spend my mornings reading research papers while drinking matcha" way. It's become part of who I am, this constant scanning of the horizon for what's coming next.

But recently, something shifted.

I watched an interview with Ilya Sutskever, the guy who co-founded OpenAI and then left to start his own safety-focused lab. And he said something that made me put down my cup and just... sit there.

He talked about how the word "scaling" didn't just describe what AI researchers were doing. It prescribed what they would think to do next.

That hit different.

The trap I didn't see

Think about it. Around 2020, the entire AI field rallied around this one word: scaling. More data. More compute. Bigger models. The recipe was simple, the results were measurable, and it worked. GPT-3 blew everyone's minds, and suddenly every lab knew exactly what to do.

Just... more.

I remember reading the papers back then, feeling like I was watching history unfold. The scaling laws seemed almost magical. Double the compute, get predictable improvements. It felt like we'd found some fundamental truth about intelligence.
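
Here's roughly what that "predictable" part means, as a toy sketch rather than the real thing: the scaling-law papers describe loss falling off as a power law in compute, so every doubling of compute buys a small but foreseeable improvement. The constants in this snippet are invented for illustration, not fitted values from any actual paper.

```python
# Toy power-law scaling curve: loss = a * compute ** (-alpha).
# a and alpha are made-up illustrative constants, not real fitted values.
a, alpha = 10.0, 0.05

def predicted_loss(compute):
    """Predict loss for a given amount of training compute (toy model)."""
    return a * compute ** (-alpha)

# Doubling compute at each step gives a steady, predictable drop in loss.
for c in [1e21, 2e21, 4e21, 8e21]:
    print(f"compute = {c:.0e} -> predicted loss = {predicted_loss(c):.3f}")
```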

But here's what I missed: the word itself was doing something to how everyone thought. When your entire vocabulary revolves around "scaling," certain questions become invisible.

What if the problem isn't size but structure?

What if humans learn using a completely different algorithm?

What if more data sometimes makes things worse?

These questions don't even occur to you when "scaling" is your north star. The word created a blind spot so big that billions of dollars and thousands of careers flowed in one direction. Not because it was necessarily the best direction, but because it was the only one the vocabulary allowed.

My own scaling problem

I caught myself doing something similar in my own life, actually.

I work as a business analyst at a company that builds software for banks. I'm not a developer, but I work closely with data scientists who build ML models and with the engineering team. Most of my AI obsession is personal interest. I just find this stuff fascinating. But sometimes it pays off at work too, like when we started exploring embeddings for automating financial statement analysis.
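
To give a flavour of what that looked like (a minimal sketch of the general idea, not our actual pipeline; it assumes the open-source sentence-transformers library, and the model name and line items are purely illustrative): you embed statement line items as vectors and use cosine similarity to spot items that mean the same thing despite different wording.

```python
# Minimal sketch: embed financial statement line items and compare them
# by cosine similarity, so similarly worded items can be grouped together.
# Assumes the sentence-transformers library; model and data are illustrative.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")

line_items = [
    "Interest income on customer loans",
    "Loan interest revenue",
    "Office rent and utilities",
]

embeddings = model.encode(line_items)
similarity = cosine_similarity(embeddings)

# The first two items should score much closer to each other than to the
# third, which is the signal you can use to map them to one category.
print(similarity.round(2))
```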

Here's the thing though. When I first started diving into AI content, I kept thinking in terms of "more." More newsletters to subscribe to. More papers to read. More courses to finish. The banking mindset I grew up with professionally, that "work harder, accumulate more" approach, had infected how I was learning about a completely different field.

It took me embarrassingly long to realize that understanding doesn't scale linearly with consumption. Sometimes one interview that makes you question your assumptions is worth more than fifty that confirm what you already think. The word "more" was blinding me to "deeper."

Language shapes thinking. Even in the small stuff.

The AGI illusion

Sutskever's second point was about "AGI" itself. And this one really got me.

The term exists as a reaction. Deep Blue beat Kasparov at chess, and critics said "sure, but it's so narrow." So researchers countered: fine, then we'll build general intelligence that can do everything.

But think about what that framing does.

A 15-year-old human can't do most jobs. They have massive knowledge gaps. They need years of specialized training for any career. We don't call teenagers "narrow intelligence." We call them learners.

The word "AGI" made everyone aim for a finished, omniscient system. When maybe the actual goal should be: a system that can learn anything, even if it starts knowing very little.

I've spent so much time wondering "when will we get AGI" without ever questioning whether that's even the right question. Maybe it should be: when will we build something that learns like a curious teenager? That can pick up any skill given time and practice?

Completely different research direction. Completely different future.

The bug that won't die

Sutskever gave this perfect example that I can't stop thinking about.

You ask an AI to code something. It produces code with a bug. You point it out. The AI goes "oh my god, you're so right!" and fixes it. But introduces a second bug. You point that out. It brings back the first bug. You can bounce between those two bugs forever.
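
To make that loop concrete, here's a contrived little example of the kind of oscillation I mean (my own made-up illustration, not a transcript from any real session):

```python
# Round 1: the model writes this. Bug: when n == 0, xs[-0:] is xs[0:],
# so it returns the whole list instead of an empty one.
def last_n_v1(xs, n):
    return xs[-n:]

# Round 2: you point that out, and the "fix" looks like this. Now n == 0
# works, but when n > len(xs) the start index goes negative and elements
# get silently dropped instead of returning the whole list.
def last_n_v2(xs, n):
    return xs[len(xs) - n:]

# Round 3: you point *that* out, and the model cheerfully goes back to v1.

print(last_n_v1([1, 2, 3], 0))  # [1, 2, 3] -- wrong, should be []
print(last_n_v2([1, 2, 3], 5))  # [2, 3]    -- wrong, should be [1, 2, 3]
```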

I've experienced this. Multiple times. It's maddening, because these models ace coding benchmarks. They can solve competition-level problems. But they can't hold the thread across a few rounds of the same debugging conversation.

Why?

Because we trained them to perform on benchmarks, not to think like thoughtful programmers. We optimized what we could measure. And "maintaining coherent reasoning across iterations" isn't a benchmark.

The word "performance" led us to optimize the wrong thing.

What I'm trying to do differently

I don't have solutions here. I'm not a researcher, just someone trying to make sense of all this.

But I've started paying more attention to my own vocabulary. When I catch myself saying "I need to scale my skills" or "when will AGI arrive" or "what's the performance on this," I try to pause.

Is there a different word that would make me think differently?

Instead of "scaling" my learning, maybe I should think about "deepening" it. Instead of waiting for "AGI," maybe I should be watching for systems that demonstrate genuine learning transfer. Instead of "performance," maybe I should care about robustness.

These aren't just semantic games. The words we choose determine which questions we can even think to ask.

The uncomfortable part

Here's where I get a bit uncomfortable sharing this.

I've been consuming AI content almost daily for years. Newsletters, papers, YouTube breakdowns, Twitter threads. I thought I was staying informed. But Sutskever's point made me realize I might have just been getting better at speaking the existing vocabulary.

The scaling paradigm. The AGI framing. The benchmark obsession.

I internalized it all without ever stepping back to ask: are these the right words?

I'm not sure what the new words should be. Sutskever mentions things like "continuous learning" and "sample efficiency" and "generalization." But these haven't caught on the way "scaling" did. They don't have that same memetic power.

Maybe that's the actual bottleneck. Not the compute. Not the data. But the language.

What I'm watching for now

When I evaluate AI companies or read about new research, I'm trying to listen for different signals.

Are they still talking about "10x our compute"? That's the old paradigm.

Are they talking about how their system learns from fewer examples? How it maintains coherence across long interactions? How it generalizes to things it wasn't explicitly trained on?

That's what I'm curious about now.

Not "when AGI" but "when a system that learns like we do."

It's a subtle shift. But language is subtle. And apparently, it shapes everything.

I'm still figuring this out. If you've noticed other words that might be creating blind spots in how we think about AI, I'd genuinely love to hear about them.

That’s it for today! ☺️

Disclaimer:

This blog reflects my personal learning journey and experiments with technology. These are my own experiences and observations as I explore the fascinating world of tech and AI.

Developed with research, image generation and writing assistance using AI.
