With over two decades of working on AI, starting with my master’s thesis on multi-layer perceptrons, I’m excited about the recent innovations in this area, especially as they relate to code generation. In this blog post, I want to share my perspective on using AI in a very specific context: coding with AI agents based on transformer models, as these are the primary tools we have at our disposal for this purpose today. I’ll start with why I use AI and then share some limitations of AI models and how to work around these pitfalls. Let’s start with a quick definition…
Generative AI is Only One Form of Artificial Intelligence
There’s a lot of buzz around Artificial Intelligence (AI) these days, and it feels like every conversation about technology eventually lands on this topic. However, the term “AI” is often automatically conflated with “Generative AI” (GenAI). It’s important to make a quick distinction. AI is the broad field of making machines smart. Machine Learning (ML) is a subset of AI that uses data to train models, and Deep Learning (DL) is a further subset of ML using neural networks. GenAI, which creates new content, is the latest evolution that has captured everyone’s imagination.
Why I Use AI
AI is Practical
Large Language Models (LLMs) have provided us with an unprecedented natural language interface. This makes them incredibly powerful for tasks like web search and information retrieval, where a human can quickly verify the results. Coding is a domain where outputs are verifiable, which makes AI agents a significant efficiency booster. They aren’t here to replace us, but to augment our abilities.
It’s crucial to remember that LLMs have zero semantic understanding. They are masters of syntax, predicting the next word (or token) in a sequence through an auto-regressive process. They don’t “understand” code; they predict what code should look like.
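To make the auto-regressive idea concrete, here is a deliberately tiny sketch. The “model” below is nothing but bigram counts over a toy corpus, yet the decoding loop is the same in spirit as a transformer’s: pick a likely next token given the tokens so far, append it, repeat. Fluent-looking output emerges without any meaning anywhere.

```python
# Toy auto-regressive decoding: a bigram-count "model" with greedy decoding.
from collections import Counter, defaultdict

corpus = "the model predicts the next token and the next token follows the last".split()
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1  # count how often nxt follows prev

tokens = ["the"]
for _ in range(6):
    # greedy decoding: always take the most frequent continuation
    tokens.append(bigrams[tokens[-1]].most_common(1)[0][0])
print(" ".join(tokens))  # prints: the next token and the next token
```

Real models replace the bigram table with billions of learned parameters, but the generation loop is still this: one token at a time, conditioned on the prefix.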
AI Adds Value to My Work
From my experience, AI brings tangible benefits to my development workflow. Here’s how:
- Understanding Codebases: When jumping into a new or complex project, I use AI to get a quick summary of how different parts of the code interact.
- Project Planning: Through agent orchestration, AI can help outline project structures, suggest potential components, and map out dependencies.
- Code Documentation: This is a huge one. AI is exceptionally good at generating boilerplate documentation, docstrings, and comments, which frees me up to focus on the more complex descriptions (a minimal sketch of this follows the list).
- Enforcing Guidelines: We can use AI to enforce Clean Core principles and other coding guidelines, ensuring that our code remains compliant and release-ready.
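As one illustration of the documentation use case, here is a minimal sketch. It assumes the openai Python SDK with an API key in your environment; the model name is only an example, and any chat-completion client works the same way.

```python
# Minimal docstring-generation sketch. Assumes the `openai` SDK and an
# OPENAI_API_KEY in the environment; the model name is just an example.
from openai import OpenAI

client = OpenAI()

def suggest_docstring(source: str) -> str:
    """Ask the model for a docstring draft; a human still reviews it."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model, substitute your own
        messages=[{
            "role": "user",
            "content": f"Write a concise docstring for this function:\n\n{source}",
        }],
    )
    return response.choices[0].message.content

print(suggest_docstring("def clamp(x, lo, hi): return max(lo, min(x, hi))"))
```

The generated text is a draft, nothing more: it goes through the same review as any other code comment.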
My History with AI
My journey with AI didn’t start with the recent hype. It goes back to my college days, long before I entered the SAP ecosystem. My master’s thesis was on multi-layer perceptrons and scaled conjugate gradient descent methods. Ever since then, I’ve been independently researching and following the advancements in this fascinating field.
Potential Pitfalls with AI and Take-aways
Differences Between Prompts and Target Language
Natural language is inherently ambiguous, imprecise, and often not concise. Science, and by extension software development, is about making formal, unambiguous statements. The greatest leaps in scientific history, like the formalization of calculus by Leibniz and Newton, came from moving away from rhetorical natural language and towards formal paradigms. The programming languages we use today are a natural evolution of this.
So-called “vibe coding,” where one tries to build complex systems using only conversational language, is great for keynotes and simple examples but falls apart for enterprise-grade systems. I’ve seen attempts to “formalize” prompts to make agents more reliable, but this is a self-defeating exercise. You end up re-inventing a formal programming language, the very thing you were trying to avoid.
Take-away: This is a fundamental problem that won’t be solved by simply scaling models. My practical advice is to iteratively update your agent’s prompts and instructions as you code. Let the agent update its own instructions when it runs into issues. And most importantly: never commit code without verifying it yourself. Human experience is still an absolute necessity.
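To make that loop concrete, here is a minimal sketch. The file name AGENTS.md is an assumption on my part; use whatever project-level instructions file your agent actually reads (CLAUDE.md and .github/copilot-instructions.md are common alternatives).

```python
# Sketch: append a lesson to the agent's instructions file whenever a
# verification step catches a recurring mistake. AGENTS.md is hypothetical;
# adapt to whatever file your coding agent reads.
from datetime import date
from pathlib import Path

def record_lesson(lesson: str, instructions_file: str = "AGENTS.md") -> None:
    entry = f"\n- ({date.today().isoformat()}) {lesson}"
    with Path(instructions_file).open("a", encoding="utf-8") as f:
        f.write(entry)

# Example: after the agent proposed code that failed the test suite
record_lesson("Always run the unit tests before proposing a commit.")
```

Whether you append these notes yourself or let the agent do it, the instructions file becomes a living record of what the agent keeps getting wrong, and the human review step stays exactly where it was.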
Predicting Syntax vs. Understanding Semantics
As mentioned, all modern coding agents are based on a transformer architecture. They predict syntax, token by token, without any semantic intelligence or understanding of programming metaconcepts. They operate on a vast joint probability distribution and can generalize well within that distribution.
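Formally, such a model factorizes the joint probability of a token sequence into a chain of next-token conditionals, the standard autoregressive factorization:

```latex
p(x_1, \ldots, x_T) = \prod_{t=1}^{T} p_\theta(x_t \mid x_1, \ldots, x_{t-1})
```

Training fits the parameters to maximize this likelihood over the training corpus. An out-of-distribution prompt is simply a prefix for which these conditionals were never well estimated, which is where generalization breaks down.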
However, when faced with an out-of-distribution (OOD) prompt, they “hallucinate.” While syntax prediction without semantic understanding is often “good enough” to boost the efficiency of an experienced software engineer, it’s irresponsible to leave critical design decisions to these agents without human oversight.
Take-away: Until a new paradigm emerges that gives AI true semantic understanding, human oversight is paramount. Used responsibly, AI is an incredibly rewarding tool that greatly supports our daily work.
Hallucinations (First and Higher Order)
OOD prompts lead to what I call “first-order hallucinations,” where the LLM itself can’t generalize and essentially makes things up. The answers are unreliable and require verification. Methods to minimize this include training on more data (scaling) or grounding the LLM with symbolic systems like databases or knowledge graphs.
Grounding, however, introduces what I term “higher-order hallucinations.” The grounded data source might be perfectly accurate, but to access it, the agent must translate your natural language prompt into a formal query. If it hallucinates during this translation, it can produce an invalid query (returning no data) or a well-formed but wrong query that retrieves data that is accurate in itself yet doesn’t answer your question. The source is clean, but the retrieval is flawed.
Take-away: In the absence of a new paradigm, it’s best to use simpler query sequences instead of one complex one. Simplify the toolset available to your agents and bundle related tasks into a single tool to minimize these translation errors.
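Here is a minimal sketch of that idea, assuming a plain SQLite database with hypothetical customers and orders tables. Instead of exposing a raw “run any SQL” tool and hoping the natural-language-to-query translation holds up, you expose one coarse tool that bundles several simple, individually verifiable queries.

```python
# Sketch: one coarse, well-defined tool instead of free-form query generation.
# The schema (customers, orders) is hypothetical.
import sqlite3

def get_customer_overview(conn: sqlite3.Connection, customer_id: int) -> dict:
    """Bundle simple, individually verifiable queries into a single tool call."""
    cur = conn.cursor()
    cur.execute("SELECT name, country FROM customers WHERE id = ?", (customer_id,))
    name, country = cur.fetchone()
    cur.execute(
        "SELECT COUNT(*), COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
        (customer_id,),
    )
    order_count, revenue = cur.fetchone()
    return {"name": name, "country": country, "orders": order_count, "revenue": revenue}
```

Each query inside the tool is trivial to review, and the agent only has to decide whether to call the tool, not how to phrase a multi-way join.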
Non-Determinism in Answers
While the forward pass of a transformer is deterministic, the token-decoding phase is not: prompt an LLM five times and you may get five different answers. For coding, this is problematic. Even with the temperature parameter set to zero, non-determinism persists in practice.
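Here is a small sketch of where that variability comes from, using nothing but NumPy. The logits from the forward pass are fixed; the sampling step on top of them is not.

```python
# Deterministic logits, non-deterministic decoding.
import numpy as np

logits = np.array([2.0, 1.9, 0.5])  # fixed output of the forward pass

def sample(temperature: float) -> int:
    scaled = logits / max(temperature, 1e-8)  # temperature -> 0 approaches argmax
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(np.random.choice(len(logits), p=probs))

print([sample(1.0) for _ in range(5)])  # e.g. [0, 1, 0, 0, 1]; varies per run
print([sample(0.0) for _ in range(5)])  # effectively argmax: [0, 0, 0, 0, 0]
```

Even at temperature zero, production serving stacks can still vary across runs (batching, floating-point reduction order, hardware differences), which is the residual non-determinism described above.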
Take-away: You never know if you’ve received the “best” possible generated code, and neither does the AI. This makes human verification absolutely critical, always. For mass refactoring, a robust review process is essential. Interestingly, review agents can help here, but the final decision must rest with a human.
Explainability & Verifiability
LLMs are notoriously black boxes. Billions of floating-point numbers don’t offer a semantic explanation for why a particular answer was generated. Research into making these models more transparent is still in its infancy. The lack of semantic understanding, paired with non-determinism, makes true explainability incredibly difficult.
Take-away: For now, the only option for users is to wait for new research and paradigms to emerge.
TL;DR
- Human oversight is critical until AI gains a true semantic understanding, which it currently lacks.
- Use generative AI to speed your coding, but…
- Never commit code without verifying it yourself.
- Use simpler query sequences instead of one complex one.
- Have a robust code-review process with humans making the final decision.
Next Steps
It is irresponsible for any serious software engineer to ignore the current suite of AI tools, despite their shortcomings. Demonizing them is counter-productive. For those who know what they’re doing, these tools are massive efficiency boosters. With responsible use and human oversight, they can be successfully integrated into enterprise workflows and offer many opportunities to improve the human-machine interface.
Outlook
We should not believe all the hype or inflate our expectations of what AI can realistically do today. At the same time, we shouldn’t refrain from engaging with these tools in a meaningful way. We, the developers, are still the ones with the creativity, innovation, and semantic understanding. Let’s use these tools to amplify that.
Happy coding!
What are your thoughts? Share your experiences with AI in the comments below!
#SAP
#SAPTechnologyblog