Part 2/3 | LLM generators were trained on public code: simple examples, minimal composition. Your Hydra-driven, config-first architecture is far from that center. Under ambiguity, the model drifts toward common patterns, not your invariants.
This article is not about BERT or embedding models. It is about tool-using code agents built around an LLM: code LLM agents (e.g., Claude Code, OpenAI Codex-style agents). These systems wrap an LLM in an orchestration layer that can optionally retrieve code, read files, and run tools.
Two terms will be used precisely: the "generator" is the underlying LLM that produces tokens, and the "agent" is the orchestration layer wrapped around it that retrieves context, reads files, and runs tools.
With that established, the most common source of disappointment in “architectural prompting” is a form of distribution mismatch.
Many generators were trained on a mixture of sources that strongly reflects what is publicly available and heavily represented: tutorials, small examples, and popular open-source repositories. Regardless of the exact mix, the practical consequence is that a generator's defaults tend to align with patterns that are common, simple, and widely visible.
This becomes very tangible in systems that rely on sophisticated configuration and composition, such as Hydra-style workflows. "Config-driven" in production often means:
- layered config groups composed at runtime, with overrides as the primary interface for experiments;
- components instantiated from config (for example via _target_-style references) rather than constructed directly at call sites;
- schemas and validation that enforce conventions across the whole platform.
In public repositories, the same tooling is often used in a simplified way: small examples, minimal composition, and conventions that are not enforced across a large platform.
If your codebase uses configuration as an architectural backbone rather than as a convenience layer, you are asking the generator to operate far away from the centre of what it most frequently sees. Under ambiguity, it will often produce code that is plausible in isolation but inconsistent with your architecture.
A common expectation is that “if the agent can read the docs, it will know how to apply them.” In practice, documentation tends to describe feature surfaces: what is possible, what flags exist, what decorators do. It rarely shows the organisational patterns that make the tool effective at scale.
Production usage patterns are often:
- encoded in internal templates and reference implementations rather than in public docs;
- enforced through review conventions and CI checks instead of prose;
- invisible in public repositories, where only the simplified form of the tooling appears.
This is not a critique of documentation; it is simply a reality of complex systems. The "how" of a mature architecture is often encoded in internal examples, templates, and guardrails: artefacts that public docs cannot fully replicate.
Even with a tool-enabled agent, the generator is still generating tokens under its trained weights and biases, conditioned on the text it sees. It will try to compress what it reads into a few internal cues: file and identifier names, docstrings, surrounding style, and API shapes it recognises from training.
When the visible code is large, interconnected, and shaped by conventions, those cues can be incomplete or misleading. The generator may correctly identify the names of relevant functions and still miss the invariants that matter. The result is a patch that looks coherent yet subtly changes the architecture: a direct instantiation appears where only config-driven instantiation is acceptable; a dependency boundary is crossed; a convenience shortcut is introduced.
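As an illustration (all names here are invented), the drift can be as small as one line: the patch compiles and runs, yet it hard-codes a choice the architecture requires to come from config:

```python
class Tokenizer:
    """Illustrative component with a single configurable flag."""
    def __init__(self, lowercase: bool = True):
        self.lowercase = lowercase


# Hypothetical architecture rule: components are built only via the registry,
# so config overrides and validation apply uniformly.
REGISTRY = {"tokenizer": Tokenizer}


def build(name: str, **params):
    """Config-driven construction: the call site names a component;
    the registry decides the concrete class."""
    return REGISTRY[name](**params)


# What the generator often produces: plausible in isolation,
# but a direct instantiation that bypasses the config path.
drifted = Tokenizer(lowercase=False)

# What the architecture requires: the same object, built through
# the config-driven path.
conforming = build("tokenizer", lowercase=False)

# Identical behaviour, which is exactly why review alone misses it:
assert type(drifted) is type(conforming)
```

Both versions pass the tests; only one respects the invariant. That asymmetry is why the violations are "subtle".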
Providing a manifest of involved files is helpful, but it does not automatically provide the semantics the generator needs. In architecture-heavy systems, correctness depends on relationships: which config group composes which component, which layer is allowed to import which, which factory may construct which class, and which invariants a change must preserve.
A list of files improves coverage; it does not guarantee comprehension. In many production settings, “understanding” is not just reading code; it is understanding the design intent that sits behind the code.
If the generator’s priors do not match your architecture, the most practical response is to feed it a better local distribution. That typically means creating artefacts that make your patterns easy to retrieve and hard to misinterpret:
- Canonical reference implementations: a small number of "golden path" modules that demonstrate your intended patterns end-to-end (configuration layout, instantiation, logging, validation, and error handling).
- Templates that encode architecture: internal scaffolds that make it difficult to do the wrong thing. If a new component must be config-driven, provide a template that begins config-first and leaves little room for ad hoc construction.
- Architecture rules as checkable constraints: dependency direction checks, forbidden import rules, and structural validations make the pattern enforceable instead of aspirational.
- Tool outputs as grounding material: logs, stack traces, and real command outputs help the agent remain anchored. When the generator must account for concrete evidence, it is less likely to drift into plausible-but-wrong narratives.
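One way to turn a dependency-direction rule into a checkable constraint is a small AST pass over the source. This is a sketch under assumed package names (a "core" package that must not import from "experiments"); a real checker would walk the repository and run in CI:

```python
import ast

# Hypothetical rule: modules in "core" must never import from "experiments".
FORBIDDEN = {"core": {"experiments"}}


def forbidden_imports(source: str, owning_package: str) -> list[str]:
    """Return the dotted names of imports in `source` that violate
    the dependency rule for `owning_package`."""
    banned = FORBIDDEN.get(owning_package, set())
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        # Compare on the top-level package of each imported name.
        violations += [n for n in names if n.split(".")[0] in banned]
    return violations


# A core module sneaking in a convenience import across the boundary:
bad_source = "import os\nfrom experiments.fast_hacks import shortcut\n"
print(forbidden_imports(bad_source, "core"))  # → ['experiments.fast_hacks']
```

Wired into CI, a check like this rejects the "convenience shortcut" patch mechanically, so the rule does not depend on a reviewer noticing the crossing.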
A useful dividing line is the type of work: leaf-level tasks (small utilities, isolated fixes, tests) sit close to the generator's public-code priors and usually go well, while architecture-level tasks (new components, cross-cutting changes, anything that touches composition boundaries) sit far from them and need stronger scaffolding.
The more your system depends on conventions that are not widely represented in public code, the more important it becomes to treat the agent as a fast proposer and rely on your own architecture artefacts and checks for correctness.
From Demos to Production: Part 1
From Demos to Production: Part 3
© Tobias Klein 2026 · All rights reserved
LinkedIn: https://www.linkedin.com/in/deep-learning-mastery/