
Dremio’s Alex Merced comments on prompt design and data engineering, and why architecture matters for agentic AI. This article originally appeared in Insight Jam, an enterprise IT community that enables human conversation on AI.
Agentic AI is having a moment. Teams everywhere are wiring up agents that plan tasks, call tools, and act autonomously. Demos look polished, and product pitches sound sharp. But most of these systems sit on a fragile base. At their core, agents depend on data. If that data is slow, scattered, or stale, the system breaks, no matter how good the model or prompt may be.
An agent isn’t just a chatbot with plugins. It’s a system that sets a goal, breaks it into steps, and makes decisions based on the current state of data. For that to work, three things must happen:
- The data must feel unified, even when it lives in different places.
- It must follow rules for access, quality, and updates.
- It must be ready for low-latency queries at scale.
This is where modern data architectures come in, especially lakehouses and open table formats like Apache Iceberg. These platforms already solve problems that agents inherit, such as schema drift, access control, and real-time queries. That’s why a strong data foundation, not just prompt design, determines how well an agent performs.
Agents Need Engineering, Not Just Prompts (and How Lakehouses Can Help)
Many teams start with the wrong focus. They tune prompts, set up vector search, and wire agents to APIs. It works until real users arrive. Then messy data, slow queries, and broken schemas start to surface.
If you’ve worked in data engineering, this will sound familiar. The same thing happened with early data lakes. Teams dumped data into cloud storage, but nobody trusted it. It took table formats, metadata catalogs, and reliable engines to bring structure and trust. Iceberg did that by turning object storage into something queryable and stable. Now it’s time to bring those same patterns to agentic AI.
After all, agents need context to act. That context lives in tables, documents, emails, and logs. A support agent might combine customer history, product specs, and policy rules in real time. A planning agent might track inventory, supplier delays, and costs. In every case, the system only works if it can pull the right data at the right time.
Dashboards can tolerate delay, but agents can’t. If your agent answers with last week’s prices or an outdated policy, it won’t just be wrong; it might cause harm. That means data must be fresh, structured, and queryable on demand. Agents push harder on things like streaming ingestion, small-batch writes, and fast metadata access.
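To make that concrete, here is a minimal sketch of a small-batch write with PyIceberg; the catalog name, table, and schema below are hypothetical examples, not from the original article.

```python
# A minimal sketch of a low-latency, small-batch write with PyIceberg.
# The catalog ("default") and table ("sales.prices") are hypothetical examples.
import pyarrow as pa
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")  # connection details come from PyIceberg config
prices = catalog.load_table("sales.prices")

# A tiny batch of fresh rows, e.g. flushed by a streaming consumer every few seconds.
batch = pa.table({
    "sku": ["A-100", "B-200"],
    "price": [19.99, 4.50],
    "updated_at": ["2025-07-01T09:00:00", "2025-07-01T09:00:05"],
})

# Each append is an atomic commit that creates a new table snapshot, so agents
# querying mid-write see either the old state or the new one, never a mix.
prices.append(batch)
```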
Beyond Structured Data, and Why Building Trust Is Critical
Agents also pull from PDFs, chat transcripts, and web pages. Once these are embedded or extracted into fields, they live alongside structured tables. The lakehouse must treat both as first-class data. That means tracking metadata, audit trails, and schema evolution across both. Iceberg makes that possible by bringing ACID guarantees and time travel to the lakehouse, which is essential when an agent’s decision needs to be reproducible.
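One way to get that reproducibility, sketched below with PyIceberg, is to record the snapshot ID the agent read from and re-scan that exact snapshot later; the catalog and table names are hypothetical.

```python
# Sketch: pin the snapshot an agent's decision was based on, then replay it later.
# The catalog and table names are hypothetical.
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")
tickets = catalog.load_table("support.tickets")

# At decision time: note which snapshot the agent's context came from.
decision_snapshot = tickets.current_snapshot().snapshot_id

# ... the agent acts, and new writes land on the table ...

# During an audit: re-read the table exactly as the agent saw it.
replay = tickets.scan(snapshot_id=decision_snapshot).to_arrow()
```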
But agents don’t just read data; they often write it too. A planning agent might adjust forecasts, and a support agent might update ticket statuses. That makes data governance essential. The platform must enforce rules every time the agent reads or writes. If it doesn’t, the system may drift, create risk, or leak sensitive information.
Apache Iceberg helps by enforcing atomic writes, schema checks, and versioned changes. Every write creates a snapshot, and every snapshot is tracked. If an agent makes a mistake, a data engineer can roll back. If something fails downstream, the engineer can pinpoint the exact data state that caused it. This isn’t just helpful; it’s required for safe AI.
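In Spark, for example, that rollback is a single procedure call; the catalog name, table, and snapshot ID below are hypothetical.

```python
# Sketch: roll a table back to a known-good snapshot with Spark SQL.
# Assumes an Iceberg catalog registered in Spark as "lake"; names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Inspect recent snapshots to find the one that preceded the bad write.
spark.sql("SELECT committed_at, snapshot_id, operation "
          "FROM lake.sales.prices.snapshots ORDER BY committed_at DESC").show()

# Roll back; the bad snapshot stays in history for later inspection.
spark.sql("CALL lake.system.rollback_to_snapshot('sales.prices', 4212468541624937300)")
```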
Governance also means controlling what each agent can do. Some agents should read but never write, and some should write but only to staging tables. Iceberg supports branching, so changes can land in a safe area before they go live. A human can review and merge them later. This structure reduces fear and increases trust across the team.
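A sketch of that staging pattern using Iceberg branches in Spark SQL follows; the branch, table, and catalog names are hypothetical.

```python
# Sketch: an agent writes to a branch, and a human promotes it to main after review.
# Assumes an Iceberg catalog registered in Spark as "lake"; names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Create a branch that only the agent writes to.
spark.sql("ALTER TABLE lake.planning.forecasts CREATE BRANCH agent_staging")

# The agent's writes land on the branch; main is untouched.
spark.sql("""
    INSERT INTO lake.planning.forecasts.branch_agent_staging
    VALUES ('widget-7', '2025-07', 1200)
""")

# After review, fast-forward main to adopt the staged changes.
spark.sql("CALL lake.system.fast_forward('planning.forecasts', 'main', 'agent_staging')")
```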
Handling Mixed Workloads
Agents don’t run one at a time. In practice, dozens might run in parallel, some reading fresh data, others replaying past snapshots for audits or testing. That puts pressure on the data platform to support mixed workloads. Columnar formats, hidden partitioning, and metadata pruning help keep queries fast and cheap. But it takes planning.
Apache Iceberg supports features like partition evolution and compaction. These help control file sprawl and keep performance consistent. As small updates land, sometimes thousands a day, compaction jobs reorganize them behind the scenes. Iceberg also manages metadata to avoid bloated catalogs and stale snapshots.
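Both are ordinary table operations rather than new infrastructure. A hedged Spark SQL sketch, with hypothetical names and a 128 MB target file size:

```python
# Sketch: partition evolution and compaction as routine table operations.
# Assumes an Iceberg catalog registered in Spark as "lake"; names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Partition evolution: change the layout going forward without rewriting old data.
spark.sql("ALTER TABLE lake.events.activity ADD PARTITION FIELD days(event_ts)")

# Compaction: fold thousands of small files into fewer, larger ones (128 MB target).
spark.sql("""
    CALL lake.system.rewrite_data_files(
        table => 'events.activity',
        options => map('target-file-size-bytes', '134217728')
    )
""")
```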
A well-tuned system feels calm, even under load. Without these controls, query performance can spiral as the number of small files grows, hurting both humans and agents.
Tracking the Past, Replaying Decisions, and What It All Means for Data Engineers
Many agent tasks depend on event history. What changed? Who did what? What data was visible at the time? Iceberg’s time travel lets you answer these questions without extra plumbing. You can query a snapshot from the last hour or last month and get a clean, isolated view of the system state.
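In Spark SQL, that is a single clause; the catalog, table name, and timestamp below are hypothetical.

```python
# Sketch: time travel with Spark SQL.
# Assumes an Iceberg catalog registered in Spark as "lake"; names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The table exactly as it looked at a given moment, isolated from later writes.
spark.sql("""
    SELECT * FROM lake.support.tickets
    TIMESTAMP AS OF '2025-07-01 09:00:00'
""").show()
```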
This matters for trust. It helps with auditing, debugging, and learning. If an agent took the wrong step, engineers can reconstruct exactly what it saw. If a new model performs better, teams can compare past runs against the same data. This reproducibility is what makes AI manageable at scale.
Some teams worry that stronger data platforms will slow them down. The opposite is true. Structure speeds you up. When the lakehouse handles freshness, schemas, and metadata, engineers can focus on what matters: building better prompts, picking the right tools, and setting up useful agents.
No one wants to spend hours cleaning up nulls, patching broken joins, or rewriting pipelines. When the platform does its job, that work disappears. It also removes surprises. Tables stay consistent, fields behave, and pipelines keep running. Your agents get the context they need without human babysitting.
Why This Matters Now
Agentic AI might feel new, but the problems it exposes are not. Data engineers have already solved many of them. They’ve seen what happens when platforms grow without guardrails. They’ve seen the value of clear contracts, open formats, and shared governance.
That experience now applies to agents, so there is no need to invent a new stack. The warehouse taught us the value of clean schemas, the lakehouse showed us how to combine flexibility with control, and Iceberg taught us how to treat object storage like a database. Those lessons now carry into the world of autonomous systems.
The takeaway is simple: prompt design gets you a demo. Data architecture gets you a product.

