Your data team’s competitive edge is now instruction and oversight

Johan Steyn
Mar 13

As agent automation spreads, reskilling towards specification, testing, and governance becomes urgent.

Article link: https://open.substack.com/pub/johanosteyn/p/your-data-teams-competitive-edge?r=73gqa&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true

Audio summary: https://youtu.be/To33ZKTuyOM

If you work anywhere near data, you can feel the ground moving. The promise being marketed right now is not simply “faster analytics” or “better dashboards”. It is a bigger claim: AI agents can now generate large portions of the databases, schemas, and pipelines that used to take teams of data engineers weeks to design and build. Databricks has been especially bold about this direction, with recent commentary suggesting agents are taking over a significant share of database creation.

Whether that share holds in every environment is almost beside the point. The leadership challenge is clear: when building becomes cheap and fast, the real competitive edge shifts to specifying what you want precisely, constraining what the system is allowed to do, and verifying that what it produced is correct, compliant, and trustworthy.

CONTEXT AND BACKGROUND

Data engineering has always been a blend of craft and discipline: modelling business reality into structures, moving data reliably, and keeping definitions stable across teams. It is rarely glamorous, but it is the backbone of AI and analytics. The problem is that the demand for data products has outpaced the supply of skilled people, and organisations have tried to solve it with tooling, automation, and standardisation.

Now, agentic automation is accelerating that trend. Databricks’ own messaging about enterprise AI agents focuses heavily on moving from experiments to production, and it repeatedly highlights governance and evaluation as the difference between progress and chaos. The message between the lines is important: if agents are going to create or modify core data assets, the organisation needs stricter rules, not looser ones.

Meanwhile, the broader industry is already inventing new roles that sit between people and agentic systems. Even outside the traditional data platform vendors, employers are describing “AI orchestration” and governance roles that look a lot like the next evolution of data engineering.

INSIGHT AND ANALYSIS

If agents can generate schemas and pipelines quickly, the value of “being able to build” declines relative to “being able to specify”. That means reskilling must focus on three capabilities.

First, instruction. Not prompting in the casual sense, but requirements definition that is unambiguous: business rules, naming conventions, canonical definitions, data sensitivity labels, and the intended downstream use. If the instructions are vague, the output will be plausible but wrong, and it will be wrong at scale.
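One way to make instruction concrete is to capture a dataset request as a structured specification that can be checked for gaps before any agent sees it. The sketch below is illustrative only: the `DatasetSpec` class, its field names, and the `missing_fields` helper are hypothetical, chosen to mirror the list above, and are not part of any vendor's tooling.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetSpec:
    """Hypothetical requirements record handed to an agent."""
    name: str
    business_rules: list[str] = field(default_factory=list)
    naming_convention: str = ""
    canonical_definitions: dict[str, str] = field(default_factory=dict)
    sensitivity_labels: dict[str, str] = field(default_factory=dict)
    downstream_use: str = ""


def missing_fields(spec: DatasetSpec) -> list[str]:
    """Return the names of required fields that were left empty."""
    required = [
        "business_rules",
        "naming_convention",
        "canonical_definitions",
        "sensitivity_labels",
        "downstream_use",
    ]
    return [name for name in required if not getattr(spec, name)]


# A vague request: only a table name, nothing else.
vague = DatasetSpec(name="customer_orders")

# A precise request: every required field is filled in (example values only).
precise = DatasetSpec(
    name="customer_orders",
    business_rules=["An order counts only once payment has cleared."],
    naming_convention="snake_case",
    canonical_definitions={"customer": "A party with at least one cleared order."},
    sensitivity_labels={"email": "pii"},
    downstream_use="Monthly revenue reporting",
)

print(missing_fields(vague))    # every required field is reported
print(missing_fields(precise))  # nothing is reported
```

A check like this does not prove a specification is good, only that nothing obvious was left blank. The judgement about whether the business rules are the right ones still sits with people.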

Second, constraints. Agents must operate inside clearly defined boundaries: what sources they may access, which transformations are allowed, which joins are acceptable, what constitutes personally identifiable or confidential data, and what approval is needed for changes. Without constraints, you will get speed, but you will also get drift: multiple versions of “customer”, inconsistent time logic, and pipelines that quietly violate internal policy.
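Boundaries of this kind can be written down as data and checked before an agent's proposed step is allowed to run. The following is a minimal sketch under stated assumptions: `AgentConstraints`, `ProposedStep`, and `violations` are hypothetical names, and the sources, transformations, and column names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConstraints:
    """Hypothetical boundary definition for an agent."""
    allowed_sources: frozenset[str]
    allowed_transformations: frozenset[str]
    confidential_columns: frozenset[str]


@dataclass(frozen=True)
class ProposedStep:
    """Hypothetical description of one pipeline step an agent wants to run."""
    source: str
    transformation: str
    columns: tuple[str, ...]


def violations(step: ProposedStep, rules: AgentConstraints) -> list[str]:
    """List every constraint the proposed step would break."""
    problems = []
    if step.source not in rules.allowed_sources:
        problems.append(f"source not allowed: {step.source}")
    if step.transformation not in rules.allowed_transformations:
        problems.append(f"transformation not allowed: {step.transformation}")
    # Confidential columns are not blocked outright; they are flagged for approval.
    exposed = sorted(set(step.columns) & rules.confidential_columns)
    if exposed:
        problems.append(f"confidential columns need approval: {', '.join(exposed)}")
    return problems


rules = AgentConstraints(
    allowed_sources=frozenset({"sales.orders", "crm.accounts"}),
    allowed_transformations=frozenset({"filter", "aggregate", "join"}),
    confidential_columns=frozenset({"email", "id_number"}),
)
step = ProposedStep(
    source="hr.payroll",
    transformation="aggregate",
    columns=("email", "amount"),
)

for problem in violations(step, rules):
    print(problem)
```

In this example the step is flagged twice: once for reading from a source outside the allowed list, and once for touching a confidential column.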

Third, verification. This is the most under-appreciated shift. In an agent-built world, quality assurance cannot be a final, occasional review. It must be continuous and test-driven: automated checks, lineage visibility, reproducible runs, anomaly detection, and clear audit trails for every change. This is why “policy as code” ideas are gaining traction: the organisation’s rules need to be machine-checkable, not buried in a PDF on SharePoint.
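To illustrate what "machine-checkable" can mean in practice, here is a small sketch of two rules expressed as code rather than prose. It assumes, purely for illustration, a house rule that column names are snake_case and that every column carries one of four sensitivity labels; the `check_schema` function and the label set are hypothetical.

```python
import re

# Assumed house rules, for illustration only.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
KNOWN_LABELS = {"public", "internal", "confidential", "pii"}


def check_schema(columns: dict[str, str]) -> list[str]:
    """Check a column-name to sensitivity-label mapping against the rules."""
    failures = []
    for name, label in columns.items():
        if not SNAKE_CASE.match(name):
            failures.append(f"{name}: name is not snake_case")
        if label not in KNOWN_LABELS:
            failures.append(f"{name}: unknown sensitivity label {label!r}")
    return failures


# A schema as an agent might propose it: one badly named column, one unlabelled.
agent_schema = {
    "customer_id": "internal",
    "EmailAddress": "pii",
    "order_total": "",
}

failures = check_schema(agent_schema)
for failure in failures:
    print(failure)
```

Because the rules are ordinary code, they can run on every change an agent proposes, and the list of failures doubles as an audit record of why a change was rejected.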

There is also a people challenge that leaders should not ignore. If agents absorb the repetitive “junior work”, organisations risk weakening their own talent pipeline. A generation of data professionals could become excellent supervisors of tools without understanding the fundamentals that let them spot subtle errors. Reskilling must therefore include deliberate practice: reviewing agent outputs, performing structured failure analysis, and learning the underlying principles of modelling and reliability.

IMPLICATIONS

For business leaders, this is an operating model decision. Treat agent automation in data engineering the way you treat automation in finance: define which actions are low risk, which require approval, and which are forbidden. Then measure outcomes: data quality incidents, rework, time to deliver new datasets, and the business impact of errors.
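The three tiers described above can be sketched as a simple lookup table. Everything in this example is an assumption made for illustration: the `Decision` enum, the `POLICY` table, and the action names are hypothetical, and a real organisation would define its own.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    FORBID = "forbid"


# Hypothetical policy table; the action names are examples only.
POLICY = {
    "create_staging_table": Decision.ALLOW,
    "add_column": Decision.ALLOW,
    "change_metric_definition": Decision.NEEDS_APPROVAL,
    "join_on_pii": Decision.NEEDS_APPROVAL,
    "drop_production_table": Decision.FORBID,
}


def decide(action: str) -> Decision:
    """Look up an action; anything not listed defaults to needing approval."""
    return POLICY.get(action, Decision.NEEDS_APPROVAL)


for action in ("add_column", "drop_production_table", "rename_schema"):
    print(action, "->", decide(action).value)
```

The design choice worth noting is the default: an action nobody has classified yet is routed to a human rather than silently allowed.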

For data leaders, build a “specification and verification” discipline. Standardise templates for requirements, create a shared ruleset for definitions, and insist that every agent-generated asset ships with tests, lineage, and documentation.

Also plan for role redesign: you will need fewer people hand-building pipelines, and more people designing constraints, managing governance, and validating outputs.

For teams, invest in skills that travel: domain understanding, precision in communication, testing mindset, and governance literacy. Those are the abilities that will differentiate a high-performing data team when the building blocks become automated.

CLOSING TAKEAWAY

If Databricks and others are right that agents will generate most of what used to be manual database work, then the competitive advantage moves up the stack. The winning organisations will not be those with the flashiest agents. They will be those who can clearly define what “correct” means, encode constraints that prevent avoidable harm, and verify outputs continuously before bad data becomes bad decisions. In the agent era, data engineering does not disappear. It becomes more strategic, more accountable, and far more closely tied to trust.

Author Bio: Johan Steyn is a prominent AI thought leader, speaker, and author with a deep understanding of artificial intelligence’s impact on business and society. He is passionate about ethical AI development and its role in shaping a better future. Find out more about Johan’s work at https://www.aiforbusiness.net
