What AI-Ready Data Actually Looks Like: A Practical Checklist
Marc de Haas
on Jun 15, 2026 16:46
| Updated: Jun 15, 2026 18:05
In many organisations right now, an AI project has been approved, a vendor has been selected, and a go-live date is somewhere on a roadmap. Six months in, the team is doing something different from what they planned, rebuilding data infrastructure rather than building on top of it.
The gap between where AI initiatives start and where they actually land usually has the same root cause: the data was not in a state the project needed it to be in, and nobody assessed that systematically before the scope was set.
AI readiness is not about having the most advanced tooling. It comes down to five things, most of which are organisational before they are technical.
Clean data and knowing where it is not
"Clean data" tends to get treated as a binary: either it is clean or it isn't, and the answer is usually "mostly". What matters more is whether the team understands where the quality gaps are and what downstream impact those gaps would have on the AI use case in question.
Undocumented quality issues are harder to manage than known ones. A dataset with acknowledged gaps, where the team has agreed on how to handle them, is far more reliable as a foundation than one that is assumed to be correct because nobody has looked closely. AI outputs derived from the second type fail quietly: the model continues producing results without any visible sign that the inputs were unreliable.
In practice, this means auditing the specific datasets your AI use case will depend on before the project scope is finalised. This does not require a full data quality programme. A targeted review of completeness, duplication rates, and freshness for those specific inputs is enough.
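As a rough illustration of what such a targeted review could measure, here is a minimal Python sketch. The function name, the field names (customer_id, email, updated_at), the 90-day freshness limit, and the sample rows are all hypothetical; a real audit would run against the actual inputs of the use case, most likely as queries in the warehouse rather than in memory.

```python
from datetime import datetime, timedelta, timezone


def audit_dataset(rows, key_field, required_fields, timestamp_field, max_age, now=None):
    """Summarise completeness, duplication, and freshness for a list of dict rows."""
    now = now or datetime.now(timezone.utc)
    total = len(rows)
    if total == 0:
        return {"rows": 0, "completeness": {}, "duplication_rate": 0.0, "stale_fraction": 0.0}

    # Completeness: share of rows with a non-empty value, per required field.
    completeness = {
        name: sum(1 for row in rows if row.get(name) not in (None, "")) / total
        for name in required_fields
    }

    # Duplication: share of rows that repeat a key already present in the data.
    distinct_keys = len({row.get(key_field) for row in rows})
    duplication_rate = (total - distinct_keys) / total

    # Freshness: share of rows last updated longer ago than the agreed maximum age.
    stale = sum(1 for row in rows if now - row[timestamp_field] > max_age)

    return {
        "rows": total,
        "completeness": completeness,
        "duplication_rate": duplication_rate,
        "stale_fraction": stale / total,
    }


# Hypothetical sample: one empty email, one repeated key, one stale record.
as_of = datetime(2026, 6, 15, tzinfo=timezone.utc)
rows = [
    {"customer_id": 1, "email": "a@example.com", "updated_at": as_of - timedelta(days=1)},
    {"customer_id": 2, "email": "", "updated_at": as_of - timedelta(days=2)},
    {"customer_id": 2, "email": "b@example.com", "updated_at": as_of - timedelta(days=400)},
    {"customer_id": 3, "email": "c@example.com", "updated_at": as_of - timedelta(days=3)},
]
report = audit_dataset(
    rows,
    key_field="customer_id",
    required_fields=["customer_id", "email"],
    timestamp_field="updated_at",
    max_age=timedelta(days=90),
    now=as_of,
)
print(report)
```

The numbers themselves matter less than the fact that they are written down and agreed on: a known 25% duplication rate with a handling rule is easier to build on than an unmeasured one.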
Clear ownership per dataset
Every dataset an AI system relies on needs an accountable owner, someone whose responsibility includes quality, definition, and availability over time, not just someone whose name appears in a system.
Ownership gaps tend to show up in AI projects in specific ways: a training dataset that turned out to include test accounts, a churn model trained on contract data that included cancelled pilots, a recommendation engine fed by a field nobody maintains anymore. In each case, the model was doing what it was designed to do. The problem was in the data it trusted.
Assigning ownership creates friction, and teams often resist it. The pushback is understandable: it adds accountability where there was none before. But the cost of unowned data propagating through a production AI system is higher than the cost of the conversation needed to establish who is responsible for what.
Documented pipelines
If the path data takes from source system to model input cannot be explained clearly, the outputs of that model cannot be fully trusted, not because the model is wrong, but because the inputs are not understood well enough to know when they change in ways that matter.
The most common gap here is not the absence of documentation across the board, but the absence of documentation at transformation points. What happened to this field between the source system and the warehouse? Was it filtered, aggregated, or imputed at some stage? When a data engineer is the only person who knows the answer to that question, the system has a dependency that will eventually create a problem.
Lineage tooling (BigQuery's built-in data lineage, Dataplex, or a lightweight catalogue tool) makes these transformation paths visible and auditable. For AI systems that need to be monitored and debugged over time, that visibility is not optional infrastructure. It is what makes the difference between a system you can maintain and one you can only replace.
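Where dedicated lineage tooling is not yet in place, even a very small record per field can capture what happened at each transformation point. The Python sketch below is one hypothetical way to structure that record; the class names, the stage labels, and the example field (contract_value) are assumptions made for illustration and do not reflect the API of any lineage product.

```python
from dataclasses import dataclass, field


@dataclass
class TransformationStep:
    stage: str        # where the step happens, e.g. "source -> staging"
    operation: str    # what kind of change, e.g. "filter", "aggregate", "impute"
    description: str  # what was done and why; empty means nobody wrote it down
    owner: str = ""   # person or team who can answer questions about this step


@dataclass
class FieldLineage:
    field_name: str
    source_system: str
    steps: list = field(default_factory=list)

    def undocumented_steps(self):
        """Return steps that lack a description or an owner."""
        return [
            step for step in self.steps
            if not step.description.strip() or not step.owner.strip()
        ]


# Hypothetical example: one documented filter, one imputation nobody described.
lineage = FieldLineage(
    field_name="contract_value",
    source_system="crm",
    steps=[
        TransformationStep(
            stage="source -> staging",
            operation="filter",
            description="Drop rows flagged as cancelled pilots",
            owner="sales-data-team",
        ),
        TransformationStep(stage="staging -> warehouse", operation="impute", description=""),
    ],
)

for step in lineage.undocumented_steps():
    print(f"{lineage.field_name}: undocumented {step.operation} at {step.stage}")
```

A record like this answers the questions raised above (was the field filtered, aggregated, or imputed, and where) without depending on one engineer's memory.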

A mechanism for detecting when outputs degrade
Most AI projects plan carefully for launch. Fewer plan for what happens six months later, when data distributions shift, when source systems change upstream, or when the model's assumptions about the world gradually stop reflecting reality.
A feedback loop in this context means a monitored signal that tells someone when something has gone wrong: data drift detection, output distribution monitoring, or downstream business metrics that correlate with model performance. Without it, degradation happens quietly. The model keeps producing outputs, the team keeps trusting them, and the first visible sign of a problem tends to be a business decision that nobody can explain.
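One common way to turn "the distribution has shifted" into a monitored number is the Population Stability Index (PSI), which compares the binned distribution of a recent sample against a baseline. The sketch below is a minimal standard-library version, not a description of any particular monitoring product; the equal-width binning, the small floor used to avoid division by zero, and the 0.1 and 0.25 thresholds (a widely used rule of thumb, not a standard) are all assumptions to adjust per use case.

```python
import math


def population_stability_index(baseline, current, bins=10, eps=1e-6):
    """Compare two numeric samples using equal-width bins derived from the baseline."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    low, high = min(baseline), max(baseline)
    if high == low:
        raise ValueError("baseline has no spread; PSI is not meaningful")
    width = (high - low) / bins

    def proportions(values):
        counts = [0] * bins
        for value in values:
            index = int((value - low) / width)
            # Values outside the baseline range are counted in the nearest edge bin.
            index = min(max(index, 0), bins - 1)
            counts[index] += 1
        # Floor each share at eps so empty bins do not cause log(0) or division by zero.
        return [max(count / len(values), eps) for count in counts]

    expected = proportions(baseline)
    actual = proportions(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def drift_status(psi, warn=0.1, alert=0.25):
    """Map a PSI value to a label using rule-of-thumb thresholds (assumed, tune per case)."""
    if psi < warn:
        return "stable"
    if psi < alert:
        return "moderate shift"
    return "significant shift"


# Hypothetical feature values: the recent sample is shifted upwards by 50.
baseline_sample = list(range(100))
recent_sample = [value + 50 for value in baseline_sample]

psi = population_stability_index(baseline_sample, recent_sample)
print(drift_status(psi))
```

The same function can be applied to model outputs as well as inputs. Whichever signal is chosen, the check only works as a feedback loop if its result is routed to a named person or team.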
Building this into the project design from the start is easier than adding it retrospectively. It also requires a clear answer to the question of who is responsible for monitoring it, which tends to surface the same ownership questions the rest of this list raises.
Discipline around fixing before extending
There is a pattern that recurs in AI project post-mortems: a quality issue appeared, and rather than addressing it directly, the team routed around it. A transformation was added to compensate. A separate pipeline was introduced. The workaround worked well enough to keep the project moving, and the underlying issue stayed in place underneath.
Over time, these compensations compound. The data foundation becomes more complex, the number of people who understand it shrinks, and the cost of eventually fixing the root cause increases. AI readiness often requires a period of deliberate remediation before a new capability is built on top: retiring pipelines that have outlived their original purpose, establishing definitions that multiple teams have actually agreed on, and addressing quality issues rather than encoding workarounds.
This phase tends not to appear in project proposals, and it is often where timelines first slip. Organisations that accept it as necessary, scope for it explicitly, and treat it as foundational investment rather than delay are in a meaningfully different position when the model eventually launches.
Where to start
These five conditions do not need to be perfect before a project begins, but they do need to be assessed honestly. A data readiness review before the scope is set, covering which datasets are involved, who owns them, how they move, and what quality issues are known, is a much smaller investment than discovering the gaps twelve months into a project.
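The output of such a review can be as simple as one record per dataset with the open gaps listed. The Python sketch below shows one hypothetical shape for that record; the class, its fields, and the example dataset names are illustrative assumptions rather than a prescribed template.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetReview:
    name: str
    owner: str = ""                   # accountable person or team; empty if none agreed
    pipeline_documented: bool = False
    quality_audited: bool = False
    known_issues: list = field(default_factory=list)

    def gaps(self):
        """List the readiness conditions this dataset does not yet meet."""
        found = []
        if not self.owner.strip():
            found.append("no accountable owner")
        if not self.pipeline_documented:
            found.append("pipeline not documented")
        if not self.quality_audited:
            found.append("quality not audited")
        return found


def readiness_summary(reviews):
    """Return only the datasets that still have open gaps, keyed by dataset name."""
    return {review.name: review.gaps() for review in reviews if review.gaps()}


# Hypothetical datasets behind a churn model.
reviews = [
    DatasetReview(
        name="contracts",
        owner="sales-operations",
        pipeline_documented=True,
        quality_audited=True,
        known_issues=["includes cancelled pilots before 2024"],
    ),
    DatasetReview(name="support_tickets", pipeline_documented=True),
]

print(readiness_summary(reviews))
```

Note that a dataset with known issues can still pass the review: the point is that the issues are recorded and owned, not that they are absent.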
If you want to understand where your data actually stands before the next AI initiative, we are happy to work through that with you.
Marc de Haas
Marc is Head of Development at Crystalloids and works closely on client projects as a Solution Architect and Cloud Architect.
With more than 30 years of experience, he designs and builds modern data platforms and scalable cloud architectures for data-centric and AI-driven solutions. He studied Computer Science at the...