Key takeaways
The EU AI Act introduces concrete obligations for high-risk AI systems around data governance and technical documentation, most of which apply from August 2026.
Stripped of legalese, Article 10 and Annex IV ask one practical question of your data: can you show where it came from and how it was made? This guide translates that into records data teams can start building now.
- Article 10 covers data and data governance for high-risk systems.
- Annex IV sets out the technical documentation you must keep.
- For human-generated data, both point directly at provenance.
- The cheapest time to capture provenance is while the data is made.
August 2024: Regulation enters into force
February 2025: Bans on prohibited AI practices apply
August 2025: Obligations for general-purpose AI models apply
August 2026: Most high-risk (Annex III) obligations apply
August 2027: High-risk obligations for regulated products (Annex I) apply
Who the data rules apply to
The obligations discussed here apply to providers of high-risk AI systems as defined by the Act, for example AI used in medical devices, critical infrastructure, employment, credit and essential services. If your system is high-risk, its training, validation and test data fall in scope.
This article is a practical explainer for data and ML teams, not legal advice. Confirm your specific obligations and timelines with qualified counsel.
What Article 10 (data governance) asks for
Article 10 covers data and data governance: training, validation and test data must be subject to appropriate governance practices. In practice that means being able to describe, per dataset, where the data originated, how it was collected or produced, and what was done to examine it for issues such as bias.
The key shift is from asserting that data is good to being able to show why, with records rather than reassurance.
- Document the origin and collection or generation method of each dataset.
- Describe processing, labelling and the instructions given to annotators.
- Examine and document possible biases and gaps.
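- Keep the governance records available for as long as the Act requires (Article 18 sets a 10-year retention period for technical documentation after the system is placed on the market or put into service).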
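As a rough illustration of the list above, a per-dataset record can be as small as a dataclass plus a check for what is still undocumented. This is a minimal sketch, not a compliance template: the class name, field names and example values are all hypothetical and are not taken from the Act.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetRecord:
    """Hypothetical per-dataset governance record; field names are illustrative."""
    name: str
    origin: str = ""                  # where the data came from
    collection_method: str = ""       # how it was collected or generated
    processing_steps: str = ""        # cleaning, filtering and labelling operations
    annotator_instructions: str = ""  # guidelines given to annotators, or a pointer to them
    bias_examination: str = ""        # what was checked and what was found
    known_gaps: str = ""              # documented gaps or limitations


def missing_fields(record: DatasetRecord) -> list[str]:
    """Return the names of fields that are still empty, in declaration order."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Example values are made up for illustration.
record = DatasetRecord(
    name="support-tickets-v1",
    origin="Internal helpdesk export",
    collection_method="Exported from ticketing system, then human-labelled",
)
print(missing_fields(record))
# ['processing_steps', 'annotator_instructions', 'bias_examination', 'known_gaps']
```

A check like this could run in CI or at dataset registration time, so that gaps surface while the people who know the answers are still on the project.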
What Annex IV (technical documentation) requires
Annex IV sets out the technical documentation a provider must maintain, including a description of the data and the methodologies used to develop the system. For human-generated data, that points directly at provenance: who produced it, under what instructions, with what review and at what level of agreement.
If you can export a per-batch lineage bundle that captures these facts, you have most of the raw material Annex IV expects, ready to assemble rather than reconstruct.
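One way such a bundle might look is sketched below. It assumes two annotators labelled every task in the batch and uses Cohen's kappa as the agreement measure; the bundle keys, the task dictionary layout and the function names are hypothetical choices for illustration, not Annex IV field names.

```python
import json
from collections import Counter


def cohen_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance, from each annotator's label frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def build_lineage_bundle(batch_id: str, instructions_version: str,
                         annotators: tuple[str, str], tasks: list[dict]) -> dict:
    """Assemble a per-batch bundle; all keys are illustrative assumptions."""
    labels_a = [t["label_a"] for t in tasks]
    labels_b = [t["label_b"] for t in tasks]
    return {
        "batch_id": batch_id,
        "instructions_version": instructions_version,
        "annotators": list(annotators),
        "task_count": len(tasks),
        "reviewed_tasks": sum(1 for t in tasks if t["reviewed"]),
        "cohen_kappa": round(cohen_kappa(labels_a, labels_b), 3),
    }


# Toy batch with made-up identifiers and labels.
tasks = [
    {"task_id": "t1", "label_a": "spam", "label_b": "spam", "reviewed": True},
    {"task_id": "t2", "label_a": "spam", "label_b": "ham", "reviewed": False},
    {"task_id": "t3", "label_a": "ham", "label_b": "ham", "reviewed": True},
    {"task_id": "t4", "label_a": "ham", "label_b": "ham", "reviewed": True},
]
bundle = build_lineage_bundle("batch-001", "guidelines-v2", ("ann-01", "ann-02"), tasks)
print(json.dumps(bundle, indent=2))
```

For this toy batch the observed agreement is 0.75 and the chance agreement is 0.5, so kappa comes out at 0.5. With more than two annotators per task, a different statistic (for example Fleiss' kappa or Krippendorff's alpha) would be the more appropriate choice.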
A practical preparation checklist
You do not need a finished compliance program to start. You need to capture the right records now, while the data is being produced, so the documentation is a by-product rather than an archaeology project later.
- Record dataset origin, collection or generation method, and licensing.
- Log annotator instructions and any model assistance per task.
- Capture a tamper-evident, per-task audit trail for human-labeled data.
- Measure and document inter-rater agreement and quality controls.
- Document bias examination and known limitations.
- Be able to export a per-batch lineage bundle mapped to Annex IV fields.
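The tamper-evident audit trail in the checklist can be approximated with a hash chain, where each entry's hash covers the previous entry's hash. The sketch below is one simple way to do that with the standard library; the event fields and function names are hypothetical, and a production system would likely add signatures and durable storage.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous hash together with a canonical JSON form of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: list[dict], event: dict) -> None:
    """Append an event, chaining it to the last entry in the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev_hash": prev_hash,
                "hash": _entry_hash(prev_hash, event)})


def verify(log: list[dict]) -> bool:
    """Recompute every hash; any edited, removed or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


# Made-up per-task events for illustration.
log: list[dict] = []
append_event(log, {"task_id": "t1", "annotator": "ann-01", "action": "label", "label": "spam"})
append_event(log, {"task_id": "t1", "annotator": "rev-07", "action": "review", "result": "accepted"})
print(verify(log))   # True

log[0]["event"]["label"] = "ham"  # simulate a retroactive edit
print(verify(log))   # False
```

A chain like this only detects tampering if the latest hash is also recorded somewhere the log's editor cannot rewrite, such as a separate system or a signed export. Truncating entries from the end is likewise only detectable against that externally stored hash.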
