August 2025 · 7 min read

The EU AI Act for data teams

What the AI Act's data governance and documentation rules actually require, in plain terms, and how to prepare

PathWize

Key takeaways

The EU AI Act sets concrete data governance and technical documentation obligations for high-risk AI systems, and most of those obligations apply from August 2026.

Stripped of legalese, Article 10 and Annex IV ask one practical question of your data: can you show where it came from and how it was made? This guide translates that into records data teams can start building now.

  • Article 10 covers data and data governance for high-risk systems.
  • Annex IV sets out the technical documentation you must keep.
  • For human-generated data, both point directly at provenance.
  • The cheapest time to capture provenance is while the data is made.

EU AI Act: key dates

  1. August 2024: Regulation enters into force
  2. February 2025: Bans on prohibited AI practices apply
  3. August 2025: Obligations for general-purpose AI models apply
  4. August 2026: Most high-risk (Annex III) obligations apply
  5. August 2027: High-risk obligations for regulated products (Annex I) apply

Source: Regulation (EU) 2024/1689 (EU AI Act)

Who the data rules apply to

The obligations discussed here apply to providers of high-risk AI systems as defined by the Act, for example AI used in medical devices, critical infrastructure, employment, credit and essential services. If your system is high-risk, its training, validation and test data fall in scope.

This article is a practical explainer for data and ML teams, not legal advice. Confirm your specific obligations and timelines with qualified counsel.

What Article 10 (data governance) asks for

Article 10 covers data and data governance: training, validation and test data must be subject to appropriate governance practices. In practice that means being able to describe, per dataset, where the data originated, how it was collected or produced, and what was done to examine it for issues such as bias.

The key shift is from asserting that data is good to being able to show why, with records rather than reassurance.

  • Document the origin and collection or generation method of each dataset.
  • Describe processing, labelling and the instructions given to annotators.
  • Examine and document possible biases and gaps.
  • Keep the governance records available for the system's lifetime.
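
One way to make these points concrete is a per-dataset record that can be checked for gaps. The sketch below is only an illustration: the class name, the field names and the example values are assumptions chosen for readability, not terms defined by the Act.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DatasetGovernanceRecord:
    """Hypothetical per-dataset record; field names are illustrative, not from the Act."""
    dataset_id: str
    origin: str = ""                   # where the data came from
    collection_method: str = ""        # how it was collected or generated
    processing_steps: list[str] = field(default_factory=list)
    annotator_instructions: str = ""   # instructions given to annotators
    bias_examination: str = ""         # what was checked and what was found
    known_gaps: list[str] = field(default_factory=list)


def missing_fields(record: DatasetGovernanceRecord) -> list[str]:
    """Return the names of fields that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Example values are made up for illustration.
record = DatasetGovernanceRecord(
    dataset_id="support-tickets-v3",
    origin="internal helpdesk export",
    collection_method="manual labelling by contracted annotators",
    processing_steps=["deduplication", "PII redaction"],
)
print(missing_fields(record))
# ['annotator_instructions', 'bias_examination', 'known_gaps']
```

An empty `known_gaps` list is reported as missing here, which is a deliberate simplification: a real record would need a way to say "examined, none found" explicitly.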

What Annex IV (technical documentation) requires

Annex IV sets out the technical documentation a provider must maintain, including a description of the data and the methodologies used to develop the system. For human-generated data, that points directly at provenance: who produced it, under what instructions, with what review and at what level of agreement.

If you can export a per-batch lineage bundle that captures these facts, you have most of the raw material Annex IV expects, ready to assemble rather than reconstruct.
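
As a rough sketch of what such a bundle could look like, the example below collects who produced a batch, under which instructions, with what review, and at what level of agreement, then serialises it as JSON. The bundle keys, the function names and the sample labels are assumptions for illustration and are not Annex IV field names. Agreement is measured here with Cohen's kappa for two annotators, which is one common choice among several.

```python
import json
from collections import Counter


def cohen_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def build_lineage_bundle(batch_id, producers, instructions_ref, review,
                         labels_a, labels_b) -> dict:
    """Assemble a hypothetical per-batch lineage bundle (keys are illustrative)."""
    return {
        "batch_id": batch_id,
        "produced_by": producers,
        "instructions": instructions_ref,
        "review": review,
        "agreement": {
            "metric": "cohen_kappa",
            "value": round(cohen_kappa(labels_a, labels_b), 3),
            "n_items": len(labels_a),
        },
    }


bundle = build_lineage_bundle(
    batch_id="batch-0042",
    producers=["annotator_a", "annotator_b"],
    instructions_ref="labelling-guide-v2",
    review="spot check by a senior reviewer",
    labels_a=["spam", "spam", "ham", "ham"],
    labels_b=["spam", "ham", "ham", "ham"],
)
print(json.dumps(bundle, indent=2))
```

With the sample labels, observed agreement is 0.75 and chance agreement is 0.5, so the recorded kappa is 0.5.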

EU AI Act maximum fines (fixed amount or % of global annual turnover)

  • Prohibited practices: €35M or 7% of turnover
  • Other obligations: €15M or 3% of turnover
  • Incorrect information: €7.5M or 1% of turnover

Source: EU AI Act, Article 99
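
To see how the two figures in each tier interact, the sketch below computes the ceiling for a given turnover. It assumes the general rule in Article 99 that the higher of the two figures applies. The Act treats SMEs and start-ups differently (the lower figure applies), which this sketch does not model. The tier keys and function name are made up for illustration, and the result is an upper bound, not a prediction of any actual fine.

```python
# (fixed ceiling in EUR, percentage of global annual turnover)
FINE_TIERS = {
    "prohibited_practices": (35_000_000, 7),
    "other_obligations": (15_000_000, 3),
    "incorrect_information": (7_500_000, 1),
}


def max_fine_eur(tier: str, global_turnover_eur: int) -> int:
    """Upper bound for a tier: the higher of the fixed amount and the turnover share."""
    fixed, pct = FINE_TIERS[tier]
    return max(fixed, global_turnover_eur * pct // 100)


# A provider with EUR 2 billion turnover: 3% exceeds the fixed EUR 15M ceiling.
print(max_fine_eur("other_obligations", 2_000_000_000))  # 60000000
# A provider with EUR 100 million turnover: the fixed EUR 15M ceiling is higher.
print(max_fine_eur("other_obligations", 100_000_000))    # 15000000
```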

A practical preparation checklist

You do not need a finished compliance program to start. You need to capture the right records now, while the data is being produced, so the documentation is a by-product rather than an archaeology project later.

  • Record dataset origin, collection or generation method, and licensing.
  • Log annotator instructions and any model assistance per task.
  • Capture a tamper-evident, per-task audit trail for human-labeled data.
  • Measure and document inter-rater agreement and quality controls.
  • Document bias examination and known limitations.
  • Be able to export a per-batch lineage bundle mapped to Annex IV fields.
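
The tamper-evident audit trail in the checklist can be approximated with a hash chain, where each entry's hash covers the previous entry's hash. This is a minimal sketch under stated assumptions: the event fields, the function names and the choice of SHA-256 are illustrative, and the Act does not prescribe this or any other specific mechanism.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON form of the event."""
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(trail: list[dict], event: dict) -> dict:
    """Append an event, chaining it to the last entry in the trail."""
    prev_hash = trail[-1]["hash"] if trail else GENESIS_HASH
    entry = {"event": event, "prev_hash": prev_hash,
             "hash": _entry_hash(prev_hash, event)}
    trail.append(entry)
    return entry


def verify_trail(trail: list[dict]) -> bool:
    """Recompute every hash; any edited, removed or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in trail:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical per-task events for one labelling task.
trail: list[dict] = []
append_event(trail, {"task_id": "t-1", "actor": "annotator_a", "action": "label", "value": "spam"})
append_event(trail, {"task_id": "t-1", "actor": "reviewer_1", "action": "approve"})
print(verify_trail(trail))  # True

trail[0]["event"]["value"] = "ham"  # simulate a silent edit after the fact
print(verify_trail(trail))  # False
```

A chain like this only shows that entries were not changed relative to the latest hash. To detect a wholesale rewrite, the latest hash would also need to be stored somewhere the editor of the trail cannot change.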

Frequently asked questions

When does the EU AI Act apply to high-risk AI?

The Act phases in over time, with key obligations for high-risk AI systems applying from August 2026. Exact dates depend on the system and category, so confirm timelines for your case with qualified counsel.

What does Article 10 of the EU AI Act require?

Appropriate data governance for training, validation and test data: documenting origin and collection or generation, describing processing and labelling, and examining data for issues such as bias, with records to back it up.

What is Annex IV of the EU AI Act?

The list of technical documentation a provider of a high-risk AI system must keep, including a description of the data and the methodologies used to develop the system. For human data this maps directly onto provenance.

How should data teams prepare for the AI Act?

Capture provenance while data is produced: record dataset origin and method, log annotator instructions and model assistance, keep a per-task audit trail, measure inter-rater agreement, document bias checks, and be able to export a per-batch lineage bundle.

Lena Brandt

Compliance Research

Lena translates EU AI regulation into practical requirements for data teams.