Guide
July 2, 2026
Maxime Dalessandro & Nicolas Fares

ERP Data Migration: Mapping Legacy Databases Faster

Legacy data mapping is where ERP projects slip after go-live. How semantic discovery plus expert-validated mappings turns weeks of analysis into days.

#ERP data migration #data reprise #legacy databases #data mapping #AI agents #agent context layer

Every ERP integrator has a name for the phase that hurts. In French-speaking markets it is "la reprise de données," the data reprise: taking over the client's legacy data and landing it in the new system. In our discovery calls with ERP integrators, the same two observations come up every time. First, integrators consistently estimate the reprise eats 10 to 15 percent of total project effort, and it is almost never budgeted that honestly. Second, it is the one phase where mistakes stay hidden during the project and surface after go-live, in production, in front of the client.

This article is about why legacy database mapping is structurally hard, why its failure mode is so expensive, and what a better loop looks like: semantic discovery of the source system first, then candidate mappings that a domain expert validates instead of drawing from scratch.

The source system is written in a dead language

The client's legacy system is rarely one clean database. It is fifteen or twenty-five years of accumulated business logic, encoded in a schema nobody fully understands anymore.

The symptoms are familiar to anyone who has done this work:

  • Cryptic names. Columns called X_STAT_2, FLAG_A, MONTANT3. Tables named after modules that were renamed twice. A customers table and a clients table that overlap but do not match.
  • Implicit conventions. A status code of 3 means "blocked" in the sales module and "priority" in logistics, because two teams extended the same field in different decades. Prices are stored net in one table and gross in another, and everyone who knew which was which has left.
  • Decades of edge cases. One-off fixes applied directly in production. Magic customer IDs that the invoicing batch treats specially. A date column that switched meaning in 2011 and was never backfilled.

Three things make this worse than an ordinary reverse-engineering job. The people who built the source system are usually gone, so there is nobody to ask. The documentation, where it exists, describes the system as it was designed, not as it behaves after twenty years of patches, so it lies. And the target ERP is not a blank slate: Odoo, Dynamics, and SAP each come with their own opinionated data model, and the legacy data has to be reshaped to fit opinions it was never designed around.

This is why a mapping is never a mechanical act. Writing "source column KUNNR_ALT maps to partner reference" is a semantic claim: a statement about what the legacy system meant. Historically that claim was validated by tribal knowledge. The person who remembered why the column existed would glance at the spreadsheet and say "no, that one is only populated for export customers." That person no longer works there. The tribal knowledge the whole method depended on has left the building, and the method has not been updated to match.

Mapping mistakes are invisible, then explosive

Here is the property that makes the reprise uniquely dangerous: a wrong mapping looks exactly like a right one in the spreadsheet phase. The row in the mapping document is syntactically perfect. Source field, target field, transformation rule. It compiles, it loads, the migration script runs green. The mistake only becomes observable when the business runs on the data.

Concrete versions of this, all of the shape integrators describe:

  • The wrong VAT code. The legacy system stored tax regimes as numeric codes, and code 4 was mapped to the standard domestic rate. Except code 4 also covered a small set of intra-community customers under an older convention. Nobody notices until the first invoicing run in the new ERP applies the wrong VAT treatment to real invoices, and the accountant finds it at closing.
  • The merged customers. Two customer records share a name and city, so the deduplication rule folds them into one. They were a franchisor and a franchisee with separate credit terms. Their orders, balances, and history are now one entity, and unpicking a merge after weeks of new transactions is far harder than the merge was.
  • The silent unit mismatch. The legacy inventory stored quantities in grams for one product family because of a workaround from 2009. The mapping assumed kilograms everywhere. Stock levels for that family are now off by a factor of a thousand, and the first hint is a purchasing suggestion that makes no sense.
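The merged-customers case can be made concrete with a small sketch. It assumes a hypothetical `Customer` record with a `credit_terms_days` attribute (neither name comes from a real system) and shows one possible guard: a name-and-city match only becomes a merge candidate when the business attributes agree, and goes to a human otherwise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    # Hypothetical record shape, for illustration only.
    customer_id: str
    name: str
    city: str
    credit_terms_days: int


def _norm(text: str) -> str:
    """Normalize case and surrounding whitespace before comparing."""
    return text.strip().lower()


def looks_like_duplicate(a: Customer, b: Customer) -> bool:
    """The naive rule from the example: same name and same city."""
    return _norm(a.name) == _norm(b.name) and _norm(a.city) == _norm(b.city)


def merge_conflicts(a: Customer, b: Customer) -> list:
    """List the business attributes on which the two records disagree."""
    conflicts = []
    if a.credit_terms_days != b.credit_terms_days:
        conflicts.append("credit_terms_days")
    return conflicts


def dedup_decision(a: Customer, b: Customer) -> str:
    """Never merge silently when the records carry different terms."""
    if not looks_like_duplicate(a, b):
        return "keep_separate"
    return "needs_review" if merge_conflicts(a, b) else "merge_candidate"


franchisor = Customer("C001", "Boulangerie Martin", "Lyon", 30)
franchisee = Customer("C417", "boulangerie martin ", "Lyon", 60)
print(dedup_decision(franchisor, franchisee))  # needs_review
```

A real rule set would compare more than one attribute; the point is only that disagreement on business terms is evidence against a merge, and that evidence is cheap to collect before the load.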

None of these are exotic. They are the ordinary texture of legacy data, and they share the same cost curve: nearly free to catch during analysis, brutally expensive to catch in production. After go-live the wrong data has been transacted on. Corrections compete with a live business, under a client who has just lost confidence in the project. This is why experienced integrators fear the reprise more than any development workstream: development bugs fail loudly in testing, mapping bugs fail quietly in production.

The loop most teams run today is archaeology

Look at how the mapping document actually gets produced today. A senior consultant, usually the scarcest person on the project, sits down with a schema dump and a spreadsheet. They query the source system table by table. They profile columns to guess meanings. They interview whoever at the client has been in the role longest. They cross-reference the legacy application's screens against the tables underneath. Then they write mapping lines, one by one, from scratch.

This is archaeology, and it has archaeology's economics. It takes weeks of expert time per system, it does not parallelize well, and its quality depends on which artifacts happened to survive. Worst of all, the expert spends most of those weeks on the part a machine could do (figuring out what is there and how it connects) and only a fraction on the part that genuinely requires their judgment (deciding whether a proposed mapping is correct for this business).

A better loop: discover first, then review

The fix is to reorder the work. Before anyone writes a mapping line, build a semantic map of the source system: what is there, what it means, and how it connects. Which tables are live versus abandoned. Which columns are populated, with what distributions, and which conventions actually hold in the data rather than in the documentation. Where the real foreign keys are, including the implicit ones no constraint ever enforced. Which values are derived and which are entered.
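As a rough illustration of what the discovery step measures, here is a minimal sketch using Python's built-in `sqlite3` module against an in-memory database. The tables, columns, and the two helper functions are invented for the example; a real legacy source would be a different engine and far larger. It computes a fill rate per column and an inclusion ratio that hints at an implicit foreign key no constraint declares.

```python
import sqlite3


def profile_column(conn, table, column):
    """Row count, fill rate, and distinct count for one column.

    Identifiers are interpolated directly, so this is only safe for
    schema names read from the catalog, never for user input.
    """
    total, filled, distinct = conn.execute(
        f'SELECT COUNT(*), COUNT("{column}"), COUNT(DISTINCT "{column}") '
        f'FROM "{table}"'
    ).fetchone()
    fill_rate = filled / total if total else 0.0
    return {"rows": total, "fill_rate": fill_rate, "distinct": distinct}


def inclusion_ratio(conn, child_table, child_col, parent_table, parent_col):
    """Share of distinct non-null child values that exist in the parent column.

    A ratio near 1.0 suggests an implicit foreign key; the gap below 1.0
    is the list of orphan values worth showing to the expert.
    """
    matched, total = conn.execute(
        f"""SELECT COUNT(p.v), COUNT(*)
            FROM (SELECT DISTINCT "{child_col}" AS v FROM "{child_table}"
                  WHERE "{child_col}" IS NOT NULL) AS c
            LEFT JOIN (SELECT DISTINCT "{parent_col}" AS v
                       FROM "{parent_table}") AS p
              ON p.v = c.v"""
    ).fetchone()
    return matched / total if total else 0.0


# Tiny invented dataset: no declared foreign key, one orphan reference.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, cust_ref INTEGER, x_stat_2 TEXT);
    INSERT INTO customers VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO orders VALUES
        (10, 1, NULL), (11, 2, NULL), (12, 2, NULL), (13, 99, 'A');
""")

print(profile_column(conn, "orders", "x_stat_2"))
print(inclusion_ratio(conn, "orders", "cust_ref", "customers", "id"))
```

On this data, `x_stat_2` is filled in one row out of four, and two of the three distinct `cust_ref` values exist in `customers.id`. Neither number is a mapping; both are the kind of evidence a mapping decision should rest on.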

This discovery step is exactly the kind of exhaustive, evidence-based reading of a database that machines are now good at and humans are slow at. An agent can profile every column, trace every join path, and flag every convention violation without getting bored on table 300 of 800. The output is not a mapping. It is the context that mapping decisions need: the semantic layer the departed builders used to carry in their heads.

With that map in place, the second step changes shape. Instead of the expert drawing mappings from scratch, the system proposes candidate mappings into the target ERP's model, each with the evidence behind it: this legacy field is populated for 94 percent of active customers, joins to the invoicing table this way, and matches the shape of the target's tax-position field. The expert's job becomes review. Approve, correct, or reject, one decision at a time, with the reasoning in front of them.
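One way to picture a candidate mapping as a reviewable object is the sketch below. The class, the field names such as `CUST.TAX_REGIME` and `partner.tax_position`, and the decision states are all assumptions made for illustration, not the schema of any particular tool or ERP.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Decision(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    CORRECTED = "corrected"
    REJECTED = "rejected"


@dataclass
class CandidateMapping:
    source_field: str
    target_field: str
    evidence: List[str]  # what the profiling found, in plain language
    decision: Decision = Decision.PENDING
    reviewer: Optional[str] = None
    rationale: Optional[str] = None

    def review(self, decision, reviewer, rationale, corrected_target=None):
        """Record the expert's decision; a correction must name a new target."""
        if decision is Decision.PENDING:
            raise ValueError("a review must approve, correct, or reject")
        if decision is Decision.CORRECTED:
            if not corrected_target:
                raise ValueError("a correction needs a corrected target")
            self.target_field = corrected_target
        self.decision = decision
        self.reviewer = reviewer
        self.rationale = rationale

    @property
    def is_decided(self) -> bool:
        """Only approved or corrected mappings may feed the migration."""
        return self.decision in (Decision.APPROVED, Decision.CORRECTED)


candidate = CandidateMapping(
    source_field="CUST.TAX_REGIME",        # hypothetical legacy column
    target_field="partner.tax_position",   # hypothetical target field
    evidence=[
        "populated for 94 percent of active customers",
        "joins to the invoicing table on customer id",
    ],
)
print(candidate.is_decided)  # False until a human reviews it
```

The design choice worth noting is the default: a candidate starts as pending, so a mapping the expert never looked at cannot be mistaken for one they approved.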

That reordering matters for the failure mode, not just the timeline. The mistakes in the scenarios above (the overloaded code 4, the gram-based product family) are precisely the anomalies that systematic profiling surfaces and that a tired human skips past in week three. Expert judgment moves from archaeology to review, which is both faster and a better use of the judgment. In the projects integrators describe to us, the analysis phase is the multi-week bottleneck; a discovery-then-review loop compresses it to days, because the machine does the exhaustive reading and the human does only the deciding.
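To show how profiling could surface something like the gram-based product family, here is a deliberately simple heuristic, assuming quantities are already grouped by product family. The function name, the sample data, and the threshold of 100 are arbitrary choices for the sketch; it compares each family's median quantity with the median of the other families and flags large ratios for review.

```python
from statistics import median


def scale_outliers(quantities_by_family, factor=100.0):
    """Flag families whose typical quantity dwarfs that of the others.

    Returns {family: ratio}. A ratio near 1000 is a hint of a unit
    mismatch such as grams versus kilograms, not proof of one: some
    families legitimately move in much larger quantities.
    """
    medians = {
        family: median(values)
        for family, values in quantities_by_family.items()
        if values
    }
    flagged = {}
    for family, family_median in medians.items():
        others = [m for f, m in medians.items() if f != family]
        if not others:
            continue
        baseline = median(others)
        if baseline > 0 and family_median / baseline >= factor:
            flagged[family] = family_median / baseline
    return flagged


# Invented stock quantities; "spices" was recorded in grams.
stock = {
    "bolts": [12, 15, 9],
    "paint": [4, 6, 5],
    "spices": [12000, 9000, 15000],
    "cables": [20, 25, 30],
}
print(scale_outliers(stock))  # {'spices': 1000.0}
```

The output is a question for the expert ("is this family really a thousand times larger?"), which is exactly the shape of decision the review step is meant to handle.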

The one thing the loop must never do is remove the human. A candidate mapping is still a semantic claim, and the domain expert is the only party qualified to ratify it. The same argument we make about agent-run schema migrations applies here: the value is not automation of the decision, it is a decision surface good enough that a senior person can judge each mapping in seconds instead of excavating it in hours.

Where Datapace fits

Datapace is the context layer and control plane for agents on live data, and the ERP data reprise is one of the workloads we are building it for, directly out of those integrator conversations.

Applied to a migration, it looks like the loop above. Datapace connects to the legacy source system and builds the semantic map: entities, relationships, lineage, populated versus dead columns, the conventions the data actually follows. On top of that map, agents propose candidate mappings into the target ERP's model. Every candidate goes to the integrator's domain expert for validation, and nothing is treated as decided until a human approves it. Each accepted mapping is recorded with its rationale and the evidence it was based on, so six months after go-live, when someone asks why a legacy field landed where it did, the answer is an audit trail, not a memory.
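The audit-trail idea can be sketched as an append-only log of approved mappings. This is not Datapace's actual storage format or API; the function names and record fields below are hypothetical, chosen only to show that "why did this field land here?" becomes a lookup rather than a memory.

```python
import json
from datetime import datetime, timezone


def audit_entry(source_field, target_field, approved_by, rationale,
                evidence, when=None):
    """Serialize one approved mapping as a JSON line (hypothetical format)."""
    when = when or datetime.now(timezone.utc)
    return json.dumps(
        {
            "source_field": source_field,
            "target_field": target_field,
            "approved_by": approved_by,
            "approved_at": when.isoformat(),
            "rationale": rationale,
            "evidence": evidence,
        },
        sort_keys=True,
    )


def why(log_lines, source_field):
    """Return every recorded decision for a given legacy field."""
    entries = (json.loads(line) for line in log_lines)
    return [e for e in entries if e["source_field"] == source_field]


log = [
    audit_entry(
        "CUST.TAX_REGIME",            # hypothetical legacy column
        "partner.tax_position",       # hypothetical target field
        approved_by="domain expert",
        rationale="code 4 split by customer country before loading",
        evidence=["populated for 94 percent of active customers"],
        when=datetime(2026, 1, 5, tzinfo=timezone.utc),
    )
]

for entry in why(log, "CUST.TAX_REGIME"):
    print(entry["approved_by"], "-", entry["rationale"])
```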

Two honest caveats. The 10 to 15 percent figure is what integrators consistently report from their own projects, not a Datapace measurement. And no tool removes the need for a domain expert; the point is to spend that expert on judgment instead of excavation.

We are validating this workflow with ERP integrators as design partners, running it against real legacy databases, including live walkthroughs of an actual legacy-to-ERP migration. If the reprise is the phase of your projects that scares you most, book a call and bring your ugliest source schema. We would genuinely like to see it.

Frequently asked questions

What is the data reprise in an ERP project?
The data reprise (from the French reprise de données) is the phase of an ERP integration where the client's legacy data is analyzed, mapped, and migrated into the new system. Integrators consistently estimate it consumes 10 to 15 percent of total project effort, and it is the phase where mistakes stay hidden until after go-live.

Why do ERP data mapping mistakes only surface after go-live?
A wrong mapping is syntactically identical to a right one: the spreadsheet row is well-formed, the migration script runs green, and the load completes. The error is semantic, a wrong claim about what the legacy field meant, so it only becomes observable when the business transacts on the data, for example in the first invoicing run or the first stock replenishment.

How does semantic discovery speed up legacy database mapping?
Instead of a senior consultant reverse-engineering the source system table by table, an agent profiles every column, traces every join path, and surfaces the conventions the data actually follows. Mapping then starts from system-proposed candidates backed by evidence, and the expert reviews decisions instead of excavating them, which compresses the analysis phase from weeks to days.

Does automated mapping replace the ERP integrator's domain expert?
No. A mapping is a semantic claim about what the legacy system meant, and the domain expert is the only party qualified to ratify it. The system proposes candidate mappings with evidence, the expert approves, corrects, or rejects each one, and every accepted mapping is recorded with its rationale.
