What Lakehouse Enables — and What It Breaks
Why engine-level acceleration fails in a shared lakehouse.
In this series: Acceleration in Modern Lakehouse Platforms
- 1. Why Acceleration Should Be Invisible
- 2. What Lakehouse Enables — and What It Breaks
- 3. Why Lakehouse Performance Is a Coordination Problem
- 4. Why We Built Laketap
Lakehouse succeeded by decoupling storage from execution
The lakehouse model succeeded because it solved real, structural problems.
Open table formats made data portable. Decoupled compute made engines interchangeable. Shared storage allowed multiple teams and workloads to operate on the same data without duplication.
This shift was decisive. By separating storage from execution, lakehouse platforms enabled flexibility that tightly coupled warehouses could not.
But that same decoupling also changed something more subtle.
It changed who owns performance.
Lakehouse solved data portability — not performance portability
Once execution is decoupled from storage, performance no longer has a single owner.
No engine controls the full access path. No team owns how data is read across all workloads. No optimization applies everywhere by default.
Lakehouse made data shareable across engines, but left performance trapped inside them.
This is not an implementation gap. It is a consequence of the architecture itself.
Shared data does not imply shared acceleration
In a lakehouse, the same tables and files are accessed by multiple engines for different purposes.
But each engine brings its own planner, execution model, and assumptions. Metadata is interpreted differently. Scans are planned differently. Data is read and decoded differently.
As a result, acceleration does not travel with the data.
The same dataset may be scanned, optimized, and accelerated multiple times — independently, inside each engine.
The data is shared. The acceleration is not.
Performance becomes a repetition problem
Consider a familiar pattern.
A batch job scans an Iceberg table overnight. An interactive query scans the same table an hour later. A BI dashboard scans it again the next morning — each often through a different engine.
Each workload traverses a similar access path. Each re-reads table and file metadata. Each redoes the same pruning, filtering, and decoding work.
Nothing is fundamentally slow. The work is simply repeated.
Lakehouse does not make performance worse. It makes performance non-amortized.
The more engines a platform supports, the more often the same work is paid again.
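The amortization point can be sketched as a toy cost model. This is illustrative only: the split between "shareable access work" and "engine-local execution work", and all cost figures, are hypothetical, not measurements of any real engine.

```python
# Toy cost model: the same table scanned once by each of N engines.
# Per-scan work is split into a shareable access cost (metadata reads,
# filtering, decoding) and engine-specific execution cost.
# All numbers are hypothetical, chosen only to show the shape of the curve.

SHAREABLE_ACCESS_COST = 100  # metadata + filter + decode, arbitrary units
EXECUTION_COST = 20          # engine-local work that cannot be shared

def total_cost(num_engine_scans: int, shared_acceleration: bool) -> int:
    """Total work when each engine scans the table once."""
    if shared_acceleration:
        # Access work is paid once, then amortized across all scans.
        return SHAREABLE_ACCESS_COST + num_engine_scans * EXECUTION_COST
    # Engine-local acceleration: every engine repeats the access work.
    return num_engine_scans * (SHAREABLE_ACCESS_COST + EXECUTION_COST)

for engines in (1, 3, 5):
    local = total_cost(engines, shared_acceleration=False)
    shared = total_cost(engines, shared_acceleration=True)
    print(f"{engines} engines: engine-local={local}, shared={shared}")
```

With one engine the two models cost the same; with five, the engine-local model pays the access work five times over. The absolute numbers are meaningless, but the linear-versus-amortized shape is the structural point.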
Why migration undermines the lakehouse itself
When engine-level acceleration falls short, migration is often proposed as the remedy.
Move workloads to the fastest engine. Standardize on a single runtime. Rewrite pipelines to fit a new execution model.
In a lakehouse, this response is self-defeating.
The core promise of the lakehouse is coexistence: multiple engines operating on shared data. Migration collapses that flexibility and reintroduces tight coupling at the execution layer.
Migrating to a faster engine in a lakehouse defeats the reason the lakehouse exists.
It trades platform flexibility for localized gains, fragmenting performance rather than compounding it.
The limitation is not engines, but where acceleration lives
None of this diminishes the importance of engine-level innovation.
Engines must continue to improve execution efficiency, planning, and runtime behavior. Many optimizations can only happen there.
But engines solve execution problems. Lakehouse platforms expose access problems.
As long as acceleration lives inside engines, it remains bound to execution contexts rather than shared access paths. Performance improvements stay local, even when the data is global.
This mismatch is structural.
The lakehouse gap
Lakehouse made data portable. It made execution flexible. It did not make performance portable.
As long as acceleration is engine-local, performance remains fragmented across runtimes — even when the data is shared.
Once performance fragments, improving it is no longer a matter of faster execution.
It becomes a coordination problem.
And coordination problems cannot be solved inside engines.