v0.5 Available

The Transparent Acceleration Layer for the Lakehouse

⭐ Now accepting 3 Early Access Design Partners for Iceberg acceleration.

Accelerate Apache Iceberg workloads by 5–20× with zero application changes. Faster metadata. Smarter pushdown. Unified caching.

Works seamlessly with

Spark
Trino
Flink
DuckDB
DataFusion

Why Laketap?

The Lakehouse acceleration layer — eliminating I/O bottlenecks with smarter metadata and data pushdown.

01

5–20× Faster Queries

Real Acceleration, Not Just Caching. Laketap reduces object-store latency, eliminates repetitive Parquet decoding, and applies selective pushdown to minimize CPU & network overhead.

Faster BI · Lower ETL Latency · Cost-efficient
02

Works Across All Engines

No need to tune each engine separately. Unifies caching for Spark, Trino, Flink, DuckDB, DataFusion — with more to come.

One Layer · Multi-Engine
03

Zero Application Changes

No SQL rewrite. No table changes. No engine forks. Simply deploy Laketap and update the connector.
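As a concrete illustration of the connector swap, the only configuration an application changes is the catalog URI. The catalog name `lake`, the hostname `laketap.internal:8181`, and the helper function below are illustrative placeholders; the property keys are standard Iceberg-on-Spark catalog configuration.

```python
def laketap_catalog_conf(catalog_name: str, laketap_uri: str) -> dict:
    """Spark conf entries for an Iceberg REST catalog routed through a
    Laketap endpoint. Only the URI differs from the pre-Laketap setup."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        # Before: the upstream catalog URI (e.g. a Polaris REST endpoint).
        # After: the Laketap proxy, which speaks the same REST Catalog API.
        f"{prefix}.uri": laketap_uri,
    }

conf = laketap_catalog_conf("lake", "http://laketap.internal:8181")
```

No SQL, table definitions, or engine binaries change; queries continue to reference the same catalog and table names.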

Drop-in · Seamless
04

Tiered & Adaptive Caching

Tiered design: TCache holds manifests and schemas, ACache Meta stores footers and indexes, and ACache Data caches row groups with selective decode; entries are automatically promoted and demoted between tiers based on workload.
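The promote/demote behavior can be sketched as a toy two-tier cache: entries hit often enough move into a small hot tier, and hot-tier LRU evictions fall back to the warm tier. The sizes and thresholds below are illustrative, not Laketap's actual policy.

```python
from collections import OrderedDict

class TieredCache:
    """Toy two-tier cache: warm entries promote to hot after repeated
    hits; hot-tier LRU evictions demote back to warm."""

    def __init__(self, hot_size=2, warm_size=8, promote_after=2):
        self.hot, self.warm = OrderedDict(), OrderedDict()
        self.hits = {}
        self.hot_size, self.warm_size = hot_size, warm_size
        self.promote_after = promote_after

    def put(self, key, value):
        self.warm[key] = value
        self.warm.move_to_end(key)
        while len(self.warm) > self.warm_size:
            self.warm.popitem(last=False)          # evict coldest warm entry

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        if key not in self.warm:
            return None                            # miss: fetch from object store
        self.hits[key] = self.hits.get(key, 0) + 1
        if self.hits[key] >= self.promote_after:   # hot enough: promote
            self.hot[key] = self.warm.pop(key)
            if len(self.hot) > self.hot_size:      # demote hot-tier LRU
                old_key, old_val = self.hot.popitem(last=False)
                self.put(old_key, old_val)
            return self.hot[key]
        return self.warm[key]
```

In the real system the tiers hold different kinds of objects (metadata, footers, row groups) rather than one keyspace, but the promotion mechanics are analogous.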

Smart · Lightweight
05

Runs in Your VPC

No Lock-In. Your data never leaves your environment. Laketap is an on-prem / VPC-side acceleration layer with full enterprise isolation.

Secure · Private
Architecture

Unified Acceleration Framework

A three-step architecture that accelerates metadata, optimizes split planning, and performs compute pushdown across cache and object storage.

Architecture Overview · Metadata, planning, and data paths in one layer

Laketap is now validating TCache/ACache with design partners on real Iceberg workloads.

Step 01

Catalog Cache (TCache) — Metadata Acceleration

Catalog-facing cache that implements the Iceberg REST Catalog API and stores manifests, schema, and snapshots to cut catalog latency and avoid repeated object-store round trips.

  • Snapshot-aware routing
  • Manifest list & manifest file cache
  • Schema / partition spec cache
  • Fast listFiles/listManifests
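The catalog-facing behavior above can be sketched as a cache in front of the upstream loadTable call (`GET /v1/namespaces/{ns}/tables/{table}` in the Iceberg REST Catalog API). The `fetch` callable, TTL, and invalidation policy here are illustrative stand-ins, not Laketap's actual implementation.

```python
import time

class CatalogCache:
    """Sketch of a TCache-style metadata cache fronting an Iceberg REST
    catalog. `fetch` stands in for the upstream loadTable round trip."""

    def __init__(self, fetch, ttl_s=30.0):
        self.fetch = fetch        # upstream catalog call (Polaris / Glue / HMS)
        self.ttl_s = ttl_s
        self._cache = {}          # (namespace, table) -> (expires_at, metadata)

    def load_table(self, namespace, table):
        key = (namespace, table)
        entry = self._cache.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]                       # cache hit: no round trip
        metadata = self.fetch(namespace, table)   # cache miss: go upstream
        self._cache[key] = (time.monotonic() + self.ttl_s, metadata)
        return metadata

    def invalidate(self, namespace, table):
        """Refresh path: drop the entry after a new snapshot is committed."""
        self._cache.pop((namespace, table), None)
```

Because the proxy speaks the same REST Catalog API, engines see identical responses whether metadata comes from cache or from the upstream catalog.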
Step 02

Footer & Index Cache (ACache Meta) — Intelligent Split Planning

During planning, cached Parquet footers and indexes enable row-group pruning to emit a cache-aware scan plan with only the required splits.

  • Row-group metadata caching
  • ZoneMap / dictionary reuse
  • Predicate-aware pruning
  • Cache-aware split tagging
Step 03

Data Pushdown Cache (ACache Data) — Federated Read

At execution, cached splits run selective decode and vectorized filtering; cache hits return via Arrow Flight, while cache misses stream directly from the object store, preserving native scan semantics.

  • Selective decode
  • Encoding-aware filtering
  • Arrow Flight data streaming
  • Federated read across cache & object store
[Diagram] Engine → (Iceberg REST API) → TCache Catalog Proxy (snapshots, schemas, manifests, partitions). Cache hit: cached metadata returned to the engine. Cache miss: raw metadata fetched from the upstream catalog (Polaris, Glue, HMS) and the cache refreshed.

Advanced Capabilities

Laketap isn't just a cache; it's a smart layer that learns from your data.

Enterprise Ready

Workload-Aware Adaptive Policies

Automatically adjusts cache eviction, retention, and selective decode thresholds based on real-time workload usage.

Metrics-Driven Access Heatmaps

Collects table/column heat, row-group access patterns, and cross-engine hotspots to dynamically optimize caching strategies.
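A minimal sketch of that bookkeeping: per-(table, column) read counts plus the set of engines that touched each column, from which hot and cross-engine entries can be surfaced. Class and method names are illustrative, not Laketap's API.

```python
from collections import Counter

class AccessHeatmap:
    """Toy heat tracker: counts column reads per table and records which
    engines touched each column, so cross-engine hotspots stand out."""

    def __init__(self):
        self.heat = Counter()     # (table, column) -> access count
        self.engines = {}         # (table, column) -> set of engine names

    def record(self, engine, table, columns):
        for col in columns:
            key = (table, col)
            self.heat[key] += 1
            self.engines.setdefault(key, set()).add(engine)

    def hottest(self, n=3):
        return [key for key, _ in self.heat.most_common(n)]

    def cross_engine_hotspots(self):
        return [k for k, engines in self.engines.items() if len(engines) > 1]
```

Hotspots read by multiple engines are natural candidates for the shared replicas and pre-warming described below.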

Historical Optimization

Predicts repetitive queries, pre-warms based on daily/hourly patterns, and creates shared hotspot replicas for cross-engine workloads.

Layout Optimization

Provides recommendations for manifest layout, partition heat analysis, and compaction strategies based on historical statistics.

Design Partner Insights

Early access to new planning strategies, metadata optimizations, and pushdown extensions. Collaborate directly with the engineering team.

Experience Laketap

Deploy in your VPC with a connector swap—accelerate your Lakehouse without rewriting workloads.