Why We Picked Distributed DuckDB for Solana State
DuckDB is a fantastic database. It is also a single-node database. Pointing it at the full Solana account space — which crosses a hundred terabytes when you include compressed NFT proofs and Geyser-derived program state — defeats it before you start. So when we tell people the query layer in StreamSync is DuckDB, the first question is always: how?
This post walks through the answer. We picked DuckDB on purpose, we sharded it on purpose, and the result is a query layer that holds analytical SQL semantics while behaving like a distributed system. The work to get there is a fair chunk of distributed-duckdb and sharding-core in the workspace.
Why DuckDB at all
The simple version: DuckDB’s column store is the right shape for the question Solana indexer customers actually ask. They don’t say “give me this account at slot X.” They say “give me every transfer of this mint with amount over a threshold in the last 24 hours” or “give me every account owned by this program with a particular flag set.” Those are scans and aggregations over wide columnar data, and DuckDB eats them.
The other half of the answer: DuckDB has no server process to manage. It embeds inside the Rust binary that runs on each operator. We don’t have to think about “the cluster” the way we would with a distributed OLAP system that wants its own quorum, network plane, and bin-packing strategy. The shard layer is ours; the local engine is DuckDB’s.
What we give up by picking DuckDB is convenient multi-node joins. That’s the design problem this post is mostly about.
The shard layout
The sharding plan is a consistent hash ring on Solana account keys. Each operator joining the network claims one or more virtual nodes on the ring; when an operator leaves, their range is automatically reassigned. The sharding-core crate handles the ring math, rebalancing, and health checks.
What gets sharded:
- Account state is sharded by pubkey hash. Lookups by account address route to a single shard. Lookups by owner program scan across all shards in parallel.
- Transaction history is sharded by signature hash. Account-to-transaction joins use a secondary index that maps account → recent signatures; that index is replicated to a small number of nodes to keep joins cheap.
- Program-decoded state is sharded by program ID and partitioned by slot. This lets us prune time-bound queries to a fraction of the network.
The ring is reproducible from the active operator set, which means clients can predict which operator should serve a given query and the gateway can skip an extra hop when it’s confident.
Cross-shard joins
A SQL engine without cross-shard joins is a key-value store with extra steps. We needed real joins to work.
We landed on a two-stage approach. The gateway pushes the join down where it can: if both sides of the join can be evaluated on the same shard (because the join key is the shard key), the operator runs the full query locally and returns the result. This is the fast path.
Where the join crosses shards, the gateway emits a parallel sub-query plan: each operator executes its local half, returns a small intermediate result, and a designated merger node — usually the closest cache-optimizer in the region — finishes the join using DuckDB’s in-memory engine. The intermediate results are typically a few hundred kilobytes at most because operators apply local filters before sending anything. The final merge runs in 1–3ms in the common case.
That’s not as elegant as a native distributed query planner. It is a lot faster in practice, because the operators are doing the work that the planner would have offloaded anyway, and the network round trip is one fewer than a generic plan would need.
Predictive caching
The cache-optimizer node class is where the latency budget gets won. These operators run heavy memory (256GB+) and focus on holding the working set of “hot” Solana state — recent accounts, popular programs, top NFT collections by query volume. They subscribe to Geyser updates so their cache is fresh, and they answer point lookups in 1–2ms.
The interesting wrinkle is what gets cached. We let operators choose. A new operator can run a naive LRU and earn a baseline of revenue. A sophisticated operator can build a predictive model that anticipates query bursts — for example, pre-warming a popular token’s holder set when its price moves more than 5% on a major DEX. We don’t pick the strategy; the market does. Operators who pick well win more races.
Because cache decisions are local, we don’t have a cache-coherency problem in the protocol. We have a competitive-cache problem in the market. Operators who hold the wrong data lose; operators who hold the right data win. Over time the network’s effective cache converges on the actual hot set, distributed across whichever operators are most efficient at predicting it.
ZK reconstruction as a query class
Compressed accounts and compressed NFTs are a useful test case for the architecture. They are not stored as plain account data — they are leaves in a Merkle tree with proofs on chain. Reading one in a useful form requires reconstructing the account from the tree, which is a compute-bound operation.
Some indexers handle this by pre-decompressing everything and storing the decompressed result. That works until your storage cost makes the business untenable. StreamSync routes these queries to a separate node class, the ZK reconstruction specialists, whose hardware is biased toward CPU and whose pricing reflects the higher work-per-query. The customer doesn’t have to know any of this — they ask for a compressed NFT by mint, and the gateway picks the right specialist.
The lesson here is the same as caching: don’t pick the strategy, pick the structure that lets operators choose.
What the workspace looks like
The relevant crates in the repo are:
distributed-duckdb(34 tests): query planner, sub-query distribution, result mergingsharding-core(60 tests): hash ring, rebalancing, health, secondary indexesnetworking-core(45 tests): gossip protocol, peer discovery, NNG messaging primitivesidl-sync(18 tests): behavioral IDL generation for program-level decodingzk-reconstruction(8 tests): compressed-account reconstruction for the ZK specialistssolana-indexerandprogram-parser: RPC ingest and SPL/Metaplex decoding into the DuckDB-friendly schema
All of these compile under one workspace, and the integration tests run the full pipeline end-to-end on a synthetic cluster of operators.
What we’d change
A year in, the things we’d do differently are mostly about the rebalancing path. Operator churn is more frequent than we expected — not because operators are unreliable, but because they switch specializations to chase higher-margin query types. Rebalancing a 200GB shard takes long enough that we are now planning a “warm handover” mode where the outgoing operator continues serving until the incoming operator catches up. That work is in flight.
We would also do more with adaptive sub-query plans. Right now the planner picks a strategy once per query; on bigger joins it would be nice to revise mid-flight based on the size of the intermediate results we are seeing. That’s a 2027 problem.
The architecture has earned its keep. DuckDB is the right local engine. The shard layer is the right glue. The result is a query surface that handles the analytical workload Solana indexer customers actually run, at the latency they actually need.