Abstract
We present AresaDB, an embedded multi-model database engine written in Rust that unifies key-value, graph, relational (SQL), vector similarity search, and full-text search under a single property graph data model. AresaDB introduces a transparent cloud-tiered storage architecture that keeps lightweight graph index records local for sub-microsecond traversals while allowing full node payloads to be seamlessly offloaded to cloud object storage (S3/GCS).
We describe the system's architecture, indexing strategies, and query engine, and present experimental results (workload: 50K nodes, 250K edges, 10K 128-dimensional vectors; hardware: Apple M2 Pro) demonstrating:
- Batch insert throughput above 75,000 nodes/sec (≈260× faster than per-row transactions)
- Point lookups at ≈5 µs p50 latency (p99 ≤ 15 µs)
- Index-only graph hops at sub-microsecond latency
- Graph traversals completing in under 300 µs for 3-hop BFS queries on a 50K-node graph
- HNSW vector search at ≈7 µs (roughly 100× faster than brute force)
- Secondary index speedups above 20× over full scans
- BM25 full-text search over 12,500 documents in ≈30 ms
AresaDB is open-source under the MIT license and available as a Rust crate, Python package, and Docker image. Every performance claim above is reproducible with a single command, `uv run python experiments/run.py`, which wraps the Rust benchmark suite and emits structured JSON consumed by the figures in §7.
Keywords: Multi-model database, embedded database, property graph, cloud-tiered storage, vector search, HNSW, full-text search, BM25, Rust
Key Takeaways
- Modern applications need multiple data paradigms (KV, graph, SQL, vector, full-text) but deploying five separate databases is operationally untenable.
- AresaDB's tiered storage separates graph index from payload, enabling sub-microsecond traversal regardless of where payloads reside.
- A single embedded Rust binary replaces the need for PostgreSQL + Neo4j + Pinecone + Elasticsearch.
- Every number in this report is reproducible with a single command: `uv run python experiments/run.py`. The script wraps `cargo run --example benchmark_suite --release`, canonicalises its JSON output, and refreshes both `data/benchmark_results.json` (consumed by the figures in §7) and `experiments/results/metrics.json` (the archived per-run record).
Introduction
Background and Motivation
The proliferation of application data models — relational tables, document stores, graph networks, vector embeddings, and full-text corpora — has led to an increasingly fragmented database landscape. Developers commonly deploy multiple specialized systems (e.g., PostgreSQL for relational data, Neo4j for graphs, Pinecone for vectors, Elasticsearch for full-text) and bear the operational complexity of synchronizing data across them.
Embedded databases like SQLite and DuckDB have demonstrated that many workloads can be served by a single in-process engine, eliminating network latency and operational overhead. However, existing embedded databases are single-model: SQLite excels at relational queries but lacks native graph traversal and vector search; DuckDB targets analytics but not OLTP or graph workloads; LanceDB provides vector search but not graph or full-text capabilities.
AresaDB
We present AresaDB, an embedded multi-model database that unifies five query paradigms under a single property graph foundation:
- Key-Value: Direct `NodeId → Node` lookups at ≈5 µs p50 latency
- Graph: BFS/DFS traversal with sub-microsecond index-only hops
- Relational: SQL queries with secondary B-tree indexes (>20× speedup over full scan)
- Vector Search: HNSW approximate nearest neighbor (≈7 µs, ≈100× faster than brute force)
- Full-Text Search: Inverted index with BM25 ranking (≈30 ms over 12.5K documents)
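To make this mapping concrete, the sketch below shows a deliberately naive, in-memory version of the five paradigms over a single property-graph structure. All names here (`Graph`, `select_eq`, `build_demo`, and so on) are illustrative assumptions rather than AresaDB's actual API, and the vector and full-text paths use brute-force stand-ins for the HNSW and BM25 indexes.

```rust
// Naive in-memory sketch of five query paradigms over one property
// graph. Names are illustrative, not AresaDB's API; the vector and
// full-text paths are brute-force stand-ins for HNSW and BM25.
use std::collections::HashMap;

struct Node {
    props: HashMap<String, String>,
    embedding: Vec<f32>,
}

#[derive(Default)]
struct Graph {
    nodes: HashMap<u64, Node>,
    out_edges: HashMap<u64, Vec<u64>>,
}

fn dist2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

impl Graph {
    // Key-value: direct NodeId -> Node lookup.
    fn get(&self, id: u64) -> Option<&Node> {
        self.nodes.get(&id)
    }
    // Graph: one hop over the adjacency list.
    fn neighbors(&self, id: u64) -> &[u64] {
        self.out_edges.get(&id).map(Vec::as_slice).unwrap_or(&[])
    }
    // Relational-style: equality filter over a property (a secondary
    // B-tree index would avoid this full scan).
    fn select_eq(&self, key: &str, val: &str) -> Vec<u64> {
        self.nodes
            .iter()
            .filter(|(_, n)| n.props.get(key).map(String::as_str) == Some(val))
            .map(|(id, _)| *id)
            .collect()
    }
    // Vector: exact nearest neighbour by squared L2 distance.
    fn nearest(&self, q: &[f32]) -> Option<u64> {
        self.nodes
            .iter()
            .min_by(|(_, a), (_, b)| {
                dist2(&a.embedding, q)
                    .partial_cmp(&dist2(&b.embedding, q))
                    .unwrap()
            })
            .map(|(id, _)| *id)
    }
    // Full-text: naive substring match over a "text" property.
    fn search_text(&self, term: &str) -> Vec<u64> {
        self.nodes
            .iter()
            .filter(|(_, n)| n.props.get("text").map_or(false, |t| t.contains(term)))
            .map(|(id, _)| *id)
            .collect()
    }
}

fn build_demo() -> Graph {
    let mut g = Graph::default();
    g.nodes.insert(1, Node {
        props: HashMap::from([
            ("name".to_string(), "ada".to_string()),
            ("text".to_string(), "graph databases".to_string()),
        ]),
        embedding: vec![1.0, 0.0],
    });
    g.nodes.insert(2, Node {
        props: HashMap::from([
            ("name".to_string(), "bob".to_string()),
            ("text".to_string(), "vector search".to_string()),
        ]),
        embedding: vec![0.0, 1.0],
    });
    g.out_edges.insert(1, vec![2]);
    g
}

fn main() {
    let g = build_demo();
    assert!(g.get(1).is_some());                      // key-value
    assert_eq!(g.neighbors(1).to_vec(), vec![2]);     // graph hop
    assert_eq!(g.select_eq("name", "bob"), vec![2]);  // relational filter
    assert_eq!(g.nearest(&[0.9, 0.1]), Some(1));      // vector search
    assert_eq!(g.search_text("vector"), vec![2]);     // full-text
}
```

The point of the sketch is only that one store can answer all five query shapes; the real engine replaces each scan with the corresponding index described later.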
AresaDB's key architectural contribution is transparent cloud tiering: a split storage design where lightweight graph index records (~200 bytes) remain on local storage for sub-microsecond access, while full node payloads (properties, embeddings — kilobytes to megabytes) can be transparently offloaded to S3 or GCS. This enables graphs with millions of relationships to be traversed at memory-like speeds while the actual data scales to cloud-level capacity.
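The split can be pictured as a pair of record types. The field names below are assumptions for illustration, not AresaDB's actual on-disk layout; what matters is that the fixed-size index record carries only identity, adjacency heads, and a payload locator, so it stays small enough to keep local even when the payload lives in object storage.

```rust
// Sketch of the split-record idea behind tiered storage. Field names
// are illustrative assumptions, not AresaDB's on-disk layout.
use std::mem::size_of;

#[derive(Clone, Copy)]
enum PayloadLocation {
    Local { offset: u64, len: u32 }, // in a local payload file
    Cloud { object_key_hash: u64 },  // resolved to an S3/GCS object key
}

// Fixed-size record kept on local storage: identity, adjacency-list
// heads for traversal, and a locator for the (possibly remote) payload.
#[derive(Clone, Copy)]
struct IndexRecord {
    node_id: u64,
    first_out_edge: u64,
    first_in_edge: u64,
    payload: PayloadLocation,
}

fn main() {
    // The record is tens of bytes here (the paper cites ~200 bytes
    // with adjacency data included), so millions of them fit locally
    // and graph traversal never has to touch the cloud tier.
    assert!(size_of::<IndexRecord>() <= 48);
}
```

Traversal reads only `IndexRecord`s; the `PayloadLocation` is dereferenced only when a query actually needs properties or embeddings.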
The system is implemented in Rust for memory safety and performance, distributed as a single binary with zero external dependencies, and provides access through a CLI, interactive REPL, Rust library, Python bindings (PyO3), and a TCP wire protocol.
Contributions
This paper makes the following contributions:
- A multi-model data architecture that maps five query paradigms onto a unified property graph (@sec-data-model)
- A transparent cloud-tiered storage engine that separates index structure from payloads with automatic eviction and cache management (@sec-tiered-storage)
- An integrated index subsystem combining B-tree secondary indexes, inverted full-text indexes, and HNSW vector indexes in a single storage engine (@sec-index-subsystem)
- A cost-based query engine with SQL parsing, index-aware planning, and support for filtered vector search (@sec-query-engine)
- Reproducible experimental evaluation demonstrating competitive performance across all five query paradigms (@sec-evaluation)