Abstract
We present AresaDB, an embedded multi-model database engine written in Rust that unifies key-value, graph, relational (SQL), vector similarity search, and full-text search under a single property graph data model. AresaDB introduces a transparent cloud-tiered storage architecture that keeps lightweight graph index records local for sub-microsecond traversals while allowing full node payloads to be seamlessly offloaded to cloud object storage (S3/GCS).
We describe the system's architecture, indexing strategies, and query engine, and present experimental results (workload: 50K nodes, 250K edges, 10K 128-dimensional vectors; hardware: Apple M2 Pro) demonstrating:
- Batch insert throughput above 75,000 nodes/sec (≈260× faster than per-row transactions)
- Point lookups at ≈5 µs p50 latency (p99 ≤ 15 µs)
- Index-only graph hops at sub-microsecond latency
- Graph traversals completing in under 300 µs for 3-hop BFS queries on a 50K-node graph
- HNSW vector search at ≈7 µs (roughly 100× faster than brute force)
- Secondary index speedups above 20× over full scans
- BM25 full-text search over 12,500 documents in ≈30 ms
AresaDB is open-source under the MIT license and available as a Rust crate, Python package, and Docker image. Every performance claim above is reproducible with a single command, `uv run python experiments/run.py`, which wraps the Rust benchmark suite and emits structured JSON consumed by the figures in §7.
Keywords: Multi-model database, embedded database, property graph, cloud-tiered storage, vector search, HNSW, full-text search, BM25, Rust
Key Takeaways
- Modern applications need multiple data paradigms (KV, graph, SQL, vector, full-text) but deploying five separate databases is operationally untenable.
- AresaDB's tiered storage separates graph index from payload, enabling sub-microsecond traversal regardless of where payloads reside.
- A single embedded Rust binary replaces the need for PostgreSQL + Neo4j + Pinecone + Elasticsearch.
- Every number in this report is reproducible with a single command: `uv run python experiments/run.py`. The script wraps `cargo run --example benchmark_suite --release`, canonicalises its JSON output, and refreshes both `data/benchmark_results.json` (consumed by the figures in §7) and `experiments/results/metrics.json` (the archived per-run record).
Introduction
Background and Motivation
The proliferation of application data models — relational tables, document stores, graph networks, vector embeddings, and full-text corpora — has led to an increasingly fragmented database landscape. Developers commonly deploy multiple specialized systems (e.g., PostgreSQL for relational data, Neo4j for graphs, Pinecone for vectors, Elasticsearch for full-text) and bear the operational complexity of synchronizing data across them.
Embedded databases like SQLite and DuckDB have demonstrated that many workloads can be served by a single in-process engine, eliminating network latency and operational overhead. However, existing embedded databases are single-model: SQLite excels at relational queries but lacks native graph traversal and vector search; DuckDB targets analytics but not OLTP or graph workloads; LanceDB provides vector search but not graph or full-text capabilities.
AresaDB
We present AresaDB, an embedded multi-model database that unifies five query paradigms under a single property graph foundation:
- Key-Value: Direct `NodeId → Node` lookups at ≈5 µs p50 latency
- Graph: BFS/DFS traversal with sub-microsecond index-only hops
- Relational: SQL queries with secondary B-tree indexes (>20× speedup over full scan)
- Vector Search: HNSW approximate nearest neighbor (≈7 µs, ≈100× faster than brute force)
- Full-Text Search: Inverted index with BM25 ranking (≈30 ms over 12.5K documents)
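To make this mapping concrete, the sketch below shows a deliberately naive, in-memory version of the five paradigms over a single property-graph structure. All names here (`Graph`, `select_eq`, `build_demo`, and so on) are illustrative assumptions rather than AresaDB's actual API, and the vector and full-text paths use brute-force stand-ins for the HNSW and BM25 indexes.

```rust
// Naive in-memory sketch of five query paradigms over one property
// graph. Names are illustrative, not AresaDB's API; the vector and
// full-text paths are brute-force stand-ins for HNSW and BM25.
use std::collections::HashMap;

struct Node {
    props: HashMap<String, String>,
    embedding: Vec<f32>,
}

#[derive(Default)]
struct Graph {
    nodes: HashMap<u64, Node>,
    out_edges: HashMap<u64, Vec<u64>>,
}

fn dist2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

impl Graph {
    // Key-value: direct NodeId -> Node lookup.
    fn get(&self, id: u64) -> Option<&Node> {
        self.nodes.get(&id)
    }
    // Graph: one hop over the adjacency list.
    fn neighbors(&self, id: u64) -> &[u64] {
        self.out_edges.get(&id).map(Vec::as_slice).unwrap_or(&[])
    }
    // Relational-style: equality filter over a property (a secondary
    // B-tree index would avoid this full scan).
    fn select_eq(&self, key: &str, val: &str) -> Vec<u64> {
        self.nodes
            .iter()
            .filter(|(_, n)| n.props.get(key).map(String::as_str) == Some(val))
            .map(|(id, _)| *id)
            .collect()
    }
    // Vector: exact nearest neighbour by squared L2 distance.
    fn nearest(&self, q: &[f32]) -> Option<u64> {
        self.nodes
            .iter()
            .min_by(|(_, a), (_, b)| {
                dist2(&a.embedding, q)
                    .partial_cmp(&dist2(&b.embedding, q))
                    .unwrap()
            })
            .map(|(id, _)| *id)
    }
    // Full-text: naive substring match over a "text" property.
    fn search_text(&self, term: &str) -> Vec<u64> {
        self.nodes
            .iter()
            .filter(|(_, n)| n.props.get("text").map_or(false, |t| t.contains(term)))
            .map(|(id, _)| *id)
            .collect()
    }
}

fn build_demo() -> Graph {
    let mut g = Graph::default();
    g.nodes.insert(1, Node {
        props: HashMap::from([
            ("name".to_string(), "ada".to_string()),
            ("text".to_string(), "graph databases".to_string()),
        ]),
        embedding: vec![1.0, 0.0],
    });
    g.nodes.insert(2, Node {
        props: HashMap::from([
            ("name".to_string(), "bob".to_string()),
            ("text".to_string(), "vector search".to_string()),
        ]),
        embedding: vec![0.0, 1.0],
    });
    g.out_edges.insert(1, vec![2]);
    g
}

fn main() {
    let g = build_demo();
    assert!(g.get(1).is_some());                      // key-value
    assert_eq!(g.neighbors(1).to_vec(), vec![2]);     // graph hop
    assert_eq!(g.select_eq("name", "bob"), vec![2]);  // relational filter
    assert_eq!(g.nearest(&[0.9, 0.1]), Some(1));      // vector search
    assert_eq!(g.search_text("vector"), vec![2]);     // full-text
}
```

The point of the sketch is only that one store can answer all five query shapes; the real engine replaces each scan with the corresponding index described later.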
AresaDB's key architectural contribution is transparent cloud tiering: a split storage design where lightweight graph index records (~200 bytes) remain on local storage for sub-microsecond access, while full node payloads (properties, embeddings — kilobytes to megabytes) can be transparently offloaded to S3 or GCS. This enables graphs with millions of relationships to be traversed at memory-like speeds while the actual data scales to cloud-level capacity.
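The split can be pictured as a pair of record types. The field names below are assumptions for illustration, not AresaDB's actual on-disk layout; what matters is that the fixed-size index record carries only identity, adjacency heads, and a payload locator, so it stays small enough to keep local even when the payload lives in object storage.

```rust
// Sketch of the split-record idea behind tiered storage. Field names
// are illustrative assumptions, not AresaDB's on-disk layout.
use std::mem::size_of;

#[derive(Clone, Copy)]
enum PayloadLocation {
    Local { offset: u64, len: u32 }, // in a local payload file
    Cloud { object_key_hash: u64 },  // resolved to an S3/GCS object key
}

// Fixed-size record kept on local storage: identity, adjacency-list
// heads for traversal, and a locator for the (possibly remote) payload.
#[derive(Clone, Copy)]
struct IndexRecord {
    node_id: u64,
    first_out_edge: u64,
    first_in_edge: u64,
    payload: PayloadLocation,
}

fn main() {
    // The record is tens of bytes here (the paper cites ~200 bytes
    // with adjacency data included), so millions of them fit locally
    // and graph traversal never has to touch the cloud tier.
    assert!(size_of::<IndexRecord>() <= 48);
}
```

Traversal reads only `IndexRecord`s; the `PayloadLocation` is dereferenced only when a query actually needs properties or embeddings.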
The system is implemented in Rust for memory safety and performance, distributed as a single binary with zero external dependencies, and provides access through a CLI, interactive REPL, Rust library, Python bindings (PyO3), and a TCP wire protocol.
Contributions
This paper makes the following contributions:
- A multi-model data architecture that maps five query paradigms onto a unified property graph (@sec-data-model)
- A transparent cloud-tiered storage engine that separates index structure from payloads with automatic eviction and cache management (@sec-tiered-storage)
- An integrated index subsystem combining B-tree secondary indexes, inverted full-text indexes, and HNSW vector indexes in a single storage engine (@sec-index-subsystem)
- A cost-based query engine with SQL parsing, index-aware planning, and support for filtered vector search (@sec-query-engine)
- Reproducible experimental evaluation demonstrating competitive performance across all five query paradigms (@sec-evaluation)