Soli Docs

Semantic Search with .similar() in Soli

Building a search that understands meaning — not just keywords — used to require a dedicated search service, a separate indexing pipeline, and a lot of infrastructure. Soli's .similar() method changes that by making vector similarity search a first-class database primitive.

With a single method chain you can rank any database query by semantic relevance:

results = Product
    .where("category == 'electronics'")
    .similar("wireless noise-cancelling headphones under 200")
    .all

for product in results
    print(product.name + " — " + str(product._similarity_score))
end

Each result gets a _similarity_score field (0.0 to 1.0) so you can surface relevance to your users.

How It Works

The .similar() method is a QueryBuilder chain method. When you call it:

  1. Embedding generation — The query text is sent to an OpenAI-compatible API to produce a vector embedding
  2. Document fetch — Matching documents are fetched from SolidDB (all existing filters, joins, and conditions apply)
  3. Cosine similarity — Each document's embedding field is compared against the query embedding using cosine similarity
  4. Ranking — Results are sorted by similarity (highest first), trimmed to top-K, and returned with a _similarity_score on each record
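To make the four steps concrete, here is a plain Python sketch of the same pipeline over records that have already been fetched. It is not Soli code or the runtime's Rust implementation. In this sketch, `embed` is a hypothetical stand-in for the embedding API call and records are plain dicts. Skipping records that have no embedding is also an assumption of the sketch.

```python
import math
from typing import Callable, Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar(
    docs: List[Dict],
    query_text: str,
    embed: Callable[[str], List[float]],
    field: str = "embedding",
    top_k: int = 10,
) -> List[Dict]:
    # 1. Embedding generation: embed() is a hypothetical stand-in for the API call
    query_vec = embed(query_text)
    # 2. Document fetch: docs are assumed to be already narrowed by .where() etc.
    scored = []
    for doc in docs:
        vec = doc.get(field)
        if not vec:
            continue  # assumption: records without an embedding are skipped
        # 3. Cosine similarity between the stored vector and the query vector
        scored.append({**doc, "_similarity_score": cosine_similarity(vec, query_vec)})
    # 4. Ranking: highest score first, trimmed to top_k
    scored.sort(key=lambda d: d["_similarity_score"], reverse=True)
    return scored[:top_k]


docs = [
    {"name": "a", "embedding": [1.0, 0.0]},
    {"name": "b", "embedding": [0.6, 0.8]},
    {"name": "c", "embedding": [0.0, 1.0]},
]
results = similar(docs, "any text", embed=lambda _: [1.0, 0.0], top_k=2)
print([d["name"] for d in results])  # ['a', 'b']
```

The `field` and `top_k` arguments mirror the optional parameters of `.similar()` listed under API.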

SolidDB underpins this with native vector search capabilities — HNSW indexes, VECTOR_SIMILARITY() in SDBQL, scalar quantization, and a dedicated REST API. The current Soli runtime computes similarity client-side, but the same SolidDB engine that stores your data is ready for native vector workloads at scale.

Configuration

Set these environment variables:

| Variable | Default | Required |
|---|---|---|
| SOLI_EMBEDDING_API_KEY | (none) | Yes |
| SOLI_EMBEDDING_URL | https://api.openai.com/v1/embeddings | No |
| SOLI_EMBEDDING_MODEL | text-embedding-3-small | No |

When the API key is not set, .similar() returns an empty result set.
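The resolution rules in the table can be written down as a small function. The Python sketch below shows how the defaults and the "no key, no results" rule fit together. It is not the Soli runtime's code, and treating an empty string the same as an unset variable is an assumption.

```python
import os
from typing import Dict, Mapping, Optional

DEFAULT_URL = "https://api.openai.com/v1/embeddings"
DEFAULT_MODEL = "text-embedding-3-small"


def embedding_config(env: Optional[Mapping[str, str]] = None) -> Optional[Dict[str, str]]:
    """Resolve embedding settings; None means semantic search is effectively disabled."""
    env = os.environ if env is None else env
    api_key = env.get("SOLI_EMBEDDING_API_KEY")
    if not api_key:
        # Documented behaviour: without a key, .similar() returns an empty result set
        return None
    return {
        "api_key": api_key,
        # assumption: an empty string is treated the same as an unset variable
        "url": env.get("SOLI_EMBEDDING_URL") or DEFAULT_URL,
        "model": env.get("SOLI_EMBEDDING_MODEL") or DEFAULT_MODEL,
    }


print(embedding_config({}))  # None
print(embedding_config({"SOLI_EMBEDDING_API_KEY": "sk-test"})["model"])  # text-embedding-3-small
```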

API

.similar(query_text, field?, top_k?)
| Parameter | Type | Default | Description |
|---|---|---|---|
| query_text | String | (required) | The natural-language search query |
| field | String | "embedding" | The document field containing the embedding vector |
| top_k | Int | 10 | Maximum number of results to return |

Basic Usage

Simple semantic search

Find the 10 most semantically similar posts, using the default embedding field:

results = Post
    .where("published == true")
    .similar("how to deploy a web app")
    .all

for post in results
    print(post.title + " (score: " + str(post._similarity_score) + ")")
end

Custom embedding field and top-K

If your model stores embeddings in a different field, or you need more (or fewer) results:

results = Product
    .where("active == true")
    .similar("red running shoes", "title_embedding", 5)
    .all

results.each(fn(p)
    print(p.name + " — " + str(p._similarity_score))
end)

Combining with other chain methods

.similar() composes with every other QueryBuilder method — .where(), .order(), .includes(), .limit(), and so on:

results = Product
    .includes("reviews")
    .where("price <= @max", {"max": 100})
    .where("category == 'footwear'")
    .similar("comfortable hiking boots", "description_embedding", 20)
    .order("price", "asc")
    .all

Tutorial: Product Recommendation Engine

Let's build a complete product search endpoint that combines traditional filters with semantic relevance. We'll seed products, generate an embedding for each one, and serve ranked results.

Step 1: Define the model

# app/models/product.sl
class Product extends Model {
    # The "embedding" field stores the vector used for similarity search.
    # It is set explicitly when each product is created (see the seed script in Step 3).
}

Step 2: Create the migration

Generate and write the migration to create the products collection:

soli db:migrate generate create_products
# db/migrations/20260101000000_create_products.sl

fn up(db: Any)
    db.create_collection("products")

    # Create a vector index on the embedding field for HNSW similarity search
    db.create_vector_index("products", "embedding_idx", "embedding", 1536)
end

fn down(db: Any)
    db.drop_vector_index("products", "embedding_idx")
    db.drop_collection("products")
end

Apply it:

soli db:migrate

SolidDB uses HNSW (Hierarchical Navigable Small World) graphs for fast approximate nearest-neighbor search. With a vector index, search cost grows roughly as O(log n) rather than O(n), typically with 95%+ recall. You can also enable scalar quantization by adding "quantization": "scalar" to the vector index configuration, which reduces vector memory by about 4x.
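The 4x figure is simple arithmetic. A 1536-dimension float32 vector takes 6,144 bytes, whereas one byte per dimension takes 1,536 bytes. The Python sketch below shows one common scalar-quantization scheme to illustrate the trade-off: each vector's own min/max range is mapped to the byte range 0 to 255. SolidDB's actual scheme is not described on this page, so treat the details as an assumption. Similarity is then computed on approximate vectors, which is why quantization trades a little recall for memory.

```python
from typing import List, Tuple


def quantize(vec: List[float]) -> Tuple[bytes, float, float]:
    """Map each float to one byte (0-255) using the vector's own min/max range."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # constant vectors: avoid dividing by zero
    codes = bytes(round((x - lo) / scale) for x in vec)
    return codes, lo, scale


def dequantize(codes: bytes, lo: float, scale: float) -> List[float]:
    """Approximate reconstruction; error per dimension is at most scale / 2."""
    return [lo + c * scale for c in codes]


dims = 1536  # output size of text-embedding-3-small, as in the migration above
print(dims * 4, dims * 1)  # 6144 1536  (float32 bytes vs. one byte per dimension)

vec = [-0.5, 0.1, 0.3, 0.5]
codes, lo, scale = quantize(vec)
approx = dequantize(codes, lo, scale)
print(max(abs(a - b) for a, b in zip(vec, approx)) <= scale / 2 + 1e-12)  # True
```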

Note: The current .similar() implementation computes similarity client-side in Rust. The vector index comes into play when you use SolidDB's native VECTOR_SIMILARITY() in raw SDBQL queries or the dedicated vector search REST API.

Step 3: Seed the data with embeddings

Populate products and generate embeddings via the OpenAI-compatible API:

# db/seeds.sl

fn generate_embedding(text)
    let api_key = getenv("SOLI_EMBEDDING_API_KEY")
    let url = getenv("SOLI_EMBEDDING_URL") rescue "https://api.openai.com/v1/embeddings"
    let model = getenv("SOLI_EMBEDDING_MODEL") rescue "text-embedding-3-small"

    let response = http_post(url, {
        "headers": {
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json"
        },
        "body": json_stringify({
            "input": text,
            "model": model
        })
    })

    let body = json_parse(response["body"])
    return body["data"][0]["embedding"]
end

# Seed products
let products = [
    {"name": "Ultra Comfort Running Shoes", "description": "Lightweight mesh running shoes with cushioned sole for long-distance runners", "category": "footwear", "price": 129.99},
    {"name": "Trail Blazer Hiking Boots", "description": "Waterproof leather hiking boots with reinforced toe and ankle support", "category": "footwear", "price": 189.99},
    {"name": "Wireless Noise-Cancelling Headphones", "description": "Over-ear headphones with active noise cancellation and 30-hour battery", "category": "electronics", "price": 249.99},
    {"name": "Smart Fitness Watch", "description": "Water-resistant fitness tracker with heart rate monitor and GPS", "category": "electronics", "price": 199.99},
    {"name": "Organic Cotton T-Shirt", "description": "Soft organic cotton crew-neck t-shirt available in 12 colors", "category": "clothing", "price": 34.99},
    {"name": "Merino Wool Sweater", "description": "Lightweight merino wool sweater perfect for layering in cold weather", "category": "clothing", "price": 89.99},
    {"name": "Portable Bluetooth Speaker", "description": "Rugged waterproof speaker with 20-hour battery and deep bass", "category": "electronics", "price": 79.99},
    {"name": "Yoga Mat Premium", "description": "Extra-thick non-slip yoga mat with carrying strap", "category": "fitness", "price": 49.99}
]

for p in products
    let combined_text = p["name"] + ". " + p["description"] + ". Category: " + p["category"]
    let embedding = generate_embedding(combined_text)

    Product.create({
        "name": p["name"],
        "description": p["description"],
        "category": p["category"],
        "price": p["price"],
        "embedding": embedding
    })

    print("Created: " + p["name"])
end
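The seed script depends on the OpenAI embeddings request and response shape. The request is a JSON body with `input` and `model`, authenticated with a Bearer token. In the response, `data[0].embedding` holds the vector. The Python helpers below (standard library only) make that contract explicit. The function names are hypothetical, and the example builds the request without sending it.

```python
import json
import urllib.request
from typing import Dict, List

DEFAULT_URL = "https://api.openai.com/v1/embeddings"
DEFAULT_MODEL = "text-embedding-3-small"


def combined_text(p: Dict) -> str:
    # Same concatenation as the seed loop: name, description and category
    return p["name"] + ". " + p["description"] + ". Category: " + p["category"]


def build_embedding_request(text: str, api_key: str,
                            url: str = DEFAULT_URL,
                            model: str = DEFAULT_MODEL) -> urllib.request.Request:
    body = json.dumps({"input": text, "model": model}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": "Bearer " + api_key,
                 "Content-Type": "application/json"},
        method="POST",
    )


def parse_embedding_response(raw: str) -> List[float]:
    # The vector for the first (here: only) input lives at data[0].embedding
    return json.loads(raw)["data"][0]["embedding"]


req = build_embedding_request(
    combined_text({"name": "Yoga Mat Premium",
                   "description": "Extra-thick non-slip yoga mat with carrying strap",
                   "category": "fitness"}),
    api_key="sk-test",
)
print(req.get_method(), json.loads(req.data)["input"][:16])  # POST Yoga Mat Premium
# Sending it would be: urllib.request.urlopen(req).read().decode("utf-8")
```

The OpenAI endpoint also accepts a list of strings as `input` and returns one `data` entry per string. A seed script could use that to embed all products in a single request.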

Step 4: Build the search controller

# app/controllers/products_controller.sl

fn search(req)
    let query = req["params"]["q"] || req["json"]["query"]
    let category = req["params"]["category"]
    let max_price = req["params"]["max_price"]
    let top_k = int(req["params"]["top_k"] rescue "10")

    if !query || query == ""
        return json_response({"error": "query parameter is required"}, 422)
    end

    # Build the query chain dynamically, starting from an always-true filter
    # so that optional conditions can be appended below
    let q = Product.where("true == true")

    if category && category != ""
        q = q.where("category == @cat", {"cat": category})
    end

    if max_price && max_price != ""
        q = q.where("price <= @max", {"max": float(max_price)})
    end

    # Add semantic search on the "embedding" field (the default), capped at top_k results
    q = q.similar(query, "embedding", top_k)

    let results = q.all

    let output = results.map(fn(p)
        return {
            "name": p.name,
            "description": p.description,
            "category": p.category,
            "price": p.price,
            "score": p._similarity_score
        }
    end)

    return json_response({
        "query": query,
        "count": len(output),
        "results": output
    })
end
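Moving the parameter handling into a pure function makes the controller's rules easy to test without an HTTP server. These rules are: `q` is required, `category` and `max_price` are optional filters, and `top_k` defaults to 10. The sketch is in Python. The `(options, error)` return shape is our own convention, not Soli's request API. Unlike the controller, the sketch reads only query-string parameters. Rejecting a non-numeric `max_price` or `top_k` with a 422 is an addition of this sketch.

```python
from typing import Dict, Optional, Tuple

Error = Tuple[Dict[str, str], int]


def parse_search_params(params: Dict[str, str]) -> Tuple[Optional[Dict], Optional[Error]]:
    """Return (options, None) on success or (None, (body, status)) on failure."""
    query = (params.get("q") or "").strip()
    if not query:
        return None, ({"error": "query parameter is required"}, 422)

    options = {"query": query, "filters": [], "top_k": 10}

    category = params.get("category")
    if category:
        options["filters"].append(("category == @cat", {"cat": category}))

    try:
        if params.get("max_price"):
            max_price = float(params["max_price"])
            options["filters"].append(("price <= @max", {"max": max_price}))
        if params.get("top_k"):
            options["top_k"] = int(params["top_k"])
    except ValueError:
        return None, ({"error": "max_price and top_k must be numeric"}, 422)

    return options, None


opts, err = parse_search_params({"q": "comfortable footwear", "max_price": "190"})
print(opts["filters"], opts["top_k"])  # [('price <= @max', {'max': 190.0})] 10
```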

Step 5: Wire up the route

# config/routes.sl
get("/products/search", "products#search")

Step 6: Try it out

curl "http://localhost:5011/products/search?q=comfortable+footwear+for+running&max_price=190&top_k=3"

Example response (exact scores depend on the embedding model):

{
  "query": "comfortable footwear for running",
  "count": 3,
  "results": [
    {
      "name": "Ultra Comfort Running Shoes",
      "description": "Lightweight mesh running shoes with cushioned sole for long-distance runners",
      "category": "footwear",
      "price": 129.99,
      "score": 0.924
    },
    {
      "name": "Trail Blazer Hiking Boots",
      "description": "Waterproof leather hiking boots with reinforced toe and ankle support",
      "category": "footwear",
      "price": 189.99,
      "score": 0.781
    },
    {
      "name": "Merino Wool Sweater",
      "description": "Lightweight merino wool sweater perfect for layering in cold weather",
      "category": "clothing",
      "price": 89.99,
      "score": 0.412
    }
  ]
}

Results are ranked by semantic relevance. The running shoes match best, the hiking boots also rank highly (both are footwear), and the sweater scores lower because it's a different category — but still semantically related to "comfortable."

Step 7: Surface scores in the UI

In your ERB template, show the relevance score:

<% for product in results %>
    <div class="product-card">
        <h3><%= h(product["name"]) %></h3>
        <p><%= h(product["description"]) %></p>
        <span class="price">$<%= product["price"] %></span>
        <% if product["score"] %>
            <span class="badge">Relevance: <%= round(product["score"] * 100) %>%</span>
        <% end %>
    </div>
<% end %>

Combining with Eager Loading

.similar() works alongside .includes() so you can avoid N+1 queries when displaying related data:

results = Product
    .includes("reviews", "category")
    .similar("durable outdoor gear", "description_embedding", 20)
    .all

for product in results
    print(product.name + " (" + str(len(product.reviews)) + " reviews)")
end
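Without `.includes()`, reading `product.reviews` on each ranked product would issue one extra query per product. The Python sketch below contrasts per-record lookups with a single batched lookup grouped by product id. It uses a hypothetical in-memory store that counts queries. The sketch illustrates the general eager-loading technique, not Soli's internals, and the `id`/`product_id` field names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class CountingStore:
    """In-memory stand-in for the database that counts queries issued."""

    def __init__(self, reviews: List[Dict]):
        self.reviews = reviews
        self.queries = 0

    def reviews_for(self, product_id: str) -> List[Dict]:
        self.queries += 1
        return [r for r in self.reviews if r["product_id"] == product_id]

    def reviews_for_many(self, product_ids: Iterable[str]) -> Dict[str, List[Dict]]:
        self.queries += 1
        wanted = set(product_ids)
        grouped: Dict[str, List[Dict]] = defaultdict(list)
        for r in self.reviews:
            if r["product_id"] in wanted:
                grouped[r["product_id"]].append(r)
        return grouped


products = [{"id": "p1"}, {"id": "p2"}, {"id": "p3"}]
store = CountingStore([{"product_id": "p1"}, {"product_id": "p1"}, {"product_id": "p3"}])

# Lazy loading: one query per ranked product (the "N" in N+1)
lazy = {p["id"]: store.reviews_for(p["id"]) for p in products}
print(store.queries)  # 3

# Eager loading: a single batched query, then an in-memory join by id
store.queries = 0
grouped = store.reviews_for_many(p["id"] for p in products)
eager = {p["id"]: grouped.get(p["id"], []) for p in products}
print(store.queries)  # 1
```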

Native Vector Search with SolidDB

Beyond .similar(), SolidDB provides a full vector search engine that you can use directly for production-scale workloads:

SDBQL Vector Functions

Use VECTOR_SIMILARITY() in raw SDBQL queries to push similarity computation to the database:

FOR doc IN products
  LET sim = VECTOR_SIMILARITY(doc.embedding, @query_vec)
  FILTER sim > 0.7
  FILTER doc.category == "electronics"
  SORT sim DESC
  LIMIT 20
  RETURN MERGE(doc, { "_similarity_score": sim })
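This query differs from .similar() in one important way. It drops every document scoring 0.7 or below, whereas .similar() returns up to top-K records regardless of score. To make each clause concrete, the Python sketch below runs the same pipeline over in-memory dicts. It assumes VECTOR_SIMILARITY() computes cosine similarity, which this page does not state. It is a reading aid, not how SolidDB executes the query.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def vector_query(docs: List[Dict], query_vec: List[float], category: str,
                 threshold: float = 0.7, limit: int = 20) -> List[Dict]:
    rows = []
    for doc in docs:                                          # FOR doc IN products
        sim = cosine(doc["embedding"], query_vec)             # LET sim = VECTOR_SIMILARITY(...)
        if sim > threshold and doc["category"] == category:   # the two FILTER lines
            rows.append({**doc, "_similarity_score": sim})    # MERGE(doc, {...})
    rows.sort(key=lambda r: r["_similarity_score"], reverse=True)  # SORT sim DESC
    return rows[:limit]                                       # LIMIT 20


docs = [
    {"name": "speaker", "category": "electronics", "embedding": [0.9, 0.1]},
    {"name": "watch", "category": "electronics", "embedding": [0.1, 0.9]},
    {"name": "boots", "category": "footwear", "embedding": [1.0, 0.0]},
]
print([r["name"] for r in vector_query(docs, [1.0, 0.0], "electronics")])  # ['speaker']
```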

REST Vector Search API

Query a vector index directly without writing SDBQL:

curl -X POST \
  http://localhost:6745/_api/database/solidb/vector/products/embedding_idx/search \
  -H "Content-Type: application/json" \
  -d '{
    "vector": [0.0123, -0.0456, ...],
    "limit": 10,
    "ef_search": 100
  }'

Hybrid Search

Combine vector similarity with fulltext search for 15-30% better relevance in RAG applications:

LET results = HYBRID_SEARCH(
    "products",
    "embedding_idx",
    "description",
    @query_vector,
    "comfortable hiking boots",
    { vector_weight: 0.6, text_weight: 0.4, limit: 20 }
)
FOR r IN results
  RETURN { name: r.doc.name, score: r.score, sources: r.sources }
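The two weights suggest a weighted sum of a vector score and a text score. This page does not specify how HYBRID_SEARCH normalizes or fuses scores. The Python sketch below therefore assumes one common approach, linear fusion. Each score list is first min-max normalized to 0..1. The two scores are then combined as 0.6 × vector + 0.4 × text, and the sketch records which lists each document came from. Other fusion methods, such as reciprocal rank fusion, are also widely used.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to 0..1 so vector and text scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def hybrid(vector_scores: Dict[str, float], text_scores: Dict[str, float],
           vector_weight: float = 0.6, text_weight: float = 0.4,
           limit: int = 20) -> List[Dict]:
    v, t = min_max(vector_scores), min_max(text_scores)
    results = []
    for key in set(v) | set(t):
        # A document missing from one result list contributes 0 for that side
        score = vector_weight * v.get(key, 0.0) + text_weight * t.get(key, 0.0)
        sources = [name for name, s in (("vector", v), ("text", t)) if key in s]
        results.append({"doc": key, "score": score, "sources": sources})
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:limit]


ranked = hybrid({"boots": 0.91, "shoes": 0.84}, {"boots": 7.2, "tent": 3.1})
print(ranked[0]["doc"], ranked[0]["sources"])  # boots ['vector', 'text']
```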

See the SolidDB Vector Search docs and Hybrid Search docs for full details.

Performance Considerations

  • Each .similar() call makes one embedding API request, which adds a network round-trip to every search (typically 200–500ms for OpenAI's text-embedding-3-small)
  • The current .similar() implementation computes cosine similarity client-side in Rust across all matching documents. This is fast (microseconds per document), but for very large candidate sets (100k+ documents), consider adding filters to narrow the candidates first or switching to SolidDB's native VECTOR_SIMILARITY() via raw SDBQL
  • With a vector index and native SolidDB vector search, HNSW graphs make search cost grow roughly as O(log n) instead of O(n), typically with 95%+ recall
  • The ef_search parameter (default 40) tunes the speed-recall tradeoff at query time

Summary

| Pattern | Description |
|---|---|
| .where(...).similar("text").all | Filtered semantic search (default embedding field, top 10) |
| .where(...).similar("text", "field", N).all | Custom embedding field and result count |
| ._similarity_score | Float (0.0–1.0) injected into each result record |

Vector search doesn't have to mean adding Elasticsearch, Pinecone, or a separate AI pipeline. With Soli's .similar() method, it's just another link in the query chain — no different from .where() or .order(). Configure your embedding API key and you're ready to ship semantic search.