This article is also available in Spanish.
Vector Search with ES|QL
Today we're unwrapping one of the most exciting additions to ES|QL: native support for dense vector fields, along with the functions to search them, the KNN function and the vector similarity functions. If you've been curious about vector search but found the Query DSL syntax a bit intimidating, ES|QL is about to become your new best friend.
Why ES|QL for Vector Search?
ES|QL is the future of querying in Elasticsearch: it lets you express a query as a series of processing steps, each one piped into the next. Adding vector search to ES|QL gives you expert control over how semantic queries run, letting you fine-tune the search method (either approximate nearest neighbors via KNN, or exact search via the vector similarity functions).
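If you haven't written ES|QL before, a query is just such a pipeline: each step works on the rows produced by the previous one. Here's a quick sketch against a hypothetical logs-* index (the index pattern and field names are made up for illustration):
FROM logs-*
| WHERE status_code >= 500            // keep only error responses
| STATS errors = COUNT(*) BY host     // aggregate per host
| SORT errors DESC
| LIMIT 10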
Setting Up Our Playground
Let's create a simple index with a dense_vector field to store some product embeddings. We'll keep it minimal—just 3 dimensions—so we can reason about the vectors easily.
PUT products-vectors
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "category": {
        "type": "keyword"
      },
      "embedding": {
        "type": "dense_vector",
        "dims": 3,
        "similarity": "cosine"
      }
    }
  }
}
The key setting here is similarity: cosine, which defines how we measure vector closeness; other options include l2_norm and dot_product.
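For instance, if you'd rather rank by Euclidean distance, the same field could be mapped with l2_norm instead. A sketch, using a hypothetical index name (and note that dot_product expects vectors normalized to unit length):
PUT products-vectors-l2
{
  "mappings": {
    "properties": {
      "embedding": {
        "type": "dense_vector",
        "dims": 3,
        "similarity": "l2_norm"
      }
    }
  }
}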
Now let's add some sample products with their "embeddings":
POST products-vectors/_bulk
{"index": {"_id": "1"}}
{"name": "Warm Winter Jacket", "category": "clothing", "embedding": [0.9, 0.1, 0.2]}
{"index": {"_id": "2"}}
{"name": "Summer Beach Shorts", "category": "clothing", "embedding": [0.1, 0.9, 0.3]}
{"index": {"_id": "3"}}
{"name": "Cozy Wool Sweater", "category": "clothing", "embedding": [0.85, 0.15, 0.25]}
{"index": {"_id": "4"}}
{"name": "Running Sneakers", "category": "footwear", "embedding": [0.4, 0.5, 0.8]}
{"index": {"_id": "5"}}
{"name": "Hiking Boots", "category": "footwear", "embedding": [0.6, 0.3, 0.7]}
You can use ES|QL to retrieve your data, including your vector embeddings:
FROM products-vectors
In this toy example, let's say our first dimension loosely represents "warmth," the second "summer vibes," and the third "outdoor activity." Real embeddings from models like E5 or OpenAI would have hundreds of dimensions, but the principle stays the same.
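If you just want to eyeball the stored vectors, a KEEP step trims the output to the columns you care about. A quick sketch:
FROM products-vectors
| KEEP name, embedding
| LIMIT 5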
Searching using the KNN Function
ES|QL's KNN function performs approximate k-nearest neighbor search on your vectors:
FROM products-vectors METADATA _score
| WHERE KNN(embedding, [0.88, 0.12, 0.22])
| KEEP name, category, _score
| SORT _score DESC
| LIMIT 3
Breaking this down:
- METADATA _score - We need this to retrieve the scoring from the KNN function
- KNN(embedding, [0.88, 0.12, 0.22]) - Finds the nearest neighbors to our query vector
- The query vector [0.88, 0.12, 0.22] represents something "warm" (high first dimension)
The result? Our warm clothing items bubble to the top:
| name | category | _score |
|---|---|---|
| Warm Winter Jacket | clothing | 0.9994338750839233 |
| Cozy Wool Sweater | clothing | 0.9992702007293701 |
| Hiking Boots | footwear | 0.9067620635032654 |
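For comparison, here's roughly what the same search looks like with the Query DSL's top-level knn option (a sketch; the k and num_candidates values are illustrative):
POST products-vectors/_search
{
  "knn": {
    "field": "embedding",
    "query_vector": [0.88, 0.12, 0.22],
    "k": 3,
    "num_candidates": 50
  },
  "fields": ["name", "category"],
  "_source": false
}
The ES|QL version reads top to bottom as a pipeline, which is a big part of its appeal.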
Combining KNN with Filters
One of ES|QL's superpowers is how naturally you can combine vector search with traditional filters:
FROM products-vectors METADATA _score
| WHERE category == "clothing" AND KNN(embedding, [0.88, 0.12, 0.22])
| KEEP name, category, _score
| SORT _score DESC
| LIMIT 3
This applies the filtering as a pre-filter before running the KNN search - efficient and readable!
| name | category | _score |
|---|---|---|
| Warm Winter Jacket | clothing | 0.9994338750839233 |
| Cozy Wool Sweater | clothing | 0.9992702007293701 |
| Summer Beach Shorts | clothing | 0.658316433429718 |
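And because every stage after the WHERE is just another processing step, you can keep piping. For example, a sketch that aggregates the KNN hits per category instead of listing them:
FROM products-vectors METADATA _score
| WHERE KNN(embedding, [0.88, 0.12, 0.22])
| STATS hits = COUNT(*), avg_score = AVG(_score) BY category
| SORT avg_score DESC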
Fine-Tuning with Optional Parameters
The KNN function accepts additional named parameters for more control:
FROM products-vectors METADATA _score
| WHERE KNN(embedding, [0.1, 0.85, 0.3], {"k": 2, "boost": 1.5, "min_candidates": 50, "rescore_oversample": 3, "similarity": 0.0001})
| KEEP name, _score
| SORT _score DESC
The allowed parameters are:
- k: Number of neighbors to return (taken implicitly from LIMIT)
- boost: Score multiplier (default: 1.0)
- min_candidates: Minimum number of candidates to consider per shard (higher = more accurate but slower)
- similarity: The minimum similarity for considering a result
- visit_percentage: The percentage of vectors to explore per shard when doing knn search with bbq_disk
- rescore_oversample: Applies the specified oversample factor to k on the approximate kNN search
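In practice you rarely need all of these at once. A more typical sketch lets LIMIT drive k, bumps min_candidates for accuracy, and uses similarity to drop weak matches (the 0.5 threshold is just an illustrative value):
FROM products-vectors METADATA _score
| WHERE KNN(embedding, [0.1, 0.85, 0.3], {"min_candidates": 100, "similarity": 0.5})
| KEEP name, _score
| SORT _score DESC
| LIMIT 2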
Searching using Vector Similarity Functions
KNN is excellent for searching vectors at scale, and one of the reasons is that it is approximate: it does its best to find good-enough results without looking at every possible document. That is what makes KNN performant, as it doesn't have to compare the query against every vector one by one.
When we do want to examine every document (because we have already filtered our results down, or because we don't have many documents in the first place), we can use the vector similarity functions to calculate the similarity between our query vector and every stored vector:
FROM products-vectors
| EVAL my_score = V_COSINE(embedding, [0.1, 0.85, 0.3]) + 1.0
| KEEP name, my_score
| SORT my_score DESC
Vector similarity functions let you do custom scoring for vectors, with an exact nearest-neighbor calculation.
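For instance, once a regular filter has narrowed the candidate set, scoring everything that remains exactly is cheap. A sketch (the query vector here is just an arbitrary "outdoorsy" example):
FROM products-vectors
| WHERE category == "footwear"
| EVAL my_score = V_COSINE(embedding, [0.5, 0.4, 0.9])
| KEEP name, my_score
| SORT my_score DESC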
Bonus: TEXT_EMBEDDING Function
If you have an inference endpoint configured, you can generate embeddings on the fly:
FROM products-vectors METADATA _score
| WHERE KNN(embedding, TEXT_EMBEDDING("cozy winter wear", "my-embedding-model"))
| KEEP name, _score
| SORT _score DESC
| LIMIT 3
No need to pre-compute query vectors—ES|QL handles it inline!
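If you're wondering where "my-embedding-model" comes from, it's an inference endpoint you create up front. Here's a sketch using the inference API with Elastic's built-in E5 model (the endpoint name and settings are illustrative, and keep in mind that a real model like E5 outputs 384 dimensions, so your dense_vector field's dims would have to match):
PUT _inference/text_embedding/my-embedding-model
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".multilingual-e5-small",
    "num_allocations": 1,
    "num_threads": 1
  }
}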
Wrapping Up
ES|QL's vector search capabilities bring full control to semantic search. Tweaking how you retrieve the nearest neighbors for a query, or calculating a custom score, is now possible thanks to ES|QL's dense_vector field type support, the KNN search function, and the vector similarity functions.
Whether you're building a recommendation system, a semantic search engine, or just exploring your vector data, the combination of KNN, filters, and aggregations makes ES|QL a powerful choice.
