Hi,
I'm running into a problem with hybrid search (keyword + vector). As far as I can tell, the performance bottleneck is the vector part, for which I use the knn query.
Below, I'll share:
- The query I'm performing
- The indices settings, mappings, and size
- My cluster resources
- Some profiling insights from the query
- Things I tried to optimize it
I'm hoping someone can give me some ideas/direction about what I'm missing and whether Elasticsearch is the right tool for my needs.
This is a sample of how my query looks:
GET index1,index2,index3,index4,index5,index6/_search
{
"size": 10,
"_source": { "includes": ["_id"] },
"retriever":
{
"rrf":
{
"retrievers": [
{
"standard":
{
"query":
{
"bool":
{
"minimum_should_match":1,
"should":
[
{
"bool":
{
"filter":
{
"terms":
{
"_index":["index1","index2","index3","index4","index5"]
}
},
"must":
{
"knn":
{
"field":"field1_vector",
"num_candidates":10,
"similarity":0.5,
"query_vector": [some_vector_of_float_numbers]
}
}
}
},
{
"bool":
{
"filter":
{
"terms":
{
"_index":["index1","index2"]
}
},
"must":
{
"knn":
{
"field":"field2_vector",
"num_candidates":10,
"similarity":0.5,
"query_vector": [some_vector_of_float_numbers]
}
}
}
}
...17 more should expressions like this
]
}
}
},
{
"standard":
{
"query":
{
"simple_query_string":
{
"fields":["field1", "field2", ...17 more],
"query":"some_full_text_query"
}
}
}
}
}
}
}
}
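For anyone unfamiliar with what the rrf retriever does with the two ranked lists above, here's a minimal Python sketch of reciprocal rank fusion, assuming Elasticsearch's default rank_constant of 60 (the rrf_fuse function and the doc IDs are made up for illustration):

```python
# Minimal sketch of reciprocal rank fusion as the rrf retriever applies it.
# rank_constant defaults to 60 in Elasticsearch; ranks are 1-based.

def rrf_fuse(ranked_lists, rank_constant=60):
    """Combine several ranked lists of doc IDs into one fused ranking."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Example: "docA" is ranked 1st by the knn retriever and 2nd by the
# full-text retriever, so it beats docs that appear in only one list.
knn_hits = ["docA", "docB", "docC"]
text_hits = ["docD", "docA", "docB"]
print(rrf_fuse([knn_hits, text_hits]))  # → ['docA', 'docB', 'docD', 'docC']
```

Each retriever contributes 1/(rank_constant + rank) per document, so a document ranked well by both retrievers ends up above one that appears near the top of only a single list.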
These are the details of the indices:
- I have 6 indices that I want to search across.
- The two biggest indices are around 60GB each.
- In total they have around 2.1 million documents with many text fields, but don't share exactly the same schema.
- Some indices are 10x bigger than others.
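For reference, per-index doc counts and on-disk sizes like these can be read from the _cat API:

```
GET _cat/indices/index1,index2,index3,index4,index5,index6?v&h=index,docs.count,pri.store.size
```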
Example of index settings and mappings
{
"index1": {
"mappings": {
"dynamic": "true",
"dynamic_date_formats": [
"strict_date_optional_time",
"yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"
],
"dynamic_templates": [],
"date_detection": true,
"numeric_detection": false,
"properties": {
"field1": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"field1_vector": {
"type": "dense_vector",
"dims": 384,
"index": true,
"similarity": "dot_product",
"index_options": {
"type": "int8_hnsw",
"m": 16,
"ef_construction": 100
}
},
"field2": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"field2_vector": {
"type": "dense_vector",
"dims": 384,
"index": true,
"similarity": "dot_product",
"index_options": {
"type": "int8_hnsw",
"m": 16,
"ef_construction": 100
}
}
...other fields with exactly the same config
}
},
"settings": {
"index": {
"lifecycle": {
"name": "hot_to_warm_immediate"
},
"routing": {
"allocation": {
"include": {
"_tier_preference": "data_warm,data_hot"
},
"require": {
"_tier_preference": "data_warm"
}
}
},
"number_of_shards": "1",
"provided_name": "index1",
"analysis": {
"filter": {
"english_stemmer": {
"type": "stemmer",
"language": "english"
},
"english_stop": {
"ignore_case": "true",
"type": "stop",
"stopwords": "_english_"
},
"english_possessive_stemmer": {
"type": "stemmer",
"language": "possessive_english"
}
},
"analyzer": {
"default": {
"filter": [
"lowercase",
"asciifolding",
"english_possessive_stemmer",
"english_stemmer",
"english_stop"
],
"type": "custom",
"tokenizer": "standard"
}
}
},
"priority": "50",
"number_of_replicas": "1"
}
}
}
}
Cluster resources
As you can see from the index settings, I'm not using the hot tier at all; everything moves to the warm tier almost immediately.
I'm using Elastic Cloud with 2.78 TB storage | 15 GB RAM | 1.9 vCPU, 2 zones.
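As a rough sanity check on that 15 GB of RAM, here's a back-of-the-envelope estimate of the off-heap memory the vector data alone wants. It uses the sizing formulas I understand the Elasticsearch docs give for int8_hnsw (quantized vectors ≈ dims + 4 bytes each, HNSW graph ≈ 4 × m bytes per vector) — treat both as assumptions, and note that 19 vector fields per doc is an upper bound since the indices don't share the same schema:

```python
# Assumptions: 384-dim int8-quantized vectors, HNSW m=16 (from the mapping),
# ~2.1M docs, and up to 19 dense_vector fields per doc (upper bound).
DIMS = 384
M = 16
DOCS = 2_100_000
VECTOR_FIELDS = 19

num_vectors = DOCS * VECTOR_FIELDS
vector_bytes = num_vectors * (DIMS + 4)   # int8 values + per-vector correction
graph_bytes = num_vectors * 4 * M         # HNSW graph links
total_gb = (vector_bytes + graph_bytes) / 1024**3
print(round(total_gb, 1))  # → 16.8
```

Even as an upper bound, that is already above the 15 GB per node, so much of the HNSW data can't sit in the page cache and searches would fall back to disk reads — which could explain multi-second latencies on the warm tier.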
Profiling insights
As far as I can tell, most of the time goes to vector comparison; at least I think so based on the rewrite_time metrics. I'm attaching a raw profiling session from a call that took almost 20 seconds.
Here's the JSON from the profiling -> https://file.io/teMSFpWyXBdG
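For anyone who wants to reproduce this, the profile output comes from adding "profile": true to the same request body, e.g.:

```
GET index1,index2,index3,index4,index5,index6/_search
{
  "profile": true,
  "size": 10,
  ...same retriever as in the query above
}
```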
What I tried
- Scaling up the cluster - the options with 60 GB RAM give relatively acceptable response times, but that seems like too much hardware for 2 million docs.
- Reducing the number of segments to 5 for the biggest indices (via force merge) - it didn't do much for performance.
- Making _source as minimal as possible - the _id alone works for my use case, which is what you can see in the shared query. This gave a ~3x performance gain.
- It might also be an option to further reduce the number of knn queries I perform; currently there are around 19.
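On the _source trimming: since _id is returned in the hit metadata anyway, an alternative sketch is to disable _source fetching entirely, which skips loading the stored field altogether (shown here with a placeholder query):

```
GET index1/_search
{
  "size": 10,
  "_source": false,
  "query": { "match_all": {} }
}
```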
Final goal
I'll need to run this at scale, with potentially thousands of searches per minute.
What else can I do to improve performance, and is Elasticsearch even the right tool for the job?