kNN search issues in Elasticsearch v8.4.0

Hi there, I am facing some issues with kNN search in Elasticsearch v8 and hope someone can help.
Environment:
five Elasticsearch v8.4.0 nodes in a cluster
each node on a Linux server (Ubuntu 16.04, 48 logical cores, 180 GB memory)
Index mapping (five shards, one replica):

{
  "u2i" : {
    "mappings" : {
      "properties" : {
        "item_id" : {
          "type" : "keyword"
        },
        "u2i_vector" : {
          "type" : "dense_vector",
          "dims" : 64,
          "index" : true,
          "similarity" : "cosine",
          "index_options" : {
            "type" : "hnsw",
            "m" : 32,
            "ef_construction" : 500
          }
        }
      }
    }
  }
}

Here are some issues I met:
1. When I fed 10 million documents into the cluster (with the bulk API in the Python client, chunk size 1k), the requests seemed to execute very quickly. After the requests finished, I checked the hot threads: the refresh and flush threads took most of the CPU time (though the total CPU usage of Elasticsearch was not very high, about 200% on each server), which indicated that it was still indexing. And here is the issue: during the indexing I can't use the kNN search API at all (connection timeout, > 100s). Is that expected, or did I do something wrong that makes the kNN search API unusable? (A sketch of the indexing and search calls I use is right below this list.)
2. When indexing is finished, the kNN search API can be used, but I met another issue: with incremental data input, the kNN search latency (p99) rises accordingly, which is reasonable. However, with a 300k-doc increase (writing rate: 1k/s, 10 million existing), p99 increased by 20 ms (from 110 ms to 130 ms). Is that normal? Personally, I think the increase in p99 is a bit large for this amount of incremental data, so is there any optimization measure I can take?
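For reference, here is roughly how I index and query. This is only a minimal sketch with the elasticsearch Python client; the host, the item_id values, the random vectors, and the k/num_candidates values are placeholders rather than my real ones.

import numpy as np
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # placeholder host

# Bulk-index 64-dim vectors into the "u2i" index with a chunk size of 1k.
def gen_actions(n):
    for i in range(n):
        yield {
            "_index": "u2i",
            "_source": {
                "item_id": f"item_{i}",
                "u2i_vector": np.random.rand(64).tolist(),
            },
        }

helpers.bulk(es, gen_actions(10_000_000), chunk_size=1000)

# kNN search via the knn option of the _search API (k/num_candidates are example values).
resp = es.options(request_timeout=100).search(
    index="u2i",
    knn={
        "field": "u2i_vector",
        "query_vector": np.random.rand(64).tolist(),
        "k": 10,
        "num_candidates": 100,
    },
)
print(resp["hits"]["hits"])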

Thanks!

Or in other words, how can I speed up the indexing job? I checked the hot threads; there were two hot threads, refresh and flush, so I thought I could enlarge the refresh and flush thread pool settings to speed things up:

thread_pool.refresh.core: 48
thread_pool.refresh.max: 96
thread_pool.flush.core: 48
thread_pool.flush.max: 96

I set the core thread count to the number of processors and the max thread count to double that, but it turned out the CPU usage barely changed (still about 100% CPU usage by the thread). Or did I misunderstand the output of the hot threads command? I thought the CPU usage could be more than 100% since there is more than one processor.
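For reference, this is how I pull the hot threads output (a minimal sketch with the elasticsearch Python client; the host is a placeholder):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder host

# Prints the hottest threads on each node, with per-thread CPU percentages.
print(es.nodes.hot_threads())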

Does anyone have any idea? Thanks!

For the first issue, decreasing the values of m and ef_construction in the HNSW config can speed up the indexing job and thus shorten the unusable window. Don't worry too much about accuracy when doing this; I checked the accuracy before and after, and it only changed a little (from 99% to 96%).
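For example, this is the kind of mapping change I mean, recreating the index with smaller HNSW build parameters (a minimal sketch with the Python client; the host and the exact m/ef_construction values are just illustrative):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder host

# Smaller m / ef_construction makes the HNSW graph cheaper to build,
# at some cost in recall.
es.indices.create(
    index="u2i",
    settings={"number_of_shards": 5, "number_of_replicas": 1},
    mappings={
        "properties": {
            "item_id": {"type": "keyword"},
            "u2i_vector": {
                "type": "dense_vector",
                "dims": 64,
                "index": True,
                "similarity": "cosine",
                "index_options": {"type": "hnsw", "m": 16, "ef_construction": 100},
            },
        }
    },
)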

For the second issue, I tried several different configs (shards, HNSW params including m, ef_construction, similarity), and it turned out that p99 still increased noticeably with incremental data input:

doc count (before, ~1k/s writing rate)   doc count (after)   QPS   P99 (ms)   CPU (%)
10mil                                    10.4mil             99    272        60
11mil                                    11.4mil             80    324        60
20mil                                    20.4mil             61    422        60

So... is this normal? Or is there any optimization? Please tell me if you have any idea, thanks a lot!

  For the second issue, I built another index with the same doc count (the count after the data input), and it turned out that the p99 of ANN search in the new index was almost half that of the old one, which was quite confusing. So I checked the segments in both: the new one has only 2 segments in each shard, while the old one has many more segments per shard. I think that is why p99 increased so much with incremental data input (the incoming data created many more segments, and thus p99 increased).
  I don't know if my guess is correct or not... If so, is there any action I can take now? Thanks a lot!
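One action I'm considering, in case the extra segments are the cause, is to check the per-shard segment counts and force-merge the index down. A minimal sketch with the Python client (the host is a placeholder; merging down to a single segment is expensive and usually only recommended once writes have stopped):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder host

# How many segments does each shard have right now?
print(es.cat.segments(index="u2i", v=True))

# Merge segments down (expensive; best done when the index is no longer being written to).
es.options(request_timeout=3600).indices.forcemerge(index="u2i", max_num_segments=1)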


Hello @CGM397, sorry for the slow response! Here's some information that can help make sense of the behavior you saw.

First, you saw that Elasticsearch indexed documents quickly, but the first search took a very long time. The reason is that in Elasticsearch 8.4 and before, we performed the most expensive indexing step (building the HNSW graph) during the "flush" or "refresh" operations. Since the first search triggers a "refresh", that first search has to wait for the graph build. We fixed this behavior in Elasticsearch 8.5 (through this Lucene change: LUCENE-10592 Build HNSW Graph on indexing by mayya-sharipova · Pull Request #1043 · apache/lucene · GitHub). Now we build the HNSW graph during indexing, so the first search should be fast, as it doesn't need to block on indexing to finish.
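On 8.4, one way to avoid the first search paying that cost is to trigger the refresh yourself once the bulk load finishes, so the graph build happens up front rather than inside a query. A minimal sketch with the Python client (the host and timeout are placeholders):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder host

# On 8.4 this refresh is where the HNSW graph gets built, so it can take a long
# time on a large vector index; give it a generous timeout and run it right
# after the bulk load so the first kNN search doesn't have to wait for it.
es.options(request_timeout=3600).indices.refresh(index="u2i")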

Second, you noticed that right after indexing, the tail latency could be pretty high. This is likely because Elasticsearch has some "catching up" to do after indexing a large vector dataset. Specifically, it needs to merge some segments together to optimize the index. This is why the search performance improved after you waited for a bit. I don't have a great suggestion for how to address that; unfortunately, searches will be a bit slower until the merges complete. Here is more information about merging: Merge | Elasticsearch Guide [8.5] | Elastic.
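If you want to see when that catch-up is done, you can watch the ongoing merge activity for the index. A minimal sketch with the Python client (the host is a placeholder; the "merges" section of the index stats shows how many merges are currently running):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder host

# Index-level merge statistics; "current" is the number of merges running now.
stats = es.indices.stats(index="u2i", metric="merge")
merges = stats["indices"]["u2i"]["total"]["merges"]
print("merges in progress:", merges["current"])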

You also might be interested in a kNN search tuning guide we just published: Tune approximate kNN search | Elasticsearch Guide [8.5] | Elastic. For example, it has tips for reducing the number of index segments, to optimize search latency as much as possible.