Indexing performance on indices with vector fields

We are running self-hosted Elasticsearch 8.11.3 on an 11-node cluster (16 GB RAM and 16 CPUs per node).
We have an index with ~140k documents that contain fields of various types (mostly keyword, plus a few text fields) and 3 vector fields. The index has 5 primary shards and 9 replicas, tuned for query throughput and response time.
All queries currently use only the keyword and text fields. The vectors are not yet used in queries.
The workload is mostly queries, but there is a fair amount of indexing: about 1k RPS for searches and ~200 RPS for document updates and additions.
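For a sense of scale, here is a sketch of an index matching this description, written as a Python dict for the create-index request body. Only the shard and replica counts come from the post; the field names, `dims`, and `similarity` values are hypothetical. The two helpers do the write-amplification arithmetic: self-hosted clusters replicate at the document level, so each update is indexed on the primary and then indexed again on every replica of that shard.

```python
# Sketch of the index described above. Only the shard/replica counts and the
# rough field mix come from the post; names, dims, and similarity are hypothetical.
NUM_SHARDS = 5
NUM_REPLICAS = 9

index_body = {
    "settings": {
        "number_of_shards": NUM_SHARDS,
        "number_of_replicas": NUM_REPLICAS,
    },
    "mappings": {
        "properties": {
            # Mostly keyword fields plus a few text fields (hypothetical names).
            "category": {"type": "keyword"},
            "status": {"type": "keyword"},
            "title": {"type": "text"},
            # Three indexed dense_vector fields (hypothetical names and dims);
            # Lucene keeps a separate HNSW graph per vector field per segment.
            "title_vector": {"type": "dense_vector", "dims": 384, "index": True, "similarity": "cosine"},
            "body_vector": {"type": "dense_vector", "dims": 384, "index": True, "similarity": "cosine"},
            "image_vector": {"type": "dense_vector", "dims": 512, "index": True, "similarity": "cosine"},
        }
    },
}


def total_shard_copies(shards: int, replicas: int) -> int:
    """Primaries plus all of their replicas."""
    return shards * (1 + replicas)


def shard_level_writes_per_sec(doc_writes_per_sec: int, replicas: int) -> int:
    """Each document write is indexed on one primary and then indexed again
    on every replica of that shard (document-level replication)."""
    return doc_writes_per_sec * (1 + replicas)


print(total_shard_copies(NUM_SHARDS, NUM_REPLICAS))
print(shard_level_writes_per_sec(200, NUM_REPLICAS))
```

With 9 replicas, the ~200 updates/s become roughly 2,000 shard-level indexing operations per second spread over the 11 nodes, and each of those has to add the document's three vectors to a new segment's HNSW graphs.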

The issue is that we are updating documents, but only their non-vector fields. We see much slower indexing (and querying) throughput when the index contains the vector fields than when the same updates run against an index stripped of them.

My question: does ES rebuild the kNN index structures even when only an unrelated non-vector field is updated? If so, is there any way to prevent this?

Would splitting the index in two, one for vector search and one for the remaining fields, mitigate the issue? The frequent field updates would then go to the main index, while the vector index would receive only occasional updates.
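A minimal sketch of what the proposed split could look like on the write side, assuming both indices use the same `_id` for a given document. The index names, the vector field names, and the `split_bulk_update` helper are all hypothetical; the output is an NDJSON body in the `_bulk` API's `update` action format.

```python
import json

# Hypothetical names for the two indices and the vector fields.
MAIN_INDEX = "docs-main"
VECTOR_INDEX = "docs-vectors"
VECTOR_FIELDS = {"title_vector", "body_vector", "image_vector"}


def split_bulk_update(doc_id: str, changes: dict) -> str:
    """Build an NDJSON _bulk body that sends non-vector changes to the main
    index and vector changes to the vector index, using the same _id in both
    so results can be joined later. An index with no changes gets no action."""
    main_part = {k: v for k, v in changes.items() if k not in VECTOR_FIELDS}
    vector_part = {k: v for k, v in changes.items() if k in VECTOR_FIELDS}
    lines = []
    for index, part in ((MAIN_INDEX, main_part), (VECTOR_INDEX, vector_part)):
        if part:
            # Action line followed by the partial document for that index.
            lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
            lines.append(json.dumps({"doc": part}))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

For example, `split_bulk_update("42", {"status": "archived"})` produces a single update action against the main index, so the vector index (and its HNSW graphs) is never touched by a keyword-only change.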

Hi there @Sorin_Panduru,

If anything in a document is updated, the whole document, including all of its fields, is reindexed. This includes the HNSW data: because indexing is segment-based, the updated document is written to a new segment, which starts a new HNSW graph. When segments are later merged, their HNSW graphs are rewritten as part of the merge. You can find more on the reasoning behind this in this Search Labs blog post on vector search rationale.
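To make this concrete: a partial update through the `_update` API still produces a complete new document, because the stored `_source` is fetched, the partial `doc` is merged into it, and the result is reindexed. The sketch below is a toy model of that merge in plain Python, not Elasticsearch's actual code; `merge_partial_doc` and the field names and values are hypothetical.

```python
import copy


def merge_partial_doc(stored_source: dict, partial_doc: dict) -> dict:
    """Toy model of a partial update (`doc`) merge: objects are merged
    recursively, any other value (including arrays) replaces the old one."""
    merged = copy.deepcopy(stored_source)
    for key, value in partial_doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial_doc(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical stored document with one of the vector fields.
stored = {"status": "active", "title": "Red shoes", "title_vector": [0.12, -0.03, 0.88]}

# Body of a partial update request such as POST /<index>/_update/<_id>.
update_body = {"doc": {"status": "archived"}}

reindexed = merge_partial_doc(stored, update_body["doc"])
# The vector was not touched, but it is part of the document written to the
# new segment, so it still has to be added to that segment's HNSW graph.
print(reindexed)
```

The point of the model is only that "update one keyword field" and "reindex the whole document, vectors included" are the same operation from the index's point of view.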

Re: creating a separate index for the vectors, it really depends on what you're indexing and what you're using it for. Keep in mind, though, that it adds complexity to retrieving the right documents at search time, since results from the two indices have to be combined.
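One way to see that complexity: a search that needs both the keyword fields and the vectors has to query both indices and stitch the results together in the application. A minimal sketch of the stitching step, assuming both indices share the same `_id` and that the inputs are the `hits.hits` arrays from two separate searches; `join_hits` is a hypothetical helper, not an Elasticsearch API.

```python
def join_hits(vector_hits: list, main_hits: list, k: int) -> list:
    """Combine kNN hits from the vector index with hits from a keyword/filter
    query on the main index, joined on the shared _id.

    vector_hits is assumed to be sorted by _score descending (as returned by
    Elasticsearch); hits missing from main_hits are dropped, so the result
    can contain fewer than k documents."""
    main_by_id = {hit["_id"]: hit for hit in main_hits}
    joined = []
    for hit in vector_hits:
        main_hit = main_by_id.get(hit["_id"])
        if main_hit is None:
            continue  # nearest neighbour that fails the main-index query
        joined.append({
            "_id": hit["_id"],
            "_score": hit["_score"],
            # Attach the non-vector fields from the main index.
            "_source": dict(main_hit.get("_source", {})),
        })
        if len(joined) == k:
            break
    return joined
```

Because the filtering happens after the kNN step, the joined list can come back shorter than `k` unless the vector search over-fetches. Keeping everything in one index lets the kNN search apply the filter itself (via its `filter` option), which is part of the trade-off of splitting.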