Forcemerge of index with vectors takes long, many hours or forever

I have two big problems while trying to index more than 100 million documents with a vector field:

  1. Force merge is very slow on indices with a dense_vector field. Sometimes a force merge from 15-20 segments down to 5 takes a few hours, and sometimes it runs for more than a day while the number of segments does not change. Since a force merge task cannot be cancelled, the only workaround I have found is to restart the nodes (we use ECK). After the restart, the force merge starts over.
  2. Indexing is slow: 5,000 documents are indexed in 4.1-5 minutes.

Below are examples of indexing and force merge times, our index and node settings, and the different things I have already tried to solve these problems. If you know any other way to help, please suggest it. I have already tried almost everything I could think of or find in the documentation and on Google, for example in these links:

Translog | Elasticsearch Guide [8.13] | Elastic.
etc.

Indices and indexing speed

For all indexing, refresh_interval is set to -1.
On all nodes this setting was increased for testing (the behaviour is the same with the default value): indices.memory.index_buffer_size: 50%
Indices are not queried now (can be queried a few times a day just for testing purposes).

Index state when indexing: 1.2 average segments, 9,901 docs, 5 shards.
Indexed 5,000 docs in 3.7 minutes.

Another index state: 2,755,439 docs, 9.3 average segments, 5 shards.
The indexing speed is 5,000 documents in 4.4 minutes.

The speed is the same on an index with 7 million documents (about 100 GB including primaries and replicas).

Forcemerge time

Some indices are merged from 20-30 segments down to 5 (or sometimes 1) in 1-3 hours on average.
But many times it goes badly: the force merge just runs forever, and only restarting the nodes helps.
I took some statistics, here are examples:

time    avg_segments forcemerge run_time docs_count
14:58       22.7      running    3.7h    7,791,504
16:32       27.0      running    5.2h    7,791,555
19:13       31.8      running     8h     7,791,627
Then I restarted the nodes. A second index finished its force merge, but this one did not:

19:15       38.8      running    1.5m    7,791,627
19:38       31.8      running    24.9m   7,791,627
20:56       31.8      running    1.7h    7,791,627
21:52       32.5      running    2.6h    7,792,382
22:06       33.5      running    2.8h    7,793,022
22:57       48.0      running    3.7h    7,793,719

This is only one example. I have had such cases many times, almost every day, during the two weeks since I started this indexing.

Nodes config

We use Elasticsearch 8.6.2.
Index mapping for vector field is this:

"mappings": {
      "_source": {
        "excludes": [
          "embedding"
        ]
      },
      "properties": {
        ...
        "embedding": {
          "type": "dense_vector",
          "dims": 1024,
          "index": true,
          "similarity": "cosine",
          "index_options": {
            "type": "hnsw",
            "m": 48,
            "ef_construction": 256
          }
        },
        ...

We have clusters with different node tiers. Some tiers hold regular indices without vectors, and force merge works fine there.
Another node tier holds indices with 512-dimensional vectors and also works well.
But the last few tiers, which hold an index with 1024-dimensional vectors, handle force merge very badly.

So we have:
5 nodes for hot indices (for faster search) - the most recent data. n2-highmem-4
2 nodes for indexing (and forcemerge) n2-highmem-4
3 nodes for warm indices. n1-standard-8
All nodes have SSD disks and enough other resources. In Stack Monitoring I see healthy charts, with no resource problems.
I've increased merge thread_pool on indexing and warm nodes to 4 and on hot to 2.

And sometimes force merge of indices with vectors on any of the last described 3 tiers takes long, many hours or forever. I've been playing with this for the last few weeks and tried many different approaches, and nothing helped much.

Taken Actions

I played with different params:

  1. I tried to index data in parallel in a few different indices - no effect.
  2. Set the number of replicas to 0 - time decreased from 4.4 minutes per 5k docs to 4-4.1 minutes, but that is still not enough. It is dangerous to have 0 replicas because there were already cases when under big load it is
  3. Changed index.translog.sync_interval to 30 seconds and 600 seconds and index.translog.durability to async - no effect.

If I need to index 200 million documents, that would take 2933 hours (200 * 10**6 * 4.4 / 5000 / 60), or 122 days. On top of that, I need to pause indexing from time to time to run a force merge and wait for it, which can add another 30-50% to the total time. Split across 10 parallel runs, it could take about 12 days, if the speed stays the same.
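
A quick sketch of this estimate, assuming throughput stays constant at the measured 5,000 documents per 4.4 minutes and scales linearly with the number of parallel runs (the function and parameter names are only illustrative):

```python
def estimate_indexing_hours(total_docs, batch_docs=5000, batch_minutes=4.4, parallel_runs=1):
    """Hours needed to index total_docs at a constant, measured batch rate."""
    batches = total_docs / batch_docs
    return batches * batch_minutes / 60 / parallel_runs


single = estimate_indexing_hours(200 * 10**6)
parallel = estimate_indexing_hours(200 * 10**6, parallel_runs=10)
print(f"single run: {single:.0f} hours (~{single / 24:.0f} days)")
print(f"10 parallel runs: ~{parallel / 24:.0f} days")
```

The force merge pauses mentioned above are not included, so the real total would be higher.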

Is there any other possible way to increase the speed of indexing?

I played a bit more and logged some data.

Force merge can't be cancelled, so when it got stuck I restarted all the nodes it was running on and changed the strategy: first I force merged from 40 segments to 30, then to 20, then to 10, to 5 and finally to 2. All steps but the last took 20-40 minutes each, but the last one (from 5 to 2) took a few hours.
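
The stepwise strategy can be sketched as a list of force merge requests. This only builds the request lines; the index name is hypothetical, and each request would be sent only after the previous one has finished:

```python
def forcemerge_steps(index, targets):
    """Build one _forcemerge request line per target segment count."""
    # Each step must ask for fewer segments than the previous one.
    if any(later >= earlier for earlier, later in zip(targets, targets[1:])):
        raise ValueError("targets must be strictly decreasing")
    return [f"POST /{index}/_forcemerge?max_num_segments={n}" for n in targets]


# "my-vector-index" is a placeholder, not the real index name.
for request in forcemerge_steps("my-vector-index", [30, 20, 10, 5, 2]):
    print(request)
```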

Then I started indexing again and later stopped it to force merge to 1 segment. It got stuck once more. Here is a log (sometimes the number of docs changes slightly, but this is a very small amount of indexing that starts and then stops when it discovers that a force merge is in progress):

date                 avg_segments run_time      docs_count          size         index_nodes            task_id
------------------------------------------------------------------------------------------------------------------------
2024-04-09 23:23:05		   14.4	  24.7m	      7,806,633 docs	   191.71 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-09 23:30:01		   14.4	  31.6m	      7,806,633 docs	   191.76 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-09 23:37:19		   14.4	  38.9m	      7,806,633 docs	   191.76 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-09 23:44:34		   14.4	  46.2m	      7,806,633 docs	   191.76 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-09 23:51:42		   15.4	  53.3m	      7,806,654 docs	   191.76 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-09 23:58:51		   15.4	     1h	      7,806,654 docs	   191.76 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 00:06:02		   16.4	   1.1h	      7,806,698 docs	   191.76 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 00:13:10		   16.4	   1.2h	      7,806,698 docs	   191.76 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 00:20:21		   17.0	   1.3h	      7,806,702 docs	   191.76 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 00:27:32		   17.0	   1.4h	      7,806,702 docs	   191.76 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 00:34:43		   17.8	   1.6h	      7,806,710 docs	   191.76 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 00:41:53		   17.8	   1.7h	      7,806,710 docs	   191.76 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 00:49:01		   18.6	   1.8h	      7,806,791 docs	   191.76 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 00:56:14		   18.8	   1.9h	      7,806,791 docs	   191.76 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 01:03:29		   19.8	     2h	      7,806,927 docs	   191.77 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 01:10:45		   19.8	   2.2h	      7,806,927 docs	   191.77 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 01:17:57		   20.7	   2.3h	      7,810,639 docs	   191.97 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 01:25:09		   20.8	   2.4h	      7,810,639 docs	   191.97 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 01:32:30		   21.8	   2.5h	      7,813,686 docs	   192.18 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 01:39:41		   21.8	   2.6h	      7,813,686 docs	   192.18 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 01:46:55		   22.7	   2.8h	      7,818,686 docs	   192.37 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 01:53:52		   22.8	   2.9h	      7,818,686 docs	   192.38 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 02:00:51		   23.5	     3h	      7,820,273 docs	   192.44 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 02:07:49		   23.8	   3.1h	      7,820,273 docs	   192.44 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 02:14:43		   23.8	   3.2h	      7,820,273 docs	   192.44 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 02:21:38		   24.8	   3.3h	      7,825,273 docs	   192.65 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 02:28:38		   24.8	   3.5h	      7,825,273 docs	   192.65 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 02:35:33		   25.8	   3.6h	      7,830,273 docs	   192.85 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 02:42:31		   25.8	   3.7h	      7,830,273 docs	   192.85 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 02:49:29		   26.8	   3.8h	      7,835,273 docs	   193.05 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 02:56:23		   26.8	   3.9h	      7,835,273 docs	   193.05 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 03:03:24		   26.8	     4h	      7,835,273 docs	   193.05 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 03:10:24		   26.8	   4.2h	      7,835,273 docs	   193.05 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 03:17:21		   28.8	   4.3h	      7,835,504 docs	   193.06 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 03:24:26		   28.8	   4.4h	      7,835,504 docs	   193.06 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 03:31:24		   29.3	   4.5h	      7,840,504 docs	   193.24 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 03:38:21		   29.8	   4.6h	      7,840,504 docs	   193.26 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 03:45:23		   29.8	   4.7h	      7,840,504 docs	   193.29 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 03:52:22		   30.8	   4.9h	      7,845,504 docs	   193.47 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 03:59:18		   30.8	     5h	      7,845,504 docs	   193.47 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 04:06:17		   31.8	   5.1h	      7,850,504 docs	   193.67 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 04:13:13		   31.8	   5.2h	      7,850,504 docs	   193.67 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 04:20:07		   32.8	   5.3h	      7,850,569 docs	   193.67 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 04:27:03		   32.8	   5.4h	      7,850,569 docs	   193.67 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 04:34:00		   33.8	   5.5h	      7,850,661 docs	   193.72 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 04:40:54		   32.9	   5.7h	      7,850,661 docs	   193.52 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 04:47:52		   33.9	   5.8h	      7,855,661 docs	   193.72 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 04:54:49		   33.9	   5.9h	      7,855,661 docs	   193.72 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 05:01:51		   33.9	     6h	      7,855,661 docs	   193.72 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 05:08:55		   33.9	   6.1h	      7,855,661 docs	   193.72 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 05:15:52		   34.4	   6.2h	      7,861,114 docs	   193.98 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 05:22:40		   37.3	   6.4h	      7,862,614 docs	   193.98 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 05:29:28		   37.3	   6.5h	      7,862,614 docs	   193.98 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 05:36:10		   38.3	   6.6h	      7,862,629 docs	   193.98 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 05:42:49		   38.3	   6.7h	      7,862,629 docs	   193.98 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 05:49:28		   39.3	   6.8h	      7,867,629 docs	   194.18 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 05:56:06		   39.3	   6.9h	      7,867,629 docs	   194.18 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 06:02:45		   40.3	     7h	      7,868,001 docs	   194.20 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 06:09:22		   40.3	   7.1h	      7,868,001 docs	   194.20 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 06:16:05		   40.4	   7.2h	      7,868,059 docs	   194.20 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 06:22:43		   40.4	   7.4h	      7,868,059 docs	   194.20 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 06:29:24		   40.4	   7.5h	      7,868,059 docs	   194.20 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 06:36:04		   41.4	   7.6h	      7,873,059 docs	   194.40 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 06:42:44		   41.4	   7.7h	      7,873,059 docs	   194.40 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 06:49:22		   42.4	   7.8h	      7,878,059 docs	   194.60 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 06:56:03		   42.4	   7.9h	      7,878,059 docs	   194.60 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 07:02:53		   42.4	     8h	      7,878,059 docs	   194.60 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 07:09:39		   42.4	   8.1h	      7,878,059 docs	   194.60 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 07:16:25		   41.6	   8.3h	      7,883,059 docs	   194.81 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 07:23:14		   41.0	   8.4h	      7,883,809 docs	   194.97 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 07:30:02		   41.0	   8.5h	      7,883,809 docs	   194.97 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 07:36:46		   40.2	   8.6h	      7,888,809 docs	   195.15 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 07:43:27		   40.2	   8.7h	      7,888,809 docs	   195.15 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 07:50:06		   41.0	   8.8h	      7,888,813 docs	   195.15 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 07:56:45		   41.0	   8.9h	      7,888,813 docs	   195.15 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 08:03:27		   42.0	     9h	      7,893,813 docs	   195.35 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 08:10:08		   42.0	   9.1h	      7,893,813 docs	   195.35 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 08:16:52		   41.2	   9.3h	      7,898,813 docs	   195.56 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 08:23:39		   41.2	   9.4h	      7,898,813 docs	   195.53 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 08:30:18		   41.2	   9.5h	      7,898,813 docs	   195.53 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 08:36:59		   42.2	   9.6h	      7,903,063 docs	   195.73 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 08:43:40		   42.2	   9.7h	      7,903,063 docs	   195.73 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 08:50:23		   42.3	   9.8h	      7,903,216 docs	   195.74 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 08:57:04		   42.3	   9.9h	      7,903,216 docs	   195.73 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 09:03:42		   42.3	    10h	      7,903,216 docs	   195.73 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 09:10:22		   42.3	  10.2h	      7,903,216 docs	   195.73 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 09:17:13		   42.9	  10.3h	      7,913,216 docs	   196.41 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 09:23:53		   41.1	  10.4h	      7,913,216 docs	   195.76 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 09:30:33		   41.1	  10.5h	      7,913,216 docs	   195.77 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 09:37:10		   41.2	  10.6h	      7,915,130 docs	   195.83 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 09:43:47		   41.2	  10.7h	      7,915,130 docs	   195.83 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 09:50:28		   41.3	  10.8h	      7,920,130 docs	   196.03 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 09:57:05		   41.3	  10.9h	      7,920,130 docs	   196.02 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 10:03:44		   41.4	    11h	      7,925,130 docs	   196.27 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 10:10:23		   41.4	  11.2h	      7,925,130 docs	   196.09 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 10:17:02		   42.4	  11.3h	      7,930,130 docs	   196.37 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 10:23:40		   41.5	  11.4h	      7,930,130 docs	   196.13 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 10:30:15		   41.5	  11.5h	      7,930,130 docs	   196.13 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 10:36:55		   42.5	  11.6h	      7,930,169 docs	   196.13 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 10:43:34		   42.5	  11.7h	      7,930,169 docs	   196.13 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 18:07:32		   40.4	  19.1h	      8,028,482 docs	   196.69 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]
2024-04-10 18:11:41		   40.4	  19.2h	      8,028,482 docs	   196.69 GB	indexing-0,indexing-1	KEFbgZcMQUyb1K4wZGs4Ew:27295972 maxSegments[1]

Thank you for all the details @Yolga.Ai !

As for why force-merging takes longer for some indices than for others, there are a couple of possible reasons.

Vector indexing

To index vectors, they must be searched. While the HNSW graph is being built, the current graph is searched with the configured ef_construction, and from those candidates (256 in your configuration) a diversified set of 48 connections (or 96 on the bottom layer of the graph) is added.

The upshot is that if the HNSW graph and vectors cannot fit in off-heap memory, search time, and with it indexing time, will increase significantly.
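
As a rough sketch of that check, the function below estimates the memory needed for float32 vectors plus the HNSW graph. The formula (num_vectors * dims * 4 bytes for vectors, plus about num_vectors * 4 * m bytes for the graph) is an approximation in the spirit of Elastic's kNN tuning guidance, not an exact accounting. The 7.8 million document count, 1024 dims and m=48 are taken from the question; one vector per document is an assumption.

```python
def vector_memory_gib(num_vectors, dims, m, bytes_per_dim=4):
    """Rough off-heap memory estimate (GiB) for raw vectors plus HNSW graph."""
    vector_bytes = num_vectors * dims * bytes_per_dim
    graph_bytes = num_vectors * 4 * m  # approximate cost of neighbour lists
    return (vector_bytes + graph_bytes) / 2**30


estimate = vector_memory_gib(7_800_000, dims=1024, m=48)
print(f"~{estimate:.1f} GiB for one copy of the index")
```

The result is for a single copy of the whole index, so it has to be compared against the combined file system cache (RAM left over after the JVM heap) of the nodes that hold its shards.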

Vector comparison overheads

I am curious about a couple of your settings and your current ES version. Version 8.6 is missing many optimizations that exist in later versions.

  1. SIMD operations for faster float32 vector comparisons. This was introduced in 8.9 and makes vector search and indexing much faster.
  2. int8_hnsw index type for quantization of float32 vectors. This was introduced in 8.12. It reduces the memory required by roughly a factor of 4 and improves indexing speed, merging speed, and query speed.
  3. Parallel multi-segment searching. We have improved multi-segment search by allowing segments to be searched in parallel across CPUs (8.10) and have reduced the number of vector comparisons required overall (8.13). The combined effect is much faster search without requiring a force-merge.

I also noticed you are using cosine as your similarity metric. Normalizing your vectors and using dot_product will further reduce vector comparison costs, though not as significantly as the improvements mentioned above.
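
A minimal sketch of that normalization, using only the standard library (the helper names are illustrative). Once vectors are scaled to unit length before indexing, their dot product equals their cosine similarity, so the mapping can use "similarity": "dot_product":

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [1.0, 2.0]
# Both values agree up to floating-point rounding.
print(cosine(a, b), dot(l2_normalize(a), l2_normalize(b)))
```

Query vectors need the same normalization as the indexed vectors.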

Merge queueing

Right now, a single force-merge action runs on a single thread. However, every force-merge thread comes from the same thread pool, meaning force-merges can end up queuing.

See: Thread pools | Elasticsearch Guide [8.13] | Elastic

force_merge limits are:

max(1, (# of allocated processors) / 8)

This is separate from the actual compute work that is being done.
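
To make the force_merge limit concrete, here is a small sketch of the formula. Integer division is an assumption on my part, and the processor counts are just example values:

```python
def force_merge_pool_size(allocated_processors):
    """max(1, processors / 8), assuming integer division."""
    return max(1, allocated_processors // 8)


for processors in (4, 8, 16, 32):
    size = force_merge_pool_size(processors)
    print(f"{processors} processors -> {size} force_merge thread(s)")
```

With this default, a node with 8 or fewer allocated processors runs one force-merge at a time, and any further requests wait in the queue.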

Compute work

During a force-merge, the HNSW graph is rebuilt from the documents of the segments being merged. Indexing vectors also requires searching them: vector values are paged into memory and searched for candidate neighbors to build the graph. This is a similar amount of work to regular indexing.

Recommendations

  1. Upgrade Elasticsearch
  2. Don't force-merge while also indexing vectors.
  3. Ensure vectors can fit in off-heap memory.
  4. Force-merge to a smaller number of segments, not necessarily a single segment, and only after the initial indexing of vectors is complete.