If forcemerge is run serially, why are there so many tasks?


(Barry Kaplan) #1

I have several cron jobs that run curator to forcemerge indices. I understood that forcemerge would happen serially. But it seems there are lots of active (not pending) tasks. Is this normal?

e.g.:

indices:admin/forcemerge[n]         SwYN-ARnQwmodMB1ApOD4g:837480483  39Sv5AWwRC6Wb82ISQZrZw:846175241  netty     1543936040290 15:07:20  1.6h         10.0.197.117 ops-elk-2
indices:admin/forcemerge            LnMkZVzARDOrWe1XiNS39g:1170743914 -                                 transport 1543937887255 15:38:07  1h           10.0.199.180 ops-elk-3
indices:admin/forcemerge            LnMkZVzARDOrWe1XiNS39g:1170743945 -                                 transport 1543937887330 15:38:07  1h           10.0.199.180 ops-elk-3
indices:admin/forcemerge            LnMkZVzARDOrWe1XiNS39g:1170776216 -                                 transport 1543937947273 15:39:07  1h           10.0.199.180 ops-elk-3
indices:admin/forcemerge            LnMkZVzARDOrWe1XiNS39g:1170776221 -                                 transport 1543937947329 15:39:07  1h           10.0.199.180 ops-elk-3
indices:admin/forcemerge            LnMkZVzARDOrWe1XiNS39g:1170809287 -                                 transport 1543938007223 15:40:07  1h           10.0.199.180 ops-elk-3
indices:admin/forcemerge            LnMkZVzARDOrWe1XiNS39g:1170809369 -                                 transport 1543938007343 15:40:07  1h           10.0.199.180 ops-elk-3
indices:admin/forcemerge            LnMkZVzARDOrWe1XiNS39g:1170844193 -                                 transport 1543938068040 15:41:08  1h           10.0.199.180 ops-elk-3
indices:admin/forcemerge            LnMkZVzARDOrWe1XiNS39g:1170844381 -                                 transport 1543938068359 15:41:08  1h           10.0.199.180 ops-elk-3
indices:admin/forcemerge            LnMkZVzARDOrWe1XiNS39g:1170878744 -                                 transport 1543938127168 15:42:07  1h           10.0.199.180 ops-elk-3
indices:admin/forcemerge            LnMkZVzARDOrWe1XiNS39g:1170878793 -                                 transport 1543938127199 15:42:07  1h           10.0.199.180 ops-elk-3
indices:admin/forcemerge            LnMkZVzARDOrWe1XiNS39g:1170911514 -                                 transport 1543938187319 15:43:07  1h           10.0.199.180 ops-elk-3
indices:admin/forcemerge            LnMkZVzARDOrWe1XiNS39g:1170912044 -                                 transport 1543938188045 15:43:08  1h           10.0.199.180 ops-elk-3
indices:admin/forcemerge            LnMkZVzARDOrWe1XiNS39g:1170946763 -                                 transport 1543938247901 15:44:07  59.4m        10.0.199.180 ops-elk-3
indices:admin/forcemerge            LnMkZVzARDOrWe1XiNS39g:1170946958 -                                 transport 1543938248255 15:44:08  59.4m        10.0.199.180 ops-elk-3
indices:admin/forcemerge            LnMkZVzARDOrWe1XiNS39g:1170981372 -                                 transport 1543938308317 15:45:08  58.4m        10.0.199.180 ops-elk-3
indices:admin/forcemerge            LnMkZVzARDOrWe1XiNS39g:1170981784 -                                 transport 1543938308765 15:45:08  58.4m        10.0.199.180 ops-elk-3
indices:admin/forcemerge[n]         39Sv5AWwRC6Wb82ISQZrZw:471941     LnMkZVzARDOrWe1XiNS39g:1170981784 netty     1543941102526 16:31:42  11.9m        10.0.196.141 ops-elk-1
indices:admin/forcemerge            LnMkZVzARDOrWe1XiNS39g:1171012507 -                                 transport 1543938367287 15:46:07  57.5m        10.0.199.180 ops-elk-3
indices:admin/forcemerge            LnMkZVzARDOrWe1XiNS39g:1171012740 -                                 transport 1543938367556 15:46:07  57.5m        10.0.199.180 ops-elk-3
indices:admin/forcemerge            LnMkZVzARDOrWe1XiNS39g:1171048042 -                                 transport 1543938427739 15:47:07  56.4m        10.0.199.180 ops-elk-3
indices:admin/forcemerge            LnMkZVzARDOrWe1XiNS39g:1171048288 -                                 transport 1543938428106 15:47:08  56.4m        10.0.199.180 ops-elk-3
indices:admin/forcemerge            LnMkZVzARDOrWe1XiNS39g:1171082616 -                                 transport 1543938488315 15:48:08  55.4m        10.0.199.180 ops-elk-3
indices:admin/forcemerge            LnMkZVzARDOrWe1XiNS39g:1171082809 -                                 transport 1543938488676 15:48:08  55.4m        10.0.199.180 ops-elk-3
indices:admin/forcemerge            LnMkZVzARDOrWe1XiNS39g:1171117347 -                                 transport 1543938548505 15:49:08  54.4m        10.0.199.180 ops-elk-3

(Aaron Mildenstein) #2

How many shards in your index? There should be one forcemerge per shard, as the target is max segments per shard.
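For reference, a minimal sketch of how the "max segments per shard" target is passed to the force merge REST endpoint through its max_num_segments parameter. The helper name, host, and index name are hypothetical and only for illustration; the function builds the URL and does not send anything.

```python
from urllib.parse import quote, urlencode


def forcemerge_url(base_url, index, max_num_segments=1):
    """Build the URL for a force merge request (hypothetical helper).

    max_num_segments is the per-shard segment target mentioned above.
    """
    query = urlencode({"max_num_segments": max_num_segments})
    # Keep wildcard and comma characters so index patterns stay readable.
    index_part = quote(index, safe="*,-_.")
    return f"{base_url.rstrip('/')}/{index_part}/_forcemerge?{query}"


# Host and index name are made up for this example.
print(forcemerge_url("http://localhost:9200", "logstash-2018.12.04"))
```

The request itself would be sent as a POST to this URL, for example by Curator or by any HTTP client.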


(Barry Kaplan) #3

Ah, ok. So all shards are merged concurrently but only for a single index at a time?

The indices have 2 primaries and 1 replica (4 shards). I have a cron job set up for each index type (i.e., logstash-, metricbeat-). Yesterday I screwed up my cron expression and ran the jobs once a minute for a while. I had 800+ forcemerge tasks. So I don't understand the number of active forcemerge tasks. Can they all be queued but still show as active rather than pending?

(btw, I had to restart the Elasticsearch process to clear the tasks)


(Aaron Mildenstein) #4

ForceMerge tasks are blocking. Additional requests will be held until the first is completed. This is why they queued up.


(Barry Kaplan) #5

I expected the queued tasks to be in GET _cat/pending_tasks, not GET _cat/tasks.

And if they are simply queued, why are they cancellable:false and have running times?
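One way to make a listing like the one in #1 easier to read is to separate top-level requests from child tasks. The sketch below assumes the whitespace-separated column layout seen in that sample (action, task id, parent task id, type, start time, timestamp, running time, ip, node); that layout is inferred from the output above, and the function name is hypothetical.

```python
from collections import Counter


def summarize_tasks(cat_tasks_output):
    """Count tasks per (node, kind), where kind is 'parent' or 'child'.

    A task whose parent-task-id column is '-' is treated as a top-level
    request; anything else is treated as a child of another task.
    """
    counts = Counter()
    for line in cat_tasks_output.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # skip blank or truncated lines
        parent_id, node = fields[2], fields[-1]
        kind = "parent" if parent_id == "-" else "child"
        counts[(node, kind)] += 1
    return counts


sample = """\
indices:admin/forcemerge[n] 39Sv5AWwRC6Wb82ISQZrZw:471941 LnMkZVzARDOrWe1XiNS39g:1170981784 netty 1543941102526 16:31:42 11.9m 10.0.196.141 ops-elk-1
indices:admin/forcemerge LnMkZVzARDOrWe1XiNS39g:1171012507 - transport 1543938367287 15:46:07 57.5m 10.0.199.180 ops-elk-3
"""
for key, count in sorted(summarize_tasks(sample).items()):
    print(key, count)
```

In the sample from #1, nearly every row is a top-level request received on ops-elk-3, and only two rows are child tasks.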


(Aaron Mildenstein) #6

From the documentation:

"This call will block until the merge is complete. If the http connection is lost, the request will continue in the background, and any new requests will block until the previous force merge is complete."
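The blocking behaviour described here can be illustrated with a toy model. This is not Elasticsearch code: it assumes a single worker that runs merges one at a time, and all names are hypothetical. It shows how every request can be registered (and so start accumulating "running time") as soon as it arrives, even though the actual work happens serially.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_serialized(n_requests, work_seconds=0.01):
    """Toy model: register each request at once, run them one at a time."""
    registered = []  # (request_id, time the request was accepted)
    spans = []       # (request_id, work start, work end)
    lock = threading.Lock()

    def merge(request_id):
        start = time.monotonic()
        time.sleep(work_seconds)  # stand-in for the real merge work
        end = time.monotonic()
        with lock:
            spans.append((request_id, start, end))

    # A single worker means later requests wait behind earlier ones.
    with ThreadPoolExecutor(max_workers=1) as pool:
        for request_id in range(n_requests):
            registered.append((request_id, time.monotonic()))
            pool.submit(merge, request_id)
    return registered, spans


registered, spans = run_serialized(5)
for (request_id, accepted), (_, start, _) in zip(registered, spans):
    print(f"request {request_id} waited {start - accepted:.3f}s before running")
```

In this model a waiting request looks "active" from the moment it is accepted, which is consistent with the long running times in the listing from #1.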

This API call is old. As such, it doesn't follow the newer tasks API conventions (for example, it can't be cancelled). Hopefully a future release will address that, but I'm not sure if it even can be. ForceMerging segments is not a process that you can just stop without potentially ruining the Lucene data structure.


(Barry Kaplan) #7

Thanks very much Aaron