Hello everyone!
I'm running into a mysterious issue on our CI.
ES freezes for a long time on index deletion or creation when I run application tests against ES 6.8.13 (same for 7.10.1) and the application is configured in such a way that ES produces a large number of deprecation warnings. Specifically, it does two things (a minimal reproduction sketch follows the list):

- it uses the `{_all: {enabled: false}}` index-level setting, which causes warnings like `[_all] is deprecated in 6.0+ and will be removed in 7.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field.`
- it uses a big difference between `min_gram` and `max_gram` for the `nGram` tokenizer without setting the `max_ngram_diff: 100` index-level setting, which leads to `Deprecated big difference between max_gram and min_gram in NGram Tokenizer, expected difference must be less than or equal to: [1]`
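To make this concrete, here is a minimal sketch of an index creation request that triggers both warnings on 6.8; the index name `cucumber_example` and the tokenizer name `my_ngram` are made up for illustration:

```sh
# Hypothetical index reproducing both deprecation warnings on ES 6.8:
# - "_all" disabled in the mappings -> "[_all] is deprecated in 6.0+..."
# - nGram tokenizer with a large min_gram/max_gram spread and no
#   index.max_ngram_diff -> "Deprecated big difference between max_gram..."
curl -X PUT "localhost:9200/cucumber_example" -H 'Content-Type: application/json' -d '
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "my_ngram": { "type": "nGram", "min_gram": 1, "max_gram": 100 }
      }
    }
  },
  "mappings": {
    "_doc": {
      "_all": { "enabled": false }
    }
  }
}'
```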
If I remove the `_all` setting and add the `max_ngram_diff` setting, then everything works fine.
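And a sketch of the working variant, under the same assumptions as the sketch above:

```sh
# Same hypothetical index with both problems fixed: the "_all" section is
# simply dropped from the mappings, and index.max_ngram_diff is raised so
# the min_gram/max_gram spread no longer triggers the warning.
curl -X PUT "localhost:9200/cucumber_example" -H 'Content-Type: application/json' -d '
{
  "settings": {
    "index": { "max_ngram_diff": 100 },
    "analysis": {
      "tokenizer": {
        "my_ngram": { "type": "nGram", "min_gram": 1, "max_gram": 100 }
      }
    }
  }
}'
```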
If I suppress logging by setting `logger.level=error`, then everything also works fine.
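For completeness, the suppression is just the standard setting, passed in either of two ways (the Docker form assumes the official image, which forwards dotted environment variables as settings):

```sh
# Either on the command line at startup:
bin/elasticsearch -E logger.level=error

# or, in a Docker-based CI, as an environment variable (the official image
# converts dotted env vars like this one into -E settings):
docker run -e "logger.level=error" docker.elastic.co/elasticsearch/elasticsearch:6.8.13
```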
But if the deprecation warnings are being emitted, then after some time, around 30+ minutes, the ES cluster (this is CI, so a single-node cluster) starts timing out on index create and index delete requests.
There are several lines like this in the log:

```
[2021-02-25T10:30:52,191][WARN ][r.suppressed ] [R-M_ESE] path: /cucumber_company_representatives, params: {include_type_name=true, index=cucumber_company_representatives}
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (create-index [cucumber_company_representatives], cause [api]) within 30s
```
or
```
[2021-02-25T10:26:19,606][WARN ][r.suppressed ] [qcsQvvf] path: /cucumber_%2A, params: {index=cucumber_*}
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (delete-index [[cucumber_chewy_journal/OYkk5h4oT0uVmChTTyCGZg], [cucumber_chewy_specifications/J05Z0mNLQrikCJ6CXmjJmg], [cucumber_tasks/mpOSS_YAQbq6hs0Tn991vw]]) within 30s
```
and then finally

```
[2021-02-25T10:31:34,083][WARN ][o.e.c.s.ClusterApplierService] [R-M_ESE] cluster state applier task [apply cluster state (from master [master {R-M_ESE}{R-M_ESEeTUy6GeUWzvaOxg}{0xVXi9TvRU6mW0I7yXDKqg}{172.18.0.4}{172.18.0.4:9300}{ml.machine_memory=31616786432, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} committed version [3085] source [create-index [cucumber_company_representatives], cause [api]]])] took [8.7m] above the warn threshold of 30s
```
or
```
[2021-02-25T10:39:27,959][WARN ][o.e.c.s.MasterService ] [qcsQvvf] cluster state update task [shard-started StartedShardEntry{shardId [[cucumber_tasks][0]], allocationId [aXTu63OnRqexz5RnxMwaWA], primary term [1], message [after new shard recovery]}[StartedShardEntry{shardId [[cucumber_tasks][0]], allocationId [aXTu63OnRqexz5RnxMwaWA], primary term [1], message [after new shard recovery]}]] took [15.6m] above the warn threshold of 30s
```
And 8 or 15 minutes are far from the maximum; I've seen 50+ minutes.
When this is happening, cluster health is reported as "green". The pending tasks list consists of the single task to create/delete the index, e.g.

```
{"tasks"=>[{"insert_order"=>3095, "priority"=>"URGENT", "source"=>"create-index [cucumber_company_representatives], cause [api]", "executing"=>true, "time_in_queue_millis"=>511179, "time_in_queue"=>"8.5m"}]}
```

Nodes stats report that memory usage, heap usage, and disk usage are OK; there are no GC pauses and no clues about what's going on.
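For reference, these are the checks I'm running while the cluster is in this state (plain cluster APIs against the single CI node):

```sh
# Health still reports "green" while create/delete requests time out:
curl "localhost:9200/_cluster/health?pretty"
# The queue contains only the single stuck URGENT task shown above:
curl "localhost:9200/_cluster/pending_tasks?pretty"
# JVM heap, GC and disk all look healthy:
curl "localhost:9200/_nodes/stats/jvm,os,fs?pretty"
```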
Please help me debug this mysterious case and find the root cause of the freezes.