Hi all,
I have a question that has puzzled me for a long time.
One node in the cluster uses far too much index writer memory (the other nodes are fine), it keeps growing, and it eventually leads to index throttling.
I then adjusted index.refresh_interval from 30s to 10s, but
the situation has not improved much.
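For reference, this is roughly how I apply that change dynamically (the index name my-index and the host are placeholders for my real values):
curl -s -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/my-index/_settings' -d '
{
  "index" : {
    "refresh_interval" : "10s"
  }
}'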
I also found that the affected node's refresh queue has a large backlog.
Why does the index writer use so much memory? I found nothing in the logs.
As the index writer memory continues to rise, it eventually leads to this exception: "org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [33137445694/30.8gb], which is larger than the limit of [31621696716/29.4gb]"
Elasticsearch version: 7.4.0
JVM heap: 31G
Could you help me analyze this problem? @DavidTurner
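In case it helps, this is roughly how I check the per-node IndexWriter memory and circuit breaker usage (host is a placeholder):
# index_writer_memory_in_bytes per node (drop the grep to see which node each value belongs to)
curl -s 'http://localhost:9200/_nodes/stats/indices/segments?pretty' | grep index_writer_memory
# parent/fielddata/request breaker usage per node
curl -s 'http://localhost:9200/_nodes/stats/breaker?pretty'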
Are you triggering refreshes manually at a high rate somehow? (it certainly looks like it).
Maybe those refreshes only hit shards residing on es-data-8, causing it to become overloaded and unable to flush its index writers.
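For example, something like this should show whether the refresh thread pool is backing up on that node (host is a placeholder):
curl -s 'http://localhost:9200/_cat/thread_pool/refresh?v&h=node_name,active,queue,rejected,completed'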
Thank you for your reply.
I don't refresh the index manually; I use Logstash to transport data from Kafka to Elasticsearch.
I also use Logstash to double-write the data to an ES 6.3 cluster and an ES 7.4 cluster, and the ES 6.3 cluster has never had this problem.
Because the es-data-8 node's memory reached 95% of the heap size (leading to the CircuitBreakingException), the task API can't return a correct result. (Exception: [circuit_breaking_exception] [parent] Data too large, data for [<http_request>] would be [32273530784/30gb], which is larger than the limit of [31621696716/29.4gb])
I tried the task API many times and it returned the result below (I think it is incomplete; previously it returned many refresh tasks).
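For completeness, the call I run is roughly the following (the actions filter is my own assumption, host is a placeholder):
curl -s 'http://localhost:9200/_tasks?detailed=true&actions=indices:admin/refresh*&pretty'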
It looks like the refresh is stuck/dead-locked somehow (it's been running for 16h+ already!). That seems to be the problem. Can you take a thread dump on data8, so we can start tracking down why/where it dead-locked?
It should work fine using nsenter into the Docker container. The easiest way of doing that that I know of is https://github.com/jpetazzo/nsenter (unless you already have nsenter working properly and know how to use it :)). That should allow you to run jstack just fine without permission issues.
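A rough sketch of the steps, assuming the container is named es-data-8 and the JVM is PID 1 inside it (adjust names, PIDs and the jstack user to your setup):
# find the container's main PID on the host (container name is an assumption)
PID=$(docker inspect --format '{{.State.Pid}}' es-data-8)
# enter the container's mount and PID namespaces and take a thread dump
# (assumes a JDK with jstack inside the image; you may need to run jstack as the same user as the JVM)
sudo nsenter --target "$PID" --mount --pid -- jstack 1 > /tmp/es-data-8-threads.txt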
Thank you very much. I have already restarted es-data-8 and the cluster's health has recovered to green, so a thread dump can't be used to track the issue now. But I think this problem will happen again; if it does, I will let you know.
Thank you once again!
Hi @nhat, @Armin_Braun,
I'm sorry to bother you, but the question remains that in Elasticsearch 7.4 there is always some node whose index writer holds too much memory. I believe this is a big issue (ES 6.3 never showed this problem in a year of use). Please help me find the reason behind it; we must solve this problem!
The uncommitted translog should not go above 512MB per shard by default. Did you change any translog setting? Can you share the logs from the node data-09?
One theory that I have is that the throttling does not work well. Can you add -Des.index.memory.max_index_buffer_size=256mb to config/jvm.options on some nodes and then restart them? Please let me know if the problem goes away on those nodes. Thank you.
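Also, to double-check the translog question above, something like this shows the uncommitted translog size per shard (the index name is a placeholder):
curl -s 'http://localhost:9200/my-index/_stats/translog?level=shards&pretty'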
Hi @nhat,
For most of the indices, I changed the translog settings:
"translog" : {
"sync_interval" : "60s",
"durability" : "async",
"flush_threshold_size": "1gb"
}
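For reference, a minimal sketch of how such settings can be applied at index creation time (my-index and the host are placeholders):
curl -s -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/my-index' -d '
{
  "settings" : {
    "index.translog.sync_interval" : "60s",
    "index.translog.durability" : "async",
    "index.translog.flush_threshold_size" : "1gb"
  }
}'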
I think I have found the problem. I checked the abnormal index carefully and found that I manually specify
the "document_id" field, but the index data is frequently problematic, so millions of documents end up with the same "document_id" and those documents keep being updated.
(As I see in the jstack file, most of the write threads are waiting on the per-doc-id lock: at org.elasticsearch.index.engine.LiveVersionMap.acquireLock(LiveVersionMap.java:473) at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:856).)
It is this problematic index that led to those issues: "index writer memory rises" and "write queue rises".
After I fixed the problematic index, everything went back to normal.
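A tiny reproduction of what was happening: when the same _id is sent twice, the second operation is indexed as an update of the same document, and every such update has to acquire the per-id lock in LiveVersionMap (the index name dup-test and the host are placeholders):
# the response shows the second op with "result":"updated" and "_version":2
curl -s -H 'Content-Type: application/x-ndjson' -XPOST 'http://localhost:9200/_bulk' --data-binary '{"index":{"_index":"dup-test","_id":"1"}}
{"msg":"first write"}
{"index":{"_index":"dup-test","_id":"1"}}
{"msg":"second write, indexed as an update of document 1"}
'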
Thank you very much for your help during this period.
Thanks again!