Elasticsearch URL unresponsive

Hi

OS - Ubuntu 22.04
Elasticsearch - 7.17.27

We are getting continuous warnings like this in the Elasticsearch logs:

[WARN ][o.e.h.AbstractHttpServerTransport] [XXES12] handling request [null][POST][/index][Netty4HttpChannel{localAddress=/xx.xx.xx.xx:9200, remoteAddress=/xx.xx.xx.xx:60763}] took [817826ms] which is above the warn threshold of [5000ms]

After some time, one of the nodes leaves the cluster automatically and rejoins a while later, which affects the health of the Elasticsearch cluster URL.

The Elasticsearch URL stops working at particular times.

Can you help us resolve this issue?

What is the output of:

GET /
GET /_cat/nodes?v
GET /_cat/health?v
GET /_cat/indices?v

If some outputs are too big, please share them on gist.github.com and link them here.
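
If it is easier to collect these from a script than from Kibana Dev Tools, something like the following Python sketch could do it. The base URL http://localhost:9200 and the lack of authentication are assumptions, and the helper names are made up for illustration; adjust them for your cluster.

```python
import urllib.request

# The four diagnostic endpoints requested above.
DIAGNOSTIC_PATHS = ["/", "/_cat/nodes?v", "/_cat/health?v", "/_cat/indices?v"]


def diagnostic_urls(base_url):
    """Return the full URL for each diagnostic endpoint."""
    base = base_url.rstrip("/")
    return [base + path for path in DIAGNOSTIC_PATHS]


def collect(base_url, timeout=30):
    """GET every diagnostic endpoint and return {url: response text}.

    Assumes an unsecured node; a secured cluster would also need
    credentials and probably TLS settings, which are not handled here.
    """
    results = {}
    for url in diagnostic_urls(base_url):
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            results[url] = resp.read().decode("utf-8")
    return results
```

Calling collect("http://localhost:9200") and printing each value gives the text to paste into a gist.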

Hi @dadoonet

We cannot provide these details here.

Can you please let me know how to increase this warning threshold?

We applied the settings below in our Elasticsearch cluster settings:

{
  "template": "*",
  "settings": {
    "number_of_shards": 1,
    "index.max_result_window": "50000",
    "number_of_replicas": 0,
    "index.search.slowlog.threshold.query.warn": "10s",
    "index.search.slowlog.threshold.query.info": "5s",
    "index.search.slowlog.threshold.query.debug": "2s",
    "index.search.slowlog.threshold.query.trace": "500ms",
    "index.search.slowlog.threshold.fetch.warn": "1s",
    "index.search.slowlog.threshold.fetch.info": "800ms",
    "index.search.slowlog.threshold.fetch.debug": "500ms",
    "index.search.slowlog.threshold.fetch.trace": "200ms",
    "index.search.slowlog.level": "info",
    "index.indexing.slowlog.threshold.index.warn": "10s",
    "index.indexing.slowlog.threshold.index.info": "5s",
    "index.indexing.slowlog.threshold.index.debug": "2s",
    "index.indexing.slowlog.threshold.index.trace": "500ms",
    "index.indexing.slowlog.level": "info",
    "index.indexing.slowlog.source": "1000"
  }
}

@Ekta,

Thanks for sharing. You can use the update settings API. However, I'm not sure changing those will help without first diagnosing the cause of the potentially poor cluster health.
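
As a rough illustration of the update index settings API (PUT /<index>/_settings), here is a Python sketch that builds such a request without sending it. The base URL, the index name my-index and the chosen value are placeholders. Note that the index slowlog thresholds in the template are per-index settings and are a separate mechanism from the 5000ms threshold in the AbstractHttpServerTransport warning, so changing them would not make that message go away.

```python
import json
import urllib.request


def build_update_settings_request(base_url, index, settings):
    """Build (but do not send) a PUT /<index>/_settings request."""
    url = base_url.rstrip("/") + "/" + index + "/_settings"
    body = json.dumps(settings).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical index name and value, for illustration only.
req = build_update_settings_request(
    "http://localhost:9200",
    "my-index",
    {"index.indexing.slowlog.threshold.index.warn": "10s"},
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing req to urllib.request.urlopen would actually send it; a secured cluster would also need an Authorization header.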

We really need the output from the endpoints @dadoonet listed so we can see the health of the cluster and determine the cause of the issue. It's OK to obfuscate confidential information such as API keys before sharing it.
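
One possible way to obfuscate the output before posting it is to mask IP addresses with a small script. This is only a sketch: the sample line is invented, loosely in the shape of a _cat/nodes row, and node names, hostnames and index names would need their own handling.

```python
import re

# Matches anything shaped like an IPv4 address (four dot-separated numbers).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def mask_ips(text):
    """Replace every IPv4-looking address with xx.xx.xx.xx."""
    return IPV4.sub("xx.xx.xx.xx", text)


# Invented sample line, not real cluster output.
sample = "10.0.0.12 42 71 3 dilm * XXES12"
print(mask_ips(sample))
```

Three-part version strings such as 7.17.27 are left alone because the pattern requires four numeric groups.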

If you have a particular concern, feel free to DM David or me.

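Your original question was:

"can you help here how to resolve this issue?"

and now that's morphed into:

"can you please let me know how increase this warning threshold?"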

That sounds like a bad idea to me.

817826ms was the time reported for some POST call, that's close to 14 minutes, and then the node left the cluster? And you don't want to be warned about that?

Obfuscate the data if you need to, but please share the output requested.
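
To put that number in perspective, here is a small Python sketch that pulls the duration and the threshold out of the warning and converts them. The LINE string is an abbreviated copy of the log message from the first post, with the channel details left out, and the function name is made up for illustration.

```python
import re

# Abbreviated form of the warning from the first post.
LINE = ("handling request [null][POST][/index] took [817826ms] "
        "which is above the warn threshold of [5000ms]")

PATTERN = re.compile(r"took \[(\d+)ms\].*?warn threshold of \[(\d+)ms\]")


def parse_slow_request(line):
    """Return (took_ms, threshold_ms) from a slow-request warning, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


took_ms, threshold_ms = parse_slow_request(LINE)
print(round(took_ms / 60000, 1))      # duration in minutes: 13.6
print(round(took_ms / threshold_ms))  # multiples of the threshold: 164
```

So the request took roughly 13.6 minutes, about 164 times the 5 second warn threshold.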

"number_of_shards": 1,
"number_of_replicas": 0

You won't mind if you lose all your data if a disk/server dies?
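
The arithmetic behind that remark, as a short Python sketch. The function names are made up for illustration, and it assumes every replica is actually allocated, each copy on a different node.

```python
def shard_copies(number_of_replicas):
    """Total copies of each shard: one primary plus its replicas."""
    return 1 + number_of_replicas


def node_losses_tolerated(number_of_replicas):
    """How many nodes holding a shard can be lost while one copy survives.

    Assumes all copies are allocated, each on a different node.
    """
    return shard_copies(number_of_replicas) - 1


for replicas in (0, 1, 2):
    print(replicas, shard_copies(replicas), node_losses_tolerated(replicas))
```

With number_of_replicas set to 0 there is exactly one copy of each shard, so losing the node or disk that holds it loses that data.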