High latency queries appear every 20 minutes

ES performance reliability issue
After fixing the issue above, I found another strange issue.

The path of our ES queries is:
[user application] -> [nginx load balancer] -> [coordinating nodes] -> [data nodes]
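
For context, this is roughly the kind of nginx upstream block sitting in front of the coordinating nodes (a minimal sketch; the upstream name and addresses are hypothetical):

    upstream es_coordinating {
        # hypothetical coordinating-node addresses; nginx distributes
        # requests round-robin by default
        server 10.0.0.11:9200;
        server 10.0.0.12:9200;
    }
    server {
        listen 9200;
        location / {
            proxy_pass http://es_coordinating;
        }
    }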

The search latency of most queries is under 50 ms, but the nginx logs still show some queries that take more than 200 ms.
If I filter the log by upstream_addr, I can see that each coordinating node gets queries like this every 20 minutes. For example, the issue occurs on node A at the 13th, 33rd, and 53rd minutes, and on node B at the 5th, 25th, and 45th minutes.
If I replace the 8C32G coordinating node with a 16C32G server, the number of high-latency queries drops by half.
My guess is that the issue is related to the search thread pool: the 16-core node has a larger search thread pool, so a problem on a single thread affects a smaller share of requests. But I can't prove it, and I want to know what actually happens.
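
To make the 20-minute pattern easier to see, here is a rough sketch of how the slow requests can be bucketed per upstream node and per minute from the nginx access log. It assumes a hypothetical log_format whose last two fields are $upstream_addr and $request_time, with a standard [day/Mon/year:HH:MM:SS zone] timestamp; adjust the parsing to the real format.

    # sketch: count requests slower than 200 ms per upstream node and minute-of-hour
    # assumes each access-log line ends with "<upstream_addr> <request_time_in_seconds>"
    import re
    from collections import Counter

    slow = Counter()
    minute_re = re.compile(r"\[\d+/\w+/\d+:\d+:(\d+):\d+")  # capture the minute

    with open("access.log") as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue
            upstream, took = parts[-2], parts[-1]
            try:
                took_s = float(took)
            except ValueError:
                continue
            m = minute_re.search(line)
            if took_s > 0.2 and m:
                slow[(upstream, int(m.group(1)))] += 1

    for (upstream, minute), count in sorted(slow.items()):
        print(f"{upstream}  minute={minute:02d}  slow_queries={count}")

During one of the spikes, GET _cat/thread_pool/search?v&h=node_name,active,queue,rejected would show whether the search thread pool is actually queueing or rejecting, which would support or rule out the thread-pool theory.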

I used the following APIs to list all default settings:
_cluster/settings?include_defaults
_settings?include_defaults
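
A small sketch of how that output can be scanned for 20-minute defaults (the URL and credentials are placeholders; it requests flattened settings and prints every setting whose value is exactly "20m"):

    # sketch: list every setting whose (default) value is "20m"
    # ES_URL and the credentials are placeholders for the real cluster
    import requests

    ES_URL = "http://localhost:9200"

    resp = requests.get(
        f"{ES_URL}/_cluster/settings",
        params={"include_defaults": "true", "flat_settings": "true"},
        auth=("elastic", "changeme"),
    )
    resp.raise_for_status()

    # the response has "persistent", "transient", and "defaults" sections
    for section, settings in resp.json().items():
        for key, value in settings.items():
            if value == "20m":
                print(f"{section}: {key} = {value}")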

From that output I found two items with a default value of 20m:
xpack.security.authc.token.timeout
xpack.security.authz.store.roles.index.cache.ttl

I have confirmed that the native realm cache causes the issue.
If I call the API _xpack/security/realm/*/_clear_cache, the same issue occurs immediately.
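
A rough sketch of one way to check this (the URL, index name, and credentials are placeholders): clear the realm caches, then time a few searches against a coordinating node and watch the first ones spike.

    # sketch: clear the security realm caches, then time a few searches
    # ES_URL, INDEX, and the credentials are placeholders
    import time
    import requests

    ES_URL = "http://localhost:9200"
    INDEX = "my-index"
    AUTH = ("elastic", "changeme")

    # ES 5.6 path for clearing realm caches
    requests.post(f"{ES_URL}/_xpack/security/realm/*/_clear_cache", auth=AUTH).raise_for_status()

    for i in range(10):
        start = time.time()
        requests.get(f"{ES_URL}/{INDEX}/_search", json={"size": 0}, auth=AUTH).raise_for_status()
        print(f"search {i}: {(time.time() - start) * 1000:.1f} ms")
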
For now I use the following setting (ES 5.6.3) to reduce the number of high-latency queries:

xpack.security.authc.realms:
  realm1:
    type: native
    order: 0
    cache.ttl: 24h

But I don't know how the realm cache causes the issue.
