We have a problem with our Elasticsearch cluster in every environment. The
cluster sometimes gets into an unstable state in which certain ES nodes
carry several times the load of the other nodes.
We can reproduce this quickly and reliably by hitting the cluster with
about 40 concurrent threads while data indexing is running, and much more
slowly by simply running constant searches.
Our cluster setup:
4 no-data client nodes that service search requests
1 no-data client node that sends data indexing requests.
10 data nodes that the 5 client nodes call for indexing and searching.
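
For anyone who wants to try the reproduction described above, here is a rough sketch of a 40-thread load driver. It is not the poster's actual test harness. `run_concurrent` and `request_fn` are hypothetical names; `request_fn` stands in for whatever call sends one request through a client node, and the per-thread request count is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_concurrent(request_fn, threads=40, requests_per_thread=10):
    """Call request_fn from `threads` workers at once.

    request_fn is a hypothetical zero-argument callable that issues one
    request against the cluster. Returns a flat list of per-request
    latencies in seconds.
    """
    def worker(_worker_id):
        latencies = []
        for _ in range(requests_per_thread):
            start = time.perf_counter()
            request_fn()
            latencies.append(time.perf_counter() - start)
        return latencies

    # One pool thread per worker, so all workers run concurrently.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        per_thread = list(pool.map(worker, range(threads)))

    return [latency for batch in per_thread for latency in batch]
```

Running this while the indexing job is active, and recording the latencies alongside the per-node load, should give a repeatable baseline for comparing runs.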
I took a look at the ZIP, but it contains only a load graph. Are you
graphing the other ES metrics as well? If so, compare them across nodes and
see which ones differ. If not, see my signature. If the hardware and other
specs are the same on every node, something specific is causing that load;
the load itself is just a symptom. We had a client the other day whose load
would go up every few hours even though the query rate was mostly constant.
So why would the load (and query latency) go up? It turned out that the
proportion of more expensive queries was higher during those high
load/latency periods. But before reaching that conclusion we looked at all
the metrics and eliminated everything else we could.
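
As an illustration of the "see which ones are different" step, here is a minimal sketch that flags metrics on which some node sits well above the cluster median. It assumes the per-node numbers have already been collected into a plain dict. The metric names, the `divergent_metrics` helper, and the 2x threshold are all made up for the example and do not come from any monitoring tool.

```python
from statistics import median


def divergent_metrics(node_stats, ratio=2.0):
    """Return {metric: {node: value}} for values above ratio * median.

    node_stats maps a node name to a dict of metric name -> number.
    Both the metric names and the ratio threshold are illustrative
    assumptions.
    """
    metric_names = set()
    for stats in node_stats.values():
        metric_names.update(stats)

    flagged = {}
    for metric in sorted(metric_names):
        values = {
            node: stats[metric]
            for node, stats in node_stats.items()
            if metric in stats
        }
        mid = median(values.values())
        if mid <= 0:
            continue  # a relative threshold is meaningless here
        outliers = {n: v for n, v in values.items() if v > ratio * mid}
        if outliers:
            flagged[metric] = outliers
    return flagged


# Hypothetical sample: data3 is overloaded but serves a normal query rate.
sample = {
    "data1": {"load": 2.0, "search_qps": 100},
    "data2": {"load": 2.2, "search_qps": 110},
    "data3": {"load": 9.5, "search_qps": 105},
}
print(divergent_metrics(sample))
```

A metric that diverges together with load (for example merge activity, GC time, or shard count per node, if you collect those) is a candidate cause. Metrics that stay flat across nodes, like the query rate in this sample, can be eliminated.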