High CPU and disk load on only one node in the cluster

Hello there. Here is the setup I have:

Cluster of 3 nodes with ES 5.3.0.
All nodes are VMs on KVM hypervisors, with no other heavily loaded VMs on the same hosts.

VMs:
Ubuntu 16.04 with a linux-4.4.0 kernel (different minor builds), JVM: OpenJDK 64-Bit 1.8.0_131+ (different minor builds).
16 vCPUs, 32 GB RAM, no swap, 1 TB disk space.

Hardware:
Each hypervisor has 2x Intel Xeon E5 v3 CPUs and Samsung SSDs; the rest shouldn't matter AFAIK.

Cluster setup:
3 nodes, 16 GB JVM heap, X-Pack installed, JMX enabled, ES config:

cluster.name: elk
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts:
  - somenodes
network.bind_host: "0.0.0.0"
network.host: "0.0.0.0"
node.name: es[1,2,3]-elk
path.data: /usr/share/elasticsearch/data/elk
path.logs: /var/log/elasticsearch/elk
transport.tcp.port: 9300
xpack.security.enabled: false
indices.memory.index_buffer_size: "15%"

All index templates have these settings:

"index" : {
  "number_of_shards" : 3,
  "refresh_interval" : "5s"
}
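
(For context, a template with these settings would normally be applied along these lines in 5.x; the template name and index pattern below are illustrative, not the real ones.)

curl -XPUT 'http://localhost:9200/_template/logstash' -H "Content-Type: application/json" -d '
{
  "template": "logstash-*",
  "settings": {
    "index": {
      "number_of_shards": 3,
      "refresh_interval": "5s"
    }
  }
}'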

I have 426 shards, but the high load persisted even with fewer than 150 shards.
All indices are daily, with 3 primary shards and 1 replica each (a _cat sketch for checking sizes follows the list):

Type 1 - 20M docs, ~50 GB each including the replica, keeping the last 10 days
Type 2 - 70M docs, ~180 GB each including the replica, keeping the last 4 days
Others - 200 to 200k docs, < 600 MB each (the X-Pack indices are among these)
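
(For reference, sizes like these can be double-checked with the _cat API; the logstash* index pattern is assumed here.)

curl 'http://localhost:9200/_cat/indices/logstash*?v&h=index,pri,rep,docs.count,docs.deleted,store.size,pri.store.size'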

Gist with cluster info: https://gist.github.com/vanch/ff74bd2fb47b16e6df6c3d090fb177c5

The issue: the 3rd node has a load average of 20-25 while the others sit at 1-5.

Things I've found weird:

  1. A lot of merges on the overloaded node, with a lot of time spent on them (per-node totals can be cross-checked with the node-stats sketch after the iotop output below)

  2. GC on the loaded node takes a lot of time; it's also visible in the stats. Log:
    [2017-11-23T17:31:05,680][WARN ][o.e.m.j.JvmGcMonitorService] [es3-elk] [gc][young][17563][4993] duration [1.4s], collections [1]/[2.1s], total [1.4s]/[2h], memory [4.6gb]->[3.9gb]/[15.6gb], all_pools {[young] [772.7mb]->[496.6kb]/[865.3mb]}{[survivor] [100.1mb]->[108.1mb]/[108.1mb]}{[old] [3.8gb]->[3.8gb]/[14.6gb]}

  3. There's way more network traffic and far higher IOPS on the overloaded node, so this is probably not a hardware issue.

es1:
    DISK READ    DISK WRITE   SWAPIN    IO>      COMMAND
    0.00 B       9.21 G       0.00 %    33.99 %  [jbd2/sda1-8]
    279.98 G     1923.02 G    0.00 %    0.02 %   java
es2:
    DISK READ    DISK WRITE   SWAPIN    IO>      COMMAND
    0.00 B       10.31 G      0.00 %    32.72 %  [jbd2/sda1-8]
    247.90 G     1933.95 G    0.00 %    0.02 %   java
es3:
    DISK READ    DISK WRITE   SWAPIN    IO>      COMMAND
    0.00 B       305.05 G     0.00 %    39.17 %  [jbd2/sda1-8]
    385.80 G     2.61 T       0.00 %    0.10 %   java
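
For cross-checking items 1 and 2, per-node merge and GC totals can also be pulled from the node stats API; a minimal sketch (stock 5.x endpoints):

# Merge counts and cumulative merge time per node
curl 'http://localhost:9200/_nodes/stats/indices/merges?pretty'
# GC collection counts and times per node
curl 'http://localhost:9200/_nodes/stats/jvm?pretty'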

It looks like most of the data is routed to the 3rd, overloaded node, but the document count in every shard is roughly the same.

Questions: does anyone know why one node is overloaded? What, besides plain document indexing, could cause this much load?

P.S.: I've seen advice to use Oracle Java, but I haven't tried it yet.

How are the shards distributed? Can you please post the output of the following (curl forms are sketched after the list):

  1. GET _cat/shards?v
  2. GET _cat/segments?v
  3. GET _nodes/hot_threads
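
As curl, assuming the node answers on localhost:9200, those would be:

curl 'http://localhost:9200/_cat/shards?v'
curl 'http://localhost:9200/_cat/segments?v'
curl 'http://localhost:9200/_nodes/hot_threads'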

Regarding the JVM: Oracle's JVM is the recommended one, but I don't think it's the culprit of your performance issue.

I have put it in my gist: https://gist.github.com/vanch/ff74bd2fb47b16e6df6c3d090fb177c5
I used hot_threads?threads=100, BTW.

Note that I've tried removing the replica shards for the actively written indices, and nothing changed after that.
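
(For reference, removing replicas is normally a dynamic settings update along these lines; the logstash* index pattern is an assumption.)

curl -XPUT 'http://localhost:9200/logstash*/_settings' -H "Content-Type: application/json" -d '
{
  "index": {
    "number_of_replicas": 0
  }
}'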

Also, if this matters: sometimes the load on the 3rd node drops while it rises on the others. During those periods the cluster becomes more performant and, as I recall, the indexing rate roughly doubles.

Thanks in advance.


What does your ingestion pipeline look like? Are documents being constantly updated?

Can you please be more concrete? Assume that I know only a little about ingestion pipelines.
We're not updating documents, only indexing them and running search queries.

Thanks in advance.

My question was about how you are ingesting data into Elasticsearch. I can see many hot threads (using lots of CPU) that are updating documents. Updating documents can put high pressure on the CPU, and since documents live on specific shards which live on specific nodes, this might explain why you are getting high CPU on certain nodes (or a single node).

You should review how you are ingesting data and identify whether it could be sending documents with the same ID more than once.
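
One quick server-side check: updates leave deleted documents behind until the segments merge them away, so a steadily growing docs.deleted on an index that should be append-only is a strong hint that documents are being re-indexed under existing IDs. A sketch (index pattern assumed):

curl 'http://localhost:9200/_cat/indices/logstash*?v&h=index,docs.count,docs.deleted'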


That's a very interesting thought. Our document flow looks like this:

doc-sender -> logstash -> redis -> logstash -> es

Every logger service has 3 nodes.

Doc-senders of different roles each send documents to a single dedicated Redis service, so I assume documents should not be duplicated within the ES cluster.

I'm not sure about document updates within the ES cluster, but AFAIK we don't use them.

Also, if you're right, the whole cluster should probably be overloaded, since documents should be spread across different shards; instead they seem to hit only the shards that live on the 3rd node.

I don't rule out other performance issues, but the document-update threads observed are certainly contributing. It's mainly happening on the webloges1-elk and webloges2-elk indices, and my recommendation is that you investigate that. Also, these indices suggest they hold time-based data while not using time-based naming; this is not recommended, and you should be using time-based indices.

Looks like I've confused you. Those are nodes, not indices. I am using daily-rotated indices everywhere except for the .kibana one.

P.S.: I've renamed all nodes to the esN format.

Hey, no it was my mistake. I was on mobile and I misread the logs, I am sorry. You can ignore that last statement completely.

Still, the point about updates stands.

Well, if I got it right, this query should answer the question about duplicates:

curl 'http://localhost:9200/logstash*/_search?pretty' -H "Content-Type: application/json" -d '
{
    "size": 0,
    "aggs": {
        "duplicateCount": {
            "terms": {
                "field": "id",
                "min_doc_count": 2
            },
            "aggs": {
                "duplicateDocuments": {
                    "top_hits": {
                    }
                }
            }
        }
    }
}'
{
  "took" : 984,
  "timed_out" : false,
  "_shards" : {
    "total" : 129,
    "successful" : 129,
    "failed" : 0
  },
  "hits" : {
    "total" : 809022946,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "duplicateCount" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ ]
    }
  }
}

But it seems we have no duplicates among most of the documents in the cluster.

What else can cause updates?

That aggregation will not find any duplicate documents if the id field is also used as the document's _id.

In fact, no aggregation can tell you that, since the only way to find out whether a document was updated is to check its version, and you cannot query on that metadata field.

I could tell that documents are being updated from the stack traces in the hot threads output. That's the only evidence there is (but you don't need more than that, TBH).
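
While you cannot filter on it, you can return the version with each hit and spot-check it: any _version above 1 means the document was written more than once. A sketch (index pattern assumed):

curl 'http://localhost:9200/logstash*/_search?pretty' -H "Content-Type: application/json" -d '
{
  "size": 5,
  "version": true,
  "query": { "match_all": {} }
}'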

Well, I've tried the query:
NOT @version:1
and it returned nothing, while
@version:1
returned all documents in the time range.

Do you know whether it's possible to update documents without a version change?
I still don't understand how to catch these update situations :frowning:

The @version field is just a regular field generated by Logstash. It is completely different from the document version metadata managed by Elasticsearch that I am referring to. You cannot query or aggregate on that version metadata.

The problem is not the version change itself, but the document update operation. One way this can happen is if you are generating document IDs externally; that opens the possibility of a document with the same ID being sent to Elasticsearch multiple times.
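
To illustrate the mechanics (index, type and ID below are made up): re-sending a document under the same external ID is an update that always lands on the shard owning that ID, while auto-generated IDs make every send a brand-new document spread evenly across shards.

# First write creates the document (_version: 1)
curl -XPUT 'http://localhost:9200/logstash-2017.11.23/logs/abc123' -H "Content-Type: application/json" -d '{"message": "hello"}'
# Re-sending the same ID updates it in place (_version: 2) on the same shard, hence the same node
curl -XPUT 'http://localhost:9200/logstash-2017.11.23/logs/abc123' -H "Content-Type: application/json" -d '{"message": "hello"}'
# Without an ID in the URL, Elasticsearch generates one and every send becomes a new document
curl -XPOST 'http://localhost:9200/logstash-2017.11.23/logs' -H "Content-Type: application/json" -d '{"message": "hello"}'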
