High CPU and disk load on only one node in the cluster

Hello there. Here is the setup I have:

Cluster of 3 nodes with ES 5.3.0.
All nodes are VMs on KVM hypervisors, with no other heavily loaded VMs on the same hosts.

VMs:
Ubuntu 16.04 with a linux-4.4.0 kernel (different minor builds), JVM: OpenJDK 64-Bit 1.8.0_131+ (different minor builds).
16 vCPUs, 32 GB RAM, no swap, 1 TB disk space.

Hardware:
Each hypervisor has 2x Intel Xeon E5 v3 CPUs and Samsung SSDs; the rest shouldn't matter AFAIK.

Cluster setup:
3 nodes, 16 GB JVM heap, X-Pack installed, JMX enabled, ES config:

cluster.name: elk
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts:
  - somenodes
network.bind_host: "0.0.0.0"
network.host: "0.0.0.0"
node.name: es[1,2,3]-elk
path.data: /usr/share/elasticsearch/data/elk
path.logs: /var/log/elasticsearch/elk
transport.tcp.port: 9300
xpack.security.enabled: false
indices.memory.index_buffer_size: "15%"

All index templates have these settings:

"index" : {
  "number_of_shards" : 3,
  "refresh_interval" : "5s"
}
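
(For context, a template with these settings would normally be applied along these lines in 5.x; the template name and index pattern below are illustrative, not the real ones.)

curl -XPUT 'http://localhost:9200/_template/logstash' -H "Content-Type: application/json" -d '
{
  "template": "logstash-*",
  "settings": {
    "index": {
      "number_of_shards": 3,
      "refresh_interval": "5s"
    }
  }
}'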

I have 426 shards, but the high load persisted even with fewer than 150 shards.
All indices are daily, with 3 primary shards and 1 replica each (a _cat sketch for checking sizes follows the list):

Type 1 - 20M docs, ~50 GB each including the replica, keeping the last 10 days
Type 2 - 70M docs, ~180 GB each including the replica, keeping the last 4 days
Others - 200 to 200k docs, < 600 MB each (the X-Pack indices are among these)
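
(For reference, sizes like these can be double-checked with the _cat API; the logstash* index pattern is assumed here.)

curl 'http://localhost:9200/_cat/indices/logstash*?v&h=index,pri,rep,docs.count,docs.deleted,store.size,pri.store.size'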

Gist with cluster info: https://gist.github.com/vanch/ff74bd2fb47b16e6df6c3d090fb177c5

The issue: the 3rd node has a load average of 20-25 while the others sit at 1-5.

Things I've found weird:

  1. A lot of merges on the overloaded node, with a lot of time spent on them (per-node totals can be cross-checked with the node-stats sketch after the iotop output below)

  2. GC on the loaded node takes a lot of time; it's also visible in the stats. Log:
    [2017-11-23T17:31:05,680][WARN ][o.e.m.j.JvmGcMonitorService] [es3-elk] [gc][young][17563][4993] duration [1.4s], collections [1]/[2.1s], total [1.4s]/[2h], memory [4.6gb]->[3.9gb]/[15.6gb], all_pools {[young] [772.7mb]->[496.6kb]/[865.3mb]}{[survivor] [100.1mb]->[108.1mb]/[108.1mb]}{[old] [3.8gb]->[3.8gb]/[14.6gb]}

  3. There's way more network traffic and far higher IOPS on the overloaded node, so this is probably not a hardware issue.

es1:
    DISK READ    DISK WRITE   SWAPIN    IO>      COMMAND
    0.00 B       9.21 G       0.00 %    33.99 %  [jbd2/sda1-8]
    279.98 G     1923.02 G    0.00 %    0.02 %   java
es2:
    DISK READ    DISK WRITE   SWAPIN    IO>      COMMAND
    0.00 B       10.31 G      0.00 %    32.72 %  [jbd2/sda1-8]
    247.90 G     1933.95 G    0.00 %    0.02 %   java
es3:
    DISK READ    DISK WRITE   SWAPIN    IO>      COMMAND
    0.00 B       305.05 G     0.00 %    39.17 %  [jbd2/sda1-8]
    385.80 G     2.61 T       0.00 %    0.10 %   java
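
For cross-checking items 1 and 2, per-node merge and GC totals can also be pulled from the node stats API; a minimal sketch (stock 5.x endpoints):

# Merge counts and cumulative merge time per node
curl 'http://localhost:9200/_nodes/stats/indices/merges?pretty'
# GC collection counts and times per node
curl 'http://localhost:9200/_nodes/stats/jvm?pretty'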

It looks like most of the data is routed to the 3rd, overloaded node, but the document count in every shard is roughly the same.

Questions: does anyone know why one node is overloaded? What, besides plain document indexing, could cause this much load?

P.S.: I've seen advice to use Oracle Java, but I haven't tried it yet.

How are the shards distributed? Can you please post the output of the following (curl forms are sketched after the list):

  1. GET _cat/shards?v
  2. GET _cat/segments?v
  3. GET _nodes/hot_threads
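
As curl, assuming the node answers on localhost:9200, those would be:

curl 'http://localhost:9200/_cat/shards?v'
curl 'http://localhost:9200/_cat/segments?v'
curl 'http://localhost:9200/_nodes/hot_threads'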

Regarding the JVM: Oracle's JVM is the recommended one, but I don't think it's the culprit of your performance issue.

I have put it in my gist: https://gist.github.com/vanch/ff74bd2fb47b16e6df6c3d090fb177c5
I used hot_threads?threads=100, BTW.

Note that I've tried removing the replica shards for the actively written indices, and nothing changed after that.
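
(For reference, removing replicas is normally a dynamic settings update along these lines; the logstash* index pattern is an assumption.)

curl -XPUT 'http://localhost:9200/logstash*/_settings' -H "Content-Type: application/json" -d '
{
  "index": {
    "number_of_replicas": 0
  }
}'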

Also, if this matters: sometimes the load on the 3rd node drops while it rises on the others. During those periods the cluster becomes more performant and, as I recall, the indexing rate roughly doubles.

Thanks in advance.


What does your ingestion pipeline look like? Are documents being constantly updated?

Can you please be more concrete? Assume that I know only a little about ingestion pipelines.
We're not updating documents, only indexing them and running search queries.

Thanks in advance.

My question was about how you are ingesting data into Elasticsearch. I can see many hot threads (using lots of CPU) that are updating documents. Updating documents can put high pressure on the CPU, and since documents live on specific shards which live on specific nodes, this might explain why you are getting high CPU on certain nodes (or a single node).

You should review how you are ingesting data and identify whether it could be sending documents with the same ID more than once.
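
One quick server-side check: updates leave deleted documents behind until the segments merge them away, so a steadily growing docs.deleted on an index that should be append-only is a strong hint that documents are being re-indexed under existing IDs. A sketch (index pattern assumed):

curl 'http://localhost:9200/_cat/indices/logstash*?v&h=index,docs.count,docs.deleted'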


That's a very interesting thought. Our document flow looks like this:

doc-sender -> logstash -> redis -> logstash -> es

Every logger service has 3 nodes.

Doc-senders of different roles each send documents to a single dedicated Redis service, so I assume documents should not be duplicated within the ES cluster.

I'm not sure about document updates within the ES cluster, but AFAIK we don't use them.

Also, if you're right, the whole cluster should probably be overloaded, since documents should be spread across different shards; instead they seem to hit only the shards that live on the 3rd node.

I don't rule out other performance issues, but the document-update threads observed are certainly contributing. It's mainly happening on the webloges1-elk and webloges2-elk indices, and my recommendation is that you investigate that. Also, these indices suggest they hold time-based data while not using time-based naming; this is not recommended, and you should be using time-based indices.

Looks like I've confused you. Those are nodes, not indices. I am using daily-rotated indices everywhere except for the .kibana one.

P.S.: I've renamed all nodes to the esN format.

Hey, no it was my mistake. I was on mobile and I misread the logs, I am sorry. You can ignore that last statement completely.

Still, the point about updates stands.

Well, if I got it right, this query should answer the question about duplicates:

curl 'http://localhost:9200/logstash*/_search?pretty' -H "Content-Type: application/json" -d '
{
    "size": 0,
    "aggs": {
        "duplicateCount": {
            "terms": {
                "field": "id",
                "min_doc_count": 2
            },
            "aggs": {
                "duplicateDocuments": {
                    "top_hits": {
                    }
                }
            }
        }
    }
}'
{
  "took" : 984,
  "timed_out" : false,
  "_shards" : {
    "total" : 129,
    "successful" : 129,
    "failed" : 0
  },
  "hits" : {
    "total" : 809022946,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "duplicateCount" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ ]
    }
  }
}

But it seems we have no duplicates among most of the documents in the cluster.

What else can cause updates?

That aggregation will not find any duplicate documents if the id field is also used as the document's _id.

In fact, no aggregation can tell you that, since the only way to find out whether a document was updated is to check its version, and you cannot query on that metadata field.

I could tell that documents are being updated from the stack traces in the hot threads output. That's the only evidence there is (but you don't need more than that, TBH).
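
While you cannot filter on it, you can return the version with each hit and spot-check it: any _version above 1 means the document was written more than once. A sketch (index pattern assumed):

curl 'http://localhost:9200/logstash*/_search?pretty' -H "Content-Type: application/json" -d '
{
  "size": 5,
  "version": true,
  "query": { "match_all": {} }
}'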

Well, I've tried the query:
NOT @version:1
and it returned nothing, while
@version:1
returned all documents in the time range.

Do you know whether it's possible to update documents without a version change?
I still don't understand how to catch these update situations :frowning:

The @version field is just a regular field generated by Logstash. It is completely different from the document version metadata managed by Elasticsearch that I am referring to. You cannot query or aggregate on that version metadata.

The problem is not the version change itself, but the document update operation. One way this can happen is if you are generating document IDs externally; that opens the possibility of a document with the same ID being sent to Elasticsearch multiple times.
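
To illustrate the mechanics (index, type and ID below are made up): re-sending a document under the same external ID is an update that always lands on the shard owning that ID, while auto-generated IDs make every send a brand-new document spread evenly across shards.

# First write creates the document (_version: 1)
curl -XPUT 'http://localhost:9200/logstash-2017.11.23/logs/abc123' -H "Content-Type: application/json" -d '{"message": "hello"}'
# Re-sending the same ID updates it in place (_version: 2) on the same shard, hence the same node
curl -XPUT 'http://localhost:9200/logstash-2017.11.23/logs/abc123' -H "Content-Type: application/json" -d '{"message": "hello"}'
# Without an ID in the URL, Elasticsearch generates one and every send becomes a new document
curl -XPOST 'http://localhost:9200/logstash-2017.11.23/logs' -H "Content-Type: application/json" -d '{"message": "hello"}'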
