Kibana crashes server due to very high disk reads

Short description:

I have a problem with Kibana. After a while the server experiences very high disk reads (> 100 GB), which causes the server to crash or become unresponsive.

Question:

Are you familiar with this issue? And, do you have any idea how to solve this?

Environment:

The stack runs on a server with 1 CPU core and 3.5 GB of memory.
Each component of the stack runs in a separate Docker container.

Logs of Elasticsearch:

[2017-01-26 22:15:46,375][DEBUG][indices.memory ] [Graydon Creed] recalculating shard indexing buffer, total is [101.5mb] with [3] active shards, each shard set to indexing=[33.8mb], translog=[64kb]
[2017-01-26 22:15:59,923][WARN ][monitor.jvm ] [Graydon Creed] [gc][young][98965][577] duration [4.2s], collections [1]/[3.5s], total [4.2s]/[38.3s], memory [91.2mb]->[102.3mb]/[1015.6mb], all_pools {[young] [56.1mb]->[852.3kb]/[66.5mb]}{[survivor] [410kb]->[1mb]/[8.3mb]}{[old] [35.4mb]->[35.4mb]/[940.8mb]}
[2017-01-26 22:16:12,875][DEBUG][index.engine ] [Graydon Creed] [logstash-2017.01.26][0] merge segment [_3c] done: took [29.2s], [2.5 MB], [3,936 docs], [0s stopped], [0s throttled], [0.0 MB written], [Infinity MB/sec throttle]
[2017-01-26 22:16:17,824][DEBUG][indices.memory ] [Graydon Creed] recalculating shard indexing buffer, total is [101.5mb] with [3] active shards, each shard set to indexing=[33.8mb], translog=[64kb]
[2017-01-26 22:16:52,024][DEBUG][indices.memory ] [Graydon Creed] recalculating shard indexing buffer, total is [101.5mb] with [3] active shards, each shard set to indexing=[33.8mb], translog=[64kb]
[2017-01-26 22:17:26,624][DEBUG][indices.memory ] [Graydon Creed] recalculating shard indexing buffer, total is [101.5mb] with [3] active shards, each shard set to indexing=[33.8mb], translog=[64kb]
[2017-01-26 22:17:59,824][DEBUG][indices.memory ] [Graydon Creed] recalculating shard indexing buffer, total is [101.5mb] with [3] active shards, each shard set to indexing=[33.8mb], translog=[64kb]
[2017-01-26 22:18:31,175][DEBUG][indices.memory ] [Graydon Creed] recalculating shard indexing buffer, total is [101.5mb] with [3] active shards, each shard set to indexing=[33.8mb], translog=[64kb]
[2017-01-26 22:19:04,873][DEBUG][indices.memory ] [Graydon Creed] recalculating shard indexing buffer, total is [101.5mb] with [3] active shards, each shard set to indexing=[33.8mb], translog=[64kb]
[2017-01-26 22:19:45,624][DEBUG][indices.memory ] [Graydon Creed] recalculating shard indexing buffer, total is [101.5mb] with [4] active shards, each shard set to indexing=[25.3mb], translog=[64kb]
[2017-01-26 22:19:47,724][DEBUG][index.shard ] [Graydon Creed] [logstash-2017.01.26][0] updating index_buffer_size from [33.8mb] to [25.3mb]; IndexWriter now using [0] bytes
[2017-01-26 22:19:48,126][DEBUG][index.shard ] [Graydon Creed] [logstash-2017.01.26][1] updating index_buffer_size from [33.8mb] to [25.3mb]; IndexWriter now using [0] bytes
[2017-01-26 22:19:48,223][DEBUG][index.shard ] [Graydon Creed] [logstash-2017.01.26][3] updating index_buffer_size from [500kb] to [25.3mb]; IndexWriter now using [0] bytes
[2017-01-26 22:19:48,773][DEBUG][index.shard ] [Graydon Creed] [logstash-2017.01.26][4] updating index_buffer_size from [33.8mb] to [25.3mb]; IndexWriter now using [219104] bytes
[2017-01-26 22:20:04,873][DEBUG][indices.memory ] [Graydon Creed] recalculating shard indexing buffer, total is [101.5mb] with [4] active shards, each shard set to indexing=[25.3mb], translog=[64kb]
[2017-01-26 22:20:37,024][DEBUG][index.shard ] [Graydon Creed] [logstash-2017.01.26][1] updating index_buffer_size from [25.3mb] to [500kb]; IndexWriter now using [0] bytes

Logs of Kibana:

{"type":"log","@timestamp":"2017-01-26T21:58:49Z","tags":["status","plugin:elasticsearch@1.0.0","info"],"pid":12,"state":"green","message":"Status changed from red to green - Kibana index ready","prevState":"red","prevMsg":"Request Timeout after 3000ms"}
{"type":"log","@timestamp":"2017-01-26T22:02:33Z","tags":["error","elasticsearch"],"pid":12,"message":"Request error, retrying -- socket hang up"}
{"type":"log","@timestamp":"2017-01-26T22:05:45Z","tags":["warning","elasticsearch"],"pid":12,"message":"Unable to revive connection: XXXXX
{"type":"log","@timestamp":"2017-01-26T22:06:03Z","tags":["status","plugin:elasticsearch@1.0.0","error"],"pid":12,"state":"red","message":"Status changed from green to red - Request Timeout after 300000ms","prevState":"green","prevMsg":"Kibana index ready"}
{"type":"log","@timestamp":"2017-01-26T22:07:20Z","tags":["status","plugin:elasticsearch@1.0.0","error"],"pid":12,"state":"red","message":"Status changed from red to red - Request Timeout after 3000ms","prevState":"red","prevMsg":"Request Timeout after 300000ms"}
{"type":"log","@timestamp":"2017-01-26T22:07:42Z","tags":["warning","elasticsearch"],"pid":12,"message":"Unable to revive connection: XXXXX
from red to green - Kibana index ready","prevState":"red","prevMsg":"Request Timeout after 3000ms"}
{"type":"log","@timestamp":"2017-01-26T22:15:34Z","tags":["error","elasticsearch"],"pid":12,"message":"Request error, retrying -- socket hang up"}
{"type":"log","@timestamp":"2017-01-26T22:19:02Z","tags":["warning","elasticsearch"],"pid":12,"message":"Unable to revive connection: XXXXX
{"type":"log","@timestamp":"2017-01-26T22:19:09Z","tags":["warning","elasticsearch"],"pid":12,"message":"No living connections"}

Hi @Lex,

thank you for posting the logs. First, I would like to make sure that it is actually Kibana that causes the high I/O load on the Elasticsearch container.

  • Is this problem reproducible by performing some specific operation in Kibana?
  • Does this also happen when you stop the Kibana server? (One way to check both points is sketched below.)
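For example, comparing per-container and per-process disk I/O on the Docker host should show which container is responsible for the reads. This is only a minimal sketch, and it assumes the Kibana container is simply named kibana (adjust the name to your setup):

# one snapshot of per-container CPU, memory and block I/O
docker stats --no-stream

# per-process disk reads/writes on the host, sampled every 5 seconds (needs the sysstat package)
pidstat -d 5

# stop only Kibana and watch whether the heavy reads continue
docker stop kibana
iostat -dmx 5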

Hi Felix,

Thanks for your reply.

No, it actually happened while nobody was performing any actions in Kibana itself, so your doubt about whether it is really caused by Kibana is justified. To answer your question: no, it cannot be reproduced by a specific operation in Kibana. Could it also be a process in Elasticsearch that is causing this issue?

I haven't tried stopping the Kibana container (server) yet; I can do that at the end of the day.

Okay, that should give us an idea of where to focus the investigation.

It is a very small server, so it could be a resource issue. How much heap do you have configured? How many indices and shards do you have? What is the total data volume? And what is the output of the _cat/nodes API?
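For reference, something like the following against the cat APIs would give us those numbers. It assumes Elasticsearch is reachable on localhost:9200 from wherever you run it, so adjust the host and port to your setup:

# heap and RAM per node
curl 'localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max,ram.percent,ram.max'

# number of indices, document counts and store size
curl 'localhost:9200/_cat/indices?v'

# shard count and distribution
curl 'localhost:9200/_cat/shards?v'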

Hi Christian,

I’ve upgraded the machine, and that appears to have solved the issue.
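In case it helps others who hit the same symptom on a small machine: with more memory available, the Elasticsearch heap can be raised as well. On the 2.x series the heap is set via the ES_HEAP_SIZE environment variable; a minimal sketch, where the image tag, container name and 2g value are examples rather than my exact setup:

# sketch only: image tag, container name and heap size are assumptions
docker run -d --name elasticsearch -e ES_HEAP_SIZE=2g elasticsearch:2.4

The usual guideline is to give the heap no more than about half of the machine's RAM, so the operating system keeps enough memory for the filesystem cache, which is what holds disk reads down.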

For now, I have no further questions.

Thanks for your help.

Kind regards,


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.