Short description:
I have a problem with Kibana. After running for a while, the server experiences very high disk reads (> 100 GB), which causes it to crash or become unresponsive.
Question:
Are you familiar with this issue, and do you have any idea how to solve it?
Environment:
The stack runs on a server with 1 CPU core and 3.5 GB of memory. Each component of the stack runs in a separate Docker container.
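For context, the containers are started roughly like this. This is a sketch only: the image versions, container names, and network setup are assumptions, not the exact commands used. The 1 GB heap implied here by `ES_HEAP_SIZE` matches the `[1015.6mb]` heap visible in the GC line of the Elasticsearch log below.

```shell
# Hypothetical startup commands for the stack (image tags and names assumed).
# Elasticsearch 2.x reads its heap size from the ES_HEAP_SIZE environment
# variable and defaults to 1 GB, consistent with the GC log output.
docker network create elk

docker run -d --name elasticsearch --net elk \
  -e ES_HEAP_SIZE=1g \
  -p 9200:9200 \
  elasticsearch:2.4

docker run -d --name kibana --net elk \
  -e ELASTICSEARCH_URL=http://elasticsearch:9200 \
  -p 5601:5601 \
  kibana:4.6
```

With only 3.5 GB of RAM on the host, a 1 GB Elasticsearch heap leaves little room for the OS filesystem cache, which can plausibly drive the heavy disk reads described above.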
Logs of Elasticsearch:
[2017-01-26 22:15:46,375][DEBUG][indices.memory ] [Graydon Creed] recalculating shard indexing buffer, total is [101.5mb] with [3] active shards, each shard set to indexing=[33.8mb], translog=[64kb]
[2017-01-26 22:15:59,923][WARN ][monitor.jvm ] [Graydon Creed] [gc][young][98965][577] duration [4.2s], collections [1]/[3.5s], total [4.2s]/[38.3s], memory [91.2mb]->[102.3mb]/[1015.6mb], all_pools {[young] [56.1mb]->[852.3kb]/[66.5mb]}{[survivor] [410kb]->[1mb]/[8.3mb]}{[old] [35.4mb]->[35.4mb]/[940.8mb]}
[2017-01-26 22:16:12,875][DEBUG][index.engine ] [Graydon Creed] [logstash-2017.01.26][0] merge segment [_3c] done: took [29.2s], [2.5 MB], [3,936 docs], [0s stopped], [0s throttled], [0.0 MB written], [Infinity MB/sec throttle]
[2017-01-26 22:16:17,824][DEBUG][indices.memory ] [Graydon Creed] recalculating shard indexing buffer, total is [101.5mb] with [3] active shards, each shard set to indexing=[33.8mb], translog=[64kb]
[2017-01-26 22:16:52,024][DEBUG][indices.memory ] [Graydon Creed] recalculating shard indexing buffer, total is [101.5mb] with [3] active shards, each shard set to indexing=[33.8mb], translog=[64kb]
[2017-01-26 22:17:26,624][DEBUG][indices.memory ] [Graydon Creed] recalculating shard indexing buffer, total is [101.5mb] with [3] active shards, each shard set to indexing=[33.8mb], translog=[64kb]
[2017-01-26 22:17:59,824][DEBUG][indices.memory ] [Graydon Creed] recalculating shard indexing buffer, total is [101.5mb] with [3] active shards, each shard set to indexing=[33.8mb], translog=[64kb]
[2017-01-26 22:18:31,175][DEBUG][indices.memory ] [Graydon Creed] recalculating shard indexing buffer, total is [101.5mb] with [3] active shards, each shard set to indexing=[33.8mb], translog=[64kb]
[2017-01-26 22:19:04,873][DEBUG][indices.memory ] [Graydon Creed] recalculating shard indexing buffer, total is [101.5mb] with [3] active shards, each shard set to indexing=[33.8mb], translog=[64kb]
[2017-01-26 22:19:45,624][DEBUG][indices.memory ] [Graydon Creed] recalculating shard indexing buffer, total is [101.5mb] with [4] active shards, each shard set to indexing=[25.3mb], translog=[64kb]
[2017-01-26 22:19:47,724][DEBUG][index.shard ] [Graydon Creed] [logstash-2017.01.26][0] updating index_buffer_size from [33.8mb] to [25.3mb]; IndexWriter now using [0] bytes
[2017-01-26 22:19:48,126][DEBUG][index.shard ] [Graydon Creed] [logstash-2017.01.26][1] updating index_buffer_size from [33.8mb] to [25.3mb]; IndexWriter now using [0] bytes
[2017-01-26 22:19:48,223][DEBUG][index.shard ] [Graydon Creed] [logstash-2017.01.26][3] updating index_buffer_size from [500kb] to [25.3mb]; IndexWriter now using [0] bytes
[2017-01-26 22:19:48,773][DEBUG][index.shard ] [Graydon Creed] [logstash-2017.01.26][4] updating index_buffer_size from [33.8mb] to [25.3mb]; IndexWriter now using [219104] bytes
[2017-01-26 22:20:04,873][DEBUG][indices.memory ] [Graydon Creed] recalculating shard indexing buffer, total is [101.5mb] with [4] active shards, each shard set to indexing=[25.3mb], translog=[64kb]
[2017-01-26 22:20:37,024][DEBUG][index.shard ] [Graydon Creed] [logstash-2017.01.26][1] updating index_buffer_size from [25.3mb] to [500kb]; IndexWriter now using [0] bytes
Logs of Kibana:
{"type":"log","@timestamp":"2017-01-26T21:58:49Z","tags":["status","plugin:elasticsearch@1.0.0","info"],"pid":12,"state":"green","message":"Status changed from red to green - Kibana index ready","prevState":"red","prevMsg":"Request Timeout after 3000ms"}
{"type":"log","@timestamp":"2017-01-26T22:02:33Z","tags":["error","elasticsearch"],"pid":12,"message":"Request error, retrying -- socket hang up"}
{"type":"log","@timestamp":"2017-01-26T22:05:45Z","tags":["warning","elasticsearch"],"pid":12,"message":"Unable to revive connection: XXXXX"}
{"type":"log","@timestamp":"2017-01-26T22:06:03Z","tags":["status","plugin:elasticsearch@1.0.0","error"],"pid":12,"state":"red","message":"Status changed from green to red - Request Timeout after 300000ms","prevState":"green","prevMsg":"Kibana index ready"}
{"type":"log","@timestamp":"2017-01-26T22:07:20Z","tags":["status","plugin:elasticsearch@1.0.0","error"],"pid":12,"state":"red","message":"Status changed from red to red - Request Timeout after 3000ms","prevState":"red","prevMsg":"Request Timeout after 300000ms"}
{"type":"log","@timestamp":"2017-01-26T22:07:42Z","tags":["warning","elasticsearch"],"pid":12,"message":"Unable to revive connection: XXXXX"}
from red to green - Kibana index ready","prevState":"red","prevMsg":"Request Timeout after 3000ms"}
{"type":"log","@timestamp":"2017-01-26T22:15:34Z","tags":["error","elasticsearch"],"pid":12,"message":"Request error, retrying -- socket hang up"}
{"type":"log","@timestamp":"2017-01-26T22:19:02Z","tags":["warning","elasticsearch"],"pid":12,"message":"Unable to revive connection: XXXXX"}
{"type":"log","@timestamp":"2017-01-26T22:19:09Z","tags":["warning","elasticsearch"],"pid":12,"message":"No living connections"}