Getting GC overhead on Elasticsearch cluster on each instance

I am getting GC overhead on every instance of my Elasticsearch cluster (16 data nodes, 3 master nodes).

Instance memory: 60 GB
ES heap size: 32 GB
Queue size: 7,000
Thread pool: write

Indices: 1,541
Primary shards: 7,557
Replica shards: 7,557
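For reference, this is roughly how I check the write thread pool and heap usage on the nodes (the column list is just what I happen to use, nothing official):

GET _cat/thread_pool/write?v&h=node_name,active,queue,queue_size,rejected
GET _cat/nodes?v&h=name,heap.percent,ram.percent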

Please help. Because of this we are losing a lot of data and cannot see the logs in Kibana.

We are sending data directly from Filebeat to Elasticsearch.

Filebeat error:

2018-09-04T17:39:22.326+0530 INFO elasticsearch/client.go:690 Connected to Elasticsearch version 6.3.1
2018-09-04T17:39:22.331+0530 INFO template/load.go:73 Template already exists and will not be overwritten.
2018-09-04T17:39:22.331+0530 INFO [publish] pipeline/retry.go:172 retryer: send unwait-signal to consumer
2018-09-04T17:39:22.331+0530 INFO [publish] pipeline/retry.go:174 done
2018-09-04T17:39:22.341+0530 INFO [publish] pipeline/retry.go:149 retryer: send wait signal to consumer
2018-09-04T17:39:22.341+0530 INFO [publish] pipeline/retry.go:151 done
2018-09-04T17:39:22.973+0530 INFO [monitoring] log/log.go:124 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":150580,"time":{"ms":3}},"total":{"ticks":1106600,"time":{"ms":75},"value":1106600},"user":{"ticks":956020,"time":{"ms":72}}},"info":{"ephemeral_id":"c0c77725-d7ec-4d04-9778-6c3e87caf483","uptime":{"ms":271440046}},"memstats":{"gc_next":20433952,"memory_alloc":18759592,"memory_total":81088606696}},"filebeat":{"harvester":{"open_files":10,"running":10}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"batches":29,"failed":89,"total":89},"read":{"bytes":30628},"write":{"bytes":53426}},"pipeline":{"clients":96,"events":{"active":4126,"retry":178}}},"registrar":{"states":{"current":11}},"system":{"load":{"1":0.73,"15":0.21,"5":0.27,"norm":{"1":0.0913,"15":0.0263,"5":0.0338}}},"xpack":{"monitoring":{"pipeline":{"events":{"published":3,"total":3},"queue":{"acked":3}}}}}}}
2018-09-04T17:39:23.341+0530 ERROR pipeline/output.go:92 Failed to publish events: temporary bulk send failure
2018-09-04T17:39:23.341+0530 INFO [publish] pipeline/retry.go:172 retryer: send unwait-signal to consumer
2018-09-04T17:39:23.341+0530 INFO [publish] pipeline/retry.go:174 done
2018-09-04T17:39:23.341+0530 INFO [publish] pipeline/retry.go:149 retryer: send wait signal to consumer
2018-09-04T17:39:23.341+0530 INFO [publish] pipeline/retry.go:151 done

Please help.

What version are you on?
What OS? What JVM?

Are you sure the heap is 32GB?

How much data, in GB, do those shards represent?

ES version: 6.3.1
Kibana version: 6.3.1
Filebeat version: 6.3.1

ES OS: CentOS Linux 7

java version "1.8.0_151"

The data is not uniform: some indices are around 1 MB and some around 100 GB (smallest is 2.6 KB, largest is 500 GB).

All data nodes have a 32 GB heap (masters have less heap).
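For what it's worth, I pulled those index sizes with something like this (columns and sort order are just my choice):

GET _cat/indices?v&h=index,pri,rep,store.size&s=store.size:desc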

Having a lot of small shards can, as described in this blog post, be very inefficient. I would therefore recommend you try to reduce the shard count by changing your sharding strategy. It also seems like you may have been suffering from bulk rejections as you have dramatically increased the index queue size. This will most certainly not help with heap usage, and reducing the number of shards you are actively indexing into can help with this too.
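As a rough sketch only (the template name and index pattern are placeholders, and the order may need to be higher than the one Filebeat's bundled template uses), you could cap the shard count of newly created indices with an index template along these lines:

PUT _template/filebeat_shard_override
{
  "index_patterns": ["filebeat-*"],
  "order": 1,
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}

Existing indices keep their shard count, so older ones would need to be shrunk or reindexed separately.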


Thanks for the quick help.

How large should the queue for the write thread pool be?

What should the shard size be after reducing the shard count?

This typically depends on your use-case. A good shard size to aim for is somewhere between 10GB and 30GB, but can sometimes be slightly lower or even higher. The size of the bulk queue depends on how you index, but increasing it as much as you have done is often just applying a band-aid instead of addressing the underlying issue.
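To see where the current shards stand, something like the following lists them sorted by size (column selection is just an example):

GET _cat/shards?v&h=index,shard,prirep,store&s=store:desc

If you would rather target a size than a fixed daily index, the rollover API can cut over to a new index once a size threshold is passed, assuming you write through an alias (the alias name here is a placeholder):

POST filebeat-write/_rollover
{
  "conditions": {
    "max_size": "30gb"
  }
}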
