I have a cluster with 5 master nodes, 12 coordinator nodes and 60 data nodes. We are doing heavy indexing into this Elasticsearch cluster, around 15 billion documents spread through the day. Three indices take the bulk of this indexing, and each of them rolls over four times a day. Each index has 100 primary shards with 1 replica. The nodes run on physical servers with 200GB of RAM, each node has around 32GB of heap, and translog durability is set to async.
Bulk indexing via the BulkProcessor is happening very slowly. We use 32 clients; each BulkProcessor is configured with a batch size of 7500, bulk actions of 20, and a bulk size of 25MB. All the network interfaces have 10Gb bandwidth.
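For reference, this is roughly how a BulkProcessor with those settings would be built (a minimal sketch assuming the Java high-level REST client; the mapping of the numbers above onto the builder knobs is my best guess and is marked in the comments):

```java
import org.elasticsearch.action.bulk.BackoffPolicy;
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;

public class BulkProcessorSetup {

    static BulkProcessor build(RestHighLevelClient client) {
        return BulkProcessor.builder(
                (request, bulkListener) ->
                        client.bulkAsync(request, RequestOptions.DEFAULT, bulkListener),
                new BulkProcessor.Listener() {
                    @Override
                    public void beforeBulk(long executionId, BulkRequest request) { }

                    @Override
                    public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                        if (response.hasFailures()) {
                            // inspect per-item failures here (e.g. es_rejected_execution_exception)
                        }
                    }

                    @Override
                    public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
                        // transport-level failures: timeouts, node disconnects, etc.
                    }
                })
                .setBulkActions(7500)                                 // "batch size 7500" above -- assumed to map here
                .setBulkSize(new ByteSizeValue(25, ByteSizeUnit.MB))  // flush after 25MB of payload
                .setConcurrentRequests(20)                            // "bulk action 20" above -- assumed to mean concurrent in-flight bulks; add() blocks when all are busy
                .setBackoffPolicy(BackoffPolicy.exponentialBackoff()) // retry rejected bulks with backoff
                .build();
    }
}
```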
During indexing and searching we are seeing the following errors. Many of the threads feeding data are getting blocked in the BulkProcessor's add() call. The cluster is not responding properly, and the bulk indexing done through the BulkProcessor is also showing up in the slow log.
I am getting node disconnected and ReceiveTimeoutTransportException errors when doing bulk indexing or executing other queries. In the logs I also see "failed to execute query phase (No search context found for id)", GC errors, and nodes being removed from and added back to the cluster.
Please suggest. This is a critical issue for us.
If you have 100 primary shards per index and the average shard size is 30GB, you are generating 72TB of data on disk per day (30GB * 100 primary shards * 2 copies (1 replica) * 3 indices * 4 rollovers), which is 2,400 new shards per day. To me this sounds a bit strange. Are you sure those numbers are accurate? This does not sound slow to me...
The total number of shards in the cluster is close to 8,000 at any given instant. Retention is d-2 days. Each document in the index is approximately 1.8KB. We opted for 100 shards so that indexing would be comparatively faster. The cluster is under an index-heavy load. Please guide us on where exactly the issue might be.
Do you have any non-default settings? Can you confirm that the numbers given are accurate?
If the information is accurate I would recommend the following:
Create the indices with 60 primary shards and 1 replica. Set rollover to cut over at an average shard size of over 50GB. This will reduce the number of shards you are indexing into as well as the number of shards in the cluster.
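A rough sketch of what that could look like with the Java high-level REST client (the index name, write alias and client wiring here are placeholders, not from this thread):

```java
import org.elasticsearch.action.admin.indices.alias.Alias;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.RolloverRequest;
import org.elasticsearch.client.indices.RolloverResponse;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;

import java.io.IOException;

public class RolloverSketch {

    // Create the first generation of the index with fewer primaries.
    static void createInitialIndex(RestHighLevelClient client) throws IOException {
        CreateIndexRequest create = new CreateIndexRequest("events-000001")     // hypothetical name
                .settings(Settings.builder()
                        .put("index.number_of_shards", 60)
                        .put("index.number_of_replicas", 1))
                .alias(new Alias("events-write").writeIndex(true));             // hypothetical write alias
        client.indices().create(create, RequestOptions.DEFAULT);
    }

    // Roll over once the index has grown past ~50GB per shard on average,
    // i.e. 60 primaries * 50GB = 3TB of primary data (max_size is the total index size).
    static void maybeRollover(RestHighLevelClient client) throws IOException {
        RolloverRequest rollover = new RolloverRequest("events-write", null);
        rollover.addMaxIndexSizeCondition(new ByteSizeValue(3, ByteSizeUnit.TB));
        RolloverResponse response = client.indices().rollover(rollover, RequestOptions.DEFAULT);
        // response.isRolledOver() tells you whether a new index was created
    }
}
```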
As your hosts have plenty of RAM and you might be under heap pressure (check whether this is the case), place 2 Elasticsearch nodes per host. With 120 data nodes and 60 primaries plus 60 replicas per index, each data node will then hold one shard per index on average.
If you can, make sure each bulk request only indexes into one index. This will result in more documents being indexed per shard per request. As Elasticsearch syncs the transaction log per request, this should improve efficiency.
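One way to do that with the BulkProcessor is to keep a separate processor per target index, for example something like this (a sketch; the factory stands in for whatever BulkProcessor.builder configuration you already use):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.common.xcontent.XContentType;

public class PerIndexBulk {

    // One BulkProcessor per target index, so every bulk request hits a single index.
    private final Map<String, BulkProcessor> processors = new ConcurrentHashMap<>();
    private final Function<String, BulkProcessor> processorFactory;

    PerIndexBulk(Function<String, BulkProcessor> processorFactory) {
        // the factory wraps your existing BulkProcessor.builder(...) configuration
        this.processorFactory = processorFactory;
    }

    void index(String indexName, String jsonDoc) {
        processors.computeIfAbsent(indexName, processorFactory)
                  .add(new IndexRequest(indexName).source(jsonDoc, XContentType.JSON));
    }
}
```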
If you have not already, install monitoring so you can see what heap usage looks like.
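Even before full monitoring is in place, a quick way to spot heap pressure is to poll the cat nodes API; a minimal sketch using the low-level REST client (host and port are placeholders, point it at one of your coordinator nodes):

```java
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

import java.io.IOException;

public class HeapCheck {

    public static void main(String[] args) throws IOException {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            Request request = new Request("GET", "/_cat/nodes");
            request.addParameter("v", "true");
            request.addParameter("h", "name,heap.percent,heap.max,ram.percent,node.role");
            Response response = client.performRequest(request);
            System.out.println(EntityUtils.toString(response.getEntity()));
        }
    }
}
```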