As you can see, the indexing rate is more or less smooth, but the thread pool queue size has spikes.
Based on the difference between the Max and Mean values of the queue size, it is also clear that the queue size varies a lot.
If I zoom in, I can see that these spikes are single data points. The queue size is zero most of the time, but it suddenly spikes for a short period and then drops back to zero.
I don't see any memory, CPU, or Disk IOPS spikes in the Kibana dashboards.
What are the possible reasons this might be happening?
How do I know which metric is the bottleneck?
Is there a way to confirm whether or not Disk IOPS is a bottleneck?
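(Answering part of this myself later: besides `iostat -x 1` and watching `r/s`, `w/s`, and `%util`, you can estimate IOPS directly from `/proc/diskstats`, which exposes cumulative read/write completion counters per device. Below is a minimal sketch; the device name `xvda` and the snapshot strings are illustrative, not from my actual nodes.)

```python
def iops_from_diskstats(snap1, snap2, interval_s, device):
    """Estimate IOPS for `device` from two /proc/diskstats snapshots
    taken `interval_s` seconds apart. Fields 4 and 8 on each line are
    the cumulative counts of reads and writes completed."""
    def ops_completed(snapshot):
        for line in snapshot.splitlines():
            fields = line.split()
            if len(fields) >= 8 and fields[2] == device:
                return int(fields[3]) + int(fields[7])
        raise ValueError("device %r not found" % device)
    return (ops_completed(snap2) - ops_completed(snap1)) / interval_s

# Two synthetic snapshots 10 s apart: reads 1000 -> 1600, writes 500 -> 900
snap1 = "202 0 xvda 1000 0 8000 50 500 0 4000 30 0 0 0"
snap2 = "202 0 xvda 1600 0 12800 80 900 0 7200 55 0 0 0"
print(iops_from_diskstats(snap1, snap2, 10.0, "xvda"))  # 100.0
```

If the sustained number sits near what the EBS volume is provisioned for, the disk is a plausible bottleneck; if it is well below, probably not.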
This ES cluster isn't part of an ELK stack, so there is no bulk indexing happening.
My Cluster Configuration is:
3 × m3.xlarge, 8 GB memory locked.
ES version 1.5.2
5 primary shards and 1 replica each
Number of threads for indexing is 4 (4 cores). I have increased the queue size limit to 400.
I have over 1.5 million docs across 4 indices, about 1 GB in size. There were no circuit breaker exceptions or anything like that.
I am not sure if the issue is the number of indexing requests, since the queue size graph has spikes whereas the indexing request graph doesn't have those sudden spikes.
Well, you could start by increasing the index thread count, but always keep an eye on the impact on the node and cluster; metrics such as system load and disk I/O are important to watch. You said m3.xlarge? http://aws.amazon.com/ec2/instance-types/ — looks like those are virtual CPUs, so you'll want to watch the steal metric in the `top` output. Save your current settings before you change anything, and revert if the new settings cause problems. We are using physical boxes and can reach a max indexing rate of 140 per Elasticsearch node on the good ol' ES 0.90.7, with a total indices size of more than 2 TB. I think with your hardware setup you could go higher, but that's just my empirical experience; you should always consult an ES expert for fine tuning.
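On the steal metric: `top` shows it as `%st`, and it comes from the 8th value on the aggregate `cpu` line of `/proc/stat` (cumulative ticks: user, nice, system, idle, iowait, irq, softirq, steal, ...). A small sketch of computing steal as a percentage between two snapshots (the snapshot strings in the example are made up):

```python
def steal_percent(stat1, stat2):
    """CPU steal time as a percentage of all CPU time elapsed between
    two /proc/stat snapshots. Values on the 'cpu' line are cumulative
    ticks: user nice system idle iowait irq softirq steal [guest ...]."""
    def cpu_fields(snapshot):
        for line in snapshot.splitlines():
            if line.startswith("cpu "):
                return [int(x) for x in line.split()[1:]]
        raise ValueError("no aggregate 'cpu' line found")
    deltas = [b - a for a, b in zip(cpu_fields(stat1), cpu_fields(stat2))]
    total = sum(deltas)
    steal = deltas[7] if len(deltas) > 7 else 0
    return 100.0 * steal / total if total else 0.0

before = "cpu 100 0 50 800 10 0 0 40 0 0"
after  = "cpu 200 0 100 1500 20 0 0 180 0 0"
print(steal_percent(before, after))  # 14.0
```

A persistently high value means the hypervisor is giving your vCPUs less time than they ask for, which can stall indexing threads even when the guest's own CPU metrics look fine.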
I believe the reason there are no spikes in the indexing rate is that
ES doesn't expose the number of in-flight indexing requests at any given point in time; it exposes the total number of indexing requests completed so far. So Marvel must be calculating the indexing rate as d(Total indexing requests)/d(Time).
Depending on the sampling interval, the indexing rate will mostly look smooth.
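To illustrate the smoothing (numbers below are made up, not from my cluster): sampling a cumulative counter and dividing the delta by the interval averages any burst over the whole interval, so a burst that briefly fills the queue barely shows up in the rate graph.

```python
def rates(cumulative_totals, interval_s):
    """Marvel-style rate: delta of a cumulative counter divided by the
    sampling interval, one value per interval."""
    return [(b - a) / interval_s
            for a, b in zip(cumulative_totals, cumulative_totals[1:])]

# Steady ~100 docs/s sampled every 10 s, with a 500-doc burst landing
# inside the third interval:
totals = [0, 1000, 2000, 3500, 4500]
print(rates(totals, 10))  # [100.0, 100.0, 150.0, 100.0]
```

The burst that could momentarily queue hundreds of requests shows up only as a modest bump (150 vs 100), which matches a smooth indexing-rate graph alongside a spiky queue graph.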
I had enabled slow_logs and found the requests that were slow. We had a bug in our application that was sending bursts of indexing requests; this was the reason for the index rejections.
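A toy model of why this happens (all numbers illustrative; real ES rejects only when the queue is full *and* all pool threads are busy, which this sketch simplifies): with 4 worker threads and a queue capped at 400, a steady trickle never queues, but one burst bigger than the cap is rejected immediately, even though the average request rate looks harmless.

```python
def simulate(arrivals, service_per_tick=4, queue_cap=400):
    """Per tick: `arrivals[i]` requests arrive, anything beyond the
    bounded queue is rejected, then up to `service_per_tick` requests
    are drained. Returns (peak queue size, total rejections)."""
    queue = rejected = peak = 0
    for arriving in arrivals:
        queue += arriving
        if queue > queue_cap:
            rejected += queue - queue_cap
            queue = queue_cap
        peak = max(peak, queue)
        queue = max(0, queue - service_per_tick)
    return peak, rejected

steady = [3] * 100                     # 3 req/tick, under capacity
burst = [3] * 50 + [600] + [3] * 49    # same average-ish load plus one burst
print(simulate(steady))  # (3, 0)
print(simulate(burst))   # (400, 200)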