Indexing performance, index rejections

The charts I'm looking at:
Indices Indexing Rate (Max value)
Indices Indexing Rate (Mean value)
Thread pool queue (Max)
Thread pool queue (Min)
Thread pool rejected

As you can see, the indexing rate is more or less smooth, but the thread pool queue size has spikes.
The large difference between the Max and Mean values of the queue size also makes it clear that the queue size varies a lot.

If I zoom in, I can see that these spikes are single data points: the queue size is zero most of the time, then suddenly spikes for a short period and drops back to zero.

I don't see any memory, CPU, or disk IOPS spikes in the Kibana dashboards.
What are the possible reasons this might be happening?
How do I know which metric is the bottleneck?
Is there a way to confirm whether or not disk IOPS is a bottleneck?

This ES cluster isn't part of an ELK stack, so there isn't any bulk indexing happening.
My Cluster Configuration is:
3 × m3.xlarge nodes, 8 GB memory locked.

Thanks.

what es version, and how many threads for indexing? if you are using the java transport client, check its settings as well. https://www.elastic.co/guide/en/elasticsearch/reference/1.x/modules-threadpool.html how many shards, and what is the size per shard?

if you have this monitoring in place and ready, the next step would probably be to change the settings gradually. https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-update-settings.html
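something along these lines is what i mean; a rough sketch using python requests against the plain http api (host and the 400 value are just placeholders, setting names per the 1.x thread pool docs above):

```python
import requests

ES = "http://localhost:9200"  # placeholder: point this at one of your nodes

# Current index thread pool stats per node: queue depth and rejected count.
stats = requests.get(ES + "/_nodes/stats/thread_pool").json()
for node_id, node in stats["nodes"].items():
    tp = node["thread_pool"]["index"]
    print(node["name"], "queue:", tp["queue"], "rejected:", tp["rejected"])

# Bump the index thread pool queue size cluster-wide (dynamic in ES 1.x).
# "transient" means a full cluster restart reverts it if the change misbehaves.
resp = requests.put(
    ES + "/_cluster/settings",
    json={"transient": {"threadpool.index.queue_size": 400}},
)
print(resp.json())
```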

the cluster i admin

hth

jason

Hi Jason, thanks.

ES version 1.5.2.
5 primary shards and 1 replica.
Number of threads for indexing is 4 (4 cores). I have increased the queue size limit to 400.
I have over 1.5 million docs across 4 indices, about 1 GB in size. There were no circuit breaker exceptions or anything like that.
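For what it's worth, the nodes-info API reports the effective thread pool configuration each node is actually running with, which is one way to double-check those values; a tiny sketch with Python requests (the host is a placeholder):

```python
import requests

ES = "http://localhost:9200"  # placeholder

# Nodes info shows the effective thread pool config (size, queue_size) per node,
# useful for confirming that the increased queue size was actually picked up.
info = requests.get(ES + "/_nodes/thread_pool").json()
for node_id, node in info["nodes"].items():
    print(node["name"], node["thread_pool"]["index"])
```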

I am not sure the issue is with the number of indexing requests, as the queue size has spikes whereas the indexing request graph doesn't show those sudden spikes.

well, you could start by increasing the index threads, but always keep an eye on the impact on the node and cluster. metrics such as system load and disk i/o are important to watch. you said m3.xlarge? http://aws.amazon.com/ec2/instance-types/ looks like those are virtual cpus, so you'll also want to watch the steal metric in the top output. save your current settings before you change anything, and revert if the new settings cause problems.

we are using physical boxes; we can reach a max of 140 indices per elasticsearch node on the good ol' es 0.90.7, with a total indices size of more than 2TB. i think with your hardware setup you could go higher, but that's just my empirical experience; you should always consult an es expert for fine tuning.
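if you want to watch those numbers from a script instead of eyeballing top and iostat, something like this rough sketch with python's psutil works (the one second interval is arbitrary, and steal only shows up on linux guests):

```python
import psutil

# Sample CPU percentages (including steal, i.e. time the hypervisor gave away to
# other guests) and disk I/O counters, printing rough per-second IOPS deltas.
prev = psutil.disk_io_counters()
while True:
    cpu = psutil.cpu_times_percent(interval=1.0)  # blocks for ~1 second
    cur = psutil.disk_io_counters()
    iops = (cur.read_count - prev.read_count) + (cur.write_count - prev.write_count)
    print("user=%.1f%% sys=%.1f%% iowait=%.1f%% steal=%.1f%% iops~=%d" % (
        cpu.user,
        cpu.system,
        getattr(cpu, "iowait", 0.0),
        getattr(cpu, "steal", 0.0),
        iops,
    ))
    prev = cur
```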

hth

jason

I believe the reason there are no spikes in the indexing rate is that ES doesn't expose the number of in-flight indexing requests at any given point in time; it only exposes the total number of indexing requests completed so far. So Marvel must be calculating the indexing rate as
d(total indexing requests) / d(time).

Depending on the sampling interval, the indexing rate will therefore look mostly smooth.
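A quick sketch of that same calculation against the indices stats API (the host and the 10 second interval are placeholders) shows why a sub-second burst gets averaged away:

```python
import time
import requests

ES = "http://localhost:9200"  # placeholder

def index_total():
    # Cumulative count of index operations across all indices since node start.
    stats = requests.get(ES + "/_stats/indexing").json()
    return stats["_all"]["total"]["indexing"]["index_total"]

INTERVAL = 10  # seconds between samples; Marvel samples on a similar fixed interval
t0 = index_total()
time.sleep(INTERVAL)
t1 = index_total()

# d(total indexing requests) / d(time): a burst of 400 docs within one second still
# shows up as only 40 docs/s when averaged over a 10 second window.
print("indexing rate: %.1f docs/s" % ((t1 - t0) / float(INTERVAL)))
```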

I had enabled slow_logs and found the requests that were slow. We had a bug in our application that was making bursts of indexing requests; this was the reason for the index rejections.
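For reference, this is the kind of per-index settings update that turns the indexing slow log on (the index name and thresholds here are only examples):

```python
import requests

ES = "http://localhost:9200"  # placeholder
INDEX = "my_index"            # hypothetical index name

# Any indexing operation slower than a threshold gets written to the slow log,
# which is how slow or bursty callers show up.
resp = requests.put(
    ES + "/" + INDEX + "/_settings",
    json={
        "index.indexing.slowlog.threshold.index.warn": "10s",
        "index.indexing.slowlog.threshold.index.info": "1s",
    },
)
print(resp.json())
```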

are u leveraging bulk indexing? do u take note of time taken by ES from ur caller side for each indexing request?

No, we are not doing bulk indexing atm. We will do that eventually.

No, we are not collecting stats on the time taken by ES from the client side.

do both... they help
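a rough sketch of both together, pushing documents through _bulk and timing each request from the caller side (index/type names, batch size and the fake documents are made up; es 1.x still wants a _type):

```python
import json
import time
import requests

ES = "http://localhost:9200"  # placeholder
docs = [{"id": i, "message": "doc %d" % i} for i in range(1000)]  # fake documents
BATCH = 200  # docs per bulk request; tune while watching the queue/rejection metrics

for start in range(0, len(docs), BATCH):
    batch = docs[start:start + BATCH]

    # Newline-delimited bulk body: one action line plus one source line per doc.
    lines = []
    for doc in batch:
        lines.append(json.dumps({"index": {"_index": "my_index", "_type": "doc", "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    body = "\n".join(lines) + "\n"

    t0 = time.time()
    resp = requests.post(ES + "/_bulk", data=body)
    elapsed = time.time() - t0

    result = resp.json()
    print("bulk of %d docs took %.0f ms from the caller side, errors=%s" % (
        len(batch), elapsed * 1000, result.get("errors")))
```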