Increase in Indexing Time and Big Merges

Hi,

We have a 7-node ES cluster in our development environment: 3 are master nodes and the remaining 4 are data nodes. The masters run with the default heap, and the data nodes run with a 4 GB heap on machines with 8 GB of RAM.

We do bulk inserts continuously and simultaneously fire aggregation queries every minute. We maintain day-wise indices with 5 shards and 1 replica. Each bulk request contains 1500 docs and is approximately 370 KB, so each day's index grows to roughly 35 GB.
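For context, our ingestion loop looks roughly like the sketch below (a simplified Python example using the requests library; the host, index-name pattern, type name, and field names are placeholders and not our actual code, and it assumes an ES 1.x-style bulk request as in this thread):

    import json
    import datetime
    import requests

    ES = "http://localhost:9200"  # placeholder host

    def bulk_insert(docs):
        # Day-wise index, e.g. events-2014.05.14 (naming pattern is illustrative)
        index = "events-" + datetime.date.today().strftime("%Y.%m.%d")
        lines = []
        for doc in docs:
            lines.append(json.dumps({"index": {"_index": index, "_type": "event"}}))
            lines.append(json.dumps(doc))
        body = "\n".join(lines) + "\n"  # the bulk API requires a trailing newline
        resp = requests.post(ES + "/_bulk", data=body,
                             headers={"Content-Type": "application/x-ndjson"})
        resp.raise_for_status()

    # Each call sends ~1500 docs (~370 KB) in one bulk request.
    bulk_insert([{"field": "value"}] * 1500)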

We are observing insertions slowing down as the indexing time of a bulk request climbs to about 6-7 seconds. This starts after 12-13 hours of insertions into an index, and the behaviour repeats for every index.

During the same period we also observe merges as large as 16 GB on some data nodes, and the index stats report around 32 GB of merging. This appears to affect the overall performance of ES.
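We read these merge numbers from the index stats API, roughly like this (a small sketch using the Python requests library; the host and index name are placeholders, and the exact response keys may differ between ES versions):

    import requests

    ES = "http://localhost:9200"     # placeholder host
    index = "events-2014.05.14"      # placeholder day-wise index

    stats = requests.get(ES + "/" + index + "/_stats").json()
    merges = stats["_all"]["total"]["merges"]

    # Total bytes merged so far and merges currently running
    print("merged bytes:", merges["total_size_in_bytes"])
    print("current merges:", merges["current"])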

We have tried various merge-level settings, such as increasing segments_per_tier (to 15), reducing index.store.throttle.max_bytes_per_sec to around 10 MB, and reducing merge.policy.max_merged_segment to 2 GB. These have reduced the big merges, but the increase in indexing time is still observed.
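For reference, we apply those settings with a dynamic index-settings update, roughly like this (a minimal Python sketch using requests; the host and index name are placeholders):

    import requests

    ES = "http://localhost:9200"     # placeholder host
    index = "events-2014.05.14"      # placeholder day-wise index

    settings = {
        "index.merge.policy.segments_per_tier": 15,
        "index.merge.policy.max_merged_segment": "2gb",
        "index.store.throttle.max_bytes_per_sec": "10mb",
    }

    resp = requests.put(ES + "/" + index + "/_settings", json=settings)
    print(resp.json())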

Please guide us on ways to keep the indexing time and insertion rate consistent, and on how to minimize the impact of merging.

Thanks
Mihir

You can try lowering max_merge_at_once. I'm not sure that throttling the
store would be effective; wouldn't that create a bottleneck and exhaust
the bulk threads?
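Something like the following should lower it on a live index (just a sketch; the host, index name, and chosen value are illustrative):

    import requests

    # Lower max_merge_at_once (the tiered policy default is 10) on an existing index
    requests.put("http://localhost:9200/events-2014.05.14/_settings",
                 json={"index.merge.policy.max_merge_at_once": 5})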

--
Ivan

Hi Mihir,
I had the same problem: indexing time increased from about 3 seconds for a bulk of 100k docs to over 500 seconds. After increasing the number of shards from 1 to 4 per node and setting indices.memory.index_buffer_size to 20%, the indexing time has stayed fairly constant at around 3-5 seconds.
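In case it helps, the changes were roughly these (a sketch only; the host, index name, and shard count are illustrative, and indices.memory.index_buffer_size is a node-level setting that requires a node restart):

    import requests

    # 1) In elasticsearch.yml on each data node (node restart needed):
    #      indices.memory.index_buffer_size: 20%

    # 2) Create the new day's index with more primary shards,
    #    e.g. 4 data nodes * 4 shards per node = 16 (illustrative numbers):
    requests.put("http://localhost:9200/events-2014.05.15",
                 json={"settings": {"number_of_shards": 16,
                                    "number_of_replicas": 1}})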