We have an ES cluster of 7 nodes in our development environment: 3 master nodes and 4 data nodes. Masters run with the default heap, and data nodes run with a 4 GB heap on machines with 8 GB RAM.
We do bulk inserts continuously and simultaneously fire aggregation queries every minute. We maintain day-wise indices with 5 shards and 1 replica. Each bulk request contains 1500 docs and is approximately 370 KB, so each day's index grows to roughly 35 GB.
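A quick back-of-the-envelope check of the numbers above (assuming the 35 GB figure is primary data only, excluding replicas):

```python
# Rough ingest-rate estimate implied by the figures in the post above.
GB_KB = 1024 * 1024            # KiB per GiB (binary units assumed)

daily_index_kb = 35 * GB_KB    # ~35 GB of new index data per day
bulk_size_kb = 370             # each bulk request is ~370 KB
docs_per_bulk = 1500

bulks_per_day = daily_index_kb / bulk_size_kb
docs_per_sec = bulks_per_day * docs_per_bulk / 86_400  # seconds per day

print(f"~{bulks_per_day:,.0f} bulks/day, ~{docs_per_sec:,.0f} docs/sec")
```

So the cluster is absorbing on the order of 1,700 docs/sec sustained, which is why merge activity builds up over the day.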
We are observing insertions slowing down as indexing time increases, reaching about 6-7 seconds per bulk. This is observed after 12-13 hours of insertions into an index, and the behaviour repeats for every index.
We also observe merges as large as 16 GB on some data nodes during the same period; in the index stats, merges appear to reach around 32 GB. This seems to affect the overall performance of ES.
We have tried various merge-policy settings, such as increasing segments_per_tier (to 15), reducing index.store.throttle.max_bytes_per_sec to around 10 MB, and reducing merge.policy.max_merged_segment to 2 GB. These have reduced the big merges, but the increase in indexing time is still observed.
Please guide us on how to maintain a consistent indexing time and insertion rate, and how to minimize the effects of merging.
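For reference, the merge settings described above can be applied dynamically through the index-settings API; a sketch (the index name `logs-2015.06.01` is hypothetical, and the values are the ones mentioned in the post):

```
curl -XPUT 'localhost:9200/logs-2015.06.01/_settings' -d '{
  "index.merge.policy.segments_per_tier": 15,
  "index.store.throttle.max_bytes_per_sec": "10mb",
  "index.merge.policy.max_merged_segment": "2gb"
}'
```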
You can try lowering max_merge_at_once. I'm not sure throttling the store would be effective, though; wouldn't that cause a bottleneck and exhaust the bulk threads?
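The max_merge_at_once suggestion is also a dynamic index setting; a sketch, again with a hypothetical index name and an illustrative value (the default is 10):

```
curl -XPUT 'localhost:9200/logs-2015.06.01/_settings' -d '{
  "index.merge.policy.max_merge_at_once": 5
}'
```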
Hi Mihir,
I had the same problem: indexing time increased from about 3 seconds for a bulk of 100k docs to over 500 seconds. After increasing the number of shards from 1 to 4 per node and setting indices.memory.index_buffer_size to 20%, the indexing time is fairly constant at around 3-5 seconds.
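For reference, the buffer change described above is a node-level setting, so it goes in elasticsearch.yml on each data node (and takes effect after a restart); a minimal sketch:

```
# elasticsearch.yml on each data node
# Give indexing a larger slice of the heap (default is 10%)
indices.memory.index_buffer_size: 20%
```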