How to improve bulk indexing of a huge number of docs

ROHANIL_RAJE · February 15, 2018, 2:39pm

I am currently using Elasticsearch 5.6 as the data store for my Go application. I have a million documents spread across 12 indices. Each index has 5 shards and 2 replicas, and they all run on 4 nodes. First I load the docs from Elasticsearch, then I process them, and then I index them in bulk at 10,000 docs per batch. I run 3 workers, each with one async goroutine at a time. Each goroutine handles one index, so bulk index requests are sent per index. That means a goroutine sends around 100,000 docs, in almost 10 batches. The whole process takes more than a minute, and most of that time is spent in bulk indexing.
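A minimal sketch of the batching step described above, to make the setup concrete. It only builds the newline-delimited bodies that would be POSTed to the `_bulk` endpoint and does not send anything. The `Doc` type, the `BuildBulkBodies` helper, and the index and type names are hypothetical and are not taken from the actual application.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Doc is a hypothetical document shape; the real fields are not shown in the post.
type Doc struct {
	ID     string
	Source map[string]interface{}
}

// BuildBulkBodies splits docs into batches of batchSize and renders each batch
// as a newline-delimited JSON body in the format expected by the _bulk API:
// one action line followed by one source line per document.
func BuildBulkBodies(index, docType string, docs []Doc, batchSize int) ([][]byte, error) {
	if batchSize <= 0 {
		return nil, fmt.Errorf("batchSize must be positive, got %d", batchSize)
	}
	var bodies [][]byte
	for start := 0; start < len(docs); start += batchSize {
		end := start + batchSize
		if end > len(docs) {
			end = len(docs)
		}
		var buf bytes.Buffer
		for _, d := range docs[start:end] {
			// Action line: index this document into index/docType with its ID.
			action := map[string]map[string]string{
				"index": {"_index": index, "_type": docType, "_id": d.ID},
			}
			actionLine, err := json.Marshal(action)
			if err != nil {
				return nil, err
			}
			sourceLine, err := json.Marshal(d.Source)
			if err != nil {
				return nil, err
			}
			buf.Write(actionLine)
			buf.WriteByte('\n')
			buf.Write(sourceLine)
			buf.WriteByte('\n') // the bulk body must end with a newline
		}
		bodies = append(bodies, buf.Bytes())
	}
	return bodies, nil
}

func main() {
	// 25 placeholder docs with a batch size of 10 give 3 batches (10, 10, 5).
	docs := make([]Doc, 25)
	for i := range docs {
		docs[i] = Doc{
			ID:     fmt.Sprintf("doc-%d", i),
			Source: map[string]interface{}{"n": i},
		}
	}
	bodies, err := BuildBulkBodies("my-index", "doc", docs, 10)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("batches:", len(bodies))
}
```

In the setup above, each goroutine would call something like this for its own index with a batch size of 10,000 and send the resulting bodies one after another.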

Elasticsearch is currently running with 6GB RAM and a 3.5GB heap. I tried to tune it for indexing speed by increasing the index buffer size to 20%, i.e. 700MB. I disabled indexing for fields that don't need it, optimised the numeric field types in the mappings, disabled the _all field, and changed the index codec (compression method) to best_compression. After all of this, there is not much improvement.
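For concreteness, a sketch of what a create-index body with those changes could look like in Elasticsearch 5.6. The `IndexBody` helper, the mapping type name `doc`, and the field name `raw_payload` are hypothetical placeholders, not the real mappings. The index buffer is a node-level setting (`indices.memory.index_buffer_size: 20%` in `elasticsearch.yml`), so it does not appear in the index body.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// IndexBody builds a create-index request body reflecting the tuning listed above:
// best_compression codec, _all disabled, and indexing turned off for selected fields.
// The "keyword" type for the unindexed fields is an assumption for illustration.
func IndexBody(docType string, unindexedFields []string) map[string]interface{} {
	props := map[string]interface{}{}
	for _, field := range unindexedFields {
		props[field] = map[string]interface{}{
			"type":  "keyword",
			"index": false, // stored in _source but not searchable
		}
	}
	return map[string]interface{}{
		"settings": map[string]interface{}{
			"index": map[string]interface{}{
				"codec":              "best_compression",
				"number_of_shards":   5,
				"number_of_replicas": 2,
			},
		},
		"mappings": map[string]interface{}{
			docType: map[string]interface{}{
				"_all":       map[string]interface{}{"enabled": false},
				"properties": props,
			},
		},
	}
}

func main() {
	body, err := json.MarshalIndent(IndexBody("doc", []string{"raw_payload"}), "", "  ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// This JSON would be sent with PUT /<index-name> when creating the index.
	fmt.Println(string(body))
}
```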

So I would like ideas for improving bulk indexing performance so that the whole process finishes within a minute. Will it help if I give Elasticsearch more RAM and a larger heap? Are there any other settings or tuning I should try?

