Elasticsearch bulk size/performance

abhijith_reddy · September 24, 2016, 9:24am

I am trying to load test my Elasticsearch instance to figure out the optimal bulk size. Below is my setup:

1 Elasticsearch node running the latest version (2.4)
32 GB heap size
1 index, 1 shard, 0 replicas
refresh interval = -1
indices.memory.index_buffer_size: 30
index.translog.flush_threshold_size: 10000
The mapping has around 20 fields, all not analyzed, stored with lowercase mapping.
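
For readers who want to reproduce a similar index, the index-level part of the setup above might look like the create-index body below. This is only a sketch: the index name and mapping are omitted, and the index buffer and translog values are left out because the post gives them without units.

```python
import json

# Index-level settings taken from the setup list above.
# The buffer and translog settings are intentionally omitted (units unknown).
index_settings = {
    "settings": {
        "index": {
            "number_of_shards": 1,
            "number_of_replicas": 0,
            "refresh_interval": "-1",  # disables periodic refresh during the load
        }
    }
}

# JSON body that could be sent when creating the index.
body = json.dumps(index_settings, indent=2)
print(body)
```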

I tested with 30 parallel workers doing bulk indexing with batch sizes of 100, 250, 500, and 1000; each document is roughly 250 bytes. I found that I get the same throughput for all the batch sizes; larger batches just take proportionately longer to index. I get around 60k inserts/sec. CPU usage, however, increases from ~30% to 60% (across all cores).

Is this expected? The documentation suggests starting testing at 5 MB, but when I try that, Elasticsearch just takes way too long to respond.
What batch size should I choose in this case? I am guessing the lower ones, as the call returns quickly and consumes less memory.
Are there any other settings that I can tweak to get more performance?
I understand performance varies depending on the setup, but is 60k/sec reasonable for this setup? It more than suffices for our use case, but I am trying to get a good benchmark.
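
To put the numbers in this post side by side, here is a rough back-of-the-envelope calculation. It assumes every document is exactly 250 bytes, ignores the per-document action line that the bulk format adds, and treats the 5 MB figure as 5 MiB; none of that is measured, it is only for scale.

```python
DOC_BYTES = 250                    # approximate document size from the post
RATE_DOCS_PER_SEC = 60_000         # observed throughput from the post
MAX_BULK_BYTES = 5 * 1024 * 1024   # the ~5 MB figure, taken as MiB (assumption)


def batch_payload_bytes(batch_docs, doc_bytes=DOC_BYTES):
    """Raw document bytes in one bulk request, ignoring action-line overhead."""
    return batch_docs * doc_bytes


for n in (100, 250, 500, 1000):
    size_kb = batch_payload_bytes(n) / 1000
    requests_per_sec = RATE_DOCS_PER_SEC / n
    print(f"{n:>5} docs/batch: ~{size_kb:.0f} kB per request, "
          f"~{requests_per_sec:.0f} requests/sec in total")

# How many 250-byte documents it would take to fill a 5 MiB request.
docs_for_5mb = MAX_BULK_BYTES // DOC_BYTES

# Raw document bytes indexed per second at the observed rate.
raw_mb_per_sec = RATE_DOCS_PER_SEC * DOC_BYTES / 1e6

print(docs_for_5mb, raw_mb_per_sec)
```

Under these assumptions the largest tested batch (1000 docs) is about 250 kB, a 5 MiB request would hold roughly 21,000 documents, and 60k docs/sec corresponds to about 15 MB/sec of raw document data.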

Christian_Dahlqvist · September 24, 2016, 10:40am

As far as I recall the documentation recommends a maximum bulk size of around 5MB, not to start at that point. A common methodology to determine the optimal bulk size is to start small and increase while throughput keeps improving. In benchmarks I have performed I am usually able to saturate a node with considerably fewer parallel indexing threads, so unless 30 is a requirement, you may want to benchmark with fewer indexing threads as well.
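
The "start small and increase while throughput keeps improving" approach could be sketched roughly as below. Everything here is hypothetical scaffolding: `measure` stands for whatever benchmark run you have that returns docs/sec for a given bulk size, and `simulated_rate` is a made-up stand-in so the sketch runs without a cluster. The doubling factor and 5% gain threshold are arbitrary choices, not recommendations.

```python
from typing import Callable, List, Tuple


def find_bulk_size(
    measure: Callable[[int], float],
    start: int = 100,
    factor: int = 2,
    max_size: int = 20_000,
    min_gain: float = 0.05,
) -> Tuple[int, float, List[Tuple[int, float]]]:
    """Grow the bulk size until throughput stops improving by at least min_gain.

    Returns (best_size, best_rate, history of (size, rate) pairs).
    """
    best_size = start
    best_rate = measure(start)
    history = [(start, best_rate)]

    size = start * factor
    while size <= max_size:
        rate = measure(size)
        history.append((size, rate))
        # Stop as soon as a larger batch no longer gives a meaningful gain.
        if rate < best_rate * (1 + min_gain):
            break
        best_size, best_rate = size, rate
        size *= factor

    return best_size, best_rate, history


def simulated_rate(bulk_size: int) -> float:
    """Made-up throughput curve: grows with bulk size, then flattens at 60k."""
    return float(min(bulk_size * 100, 60_000))


best_size, best_rate, history = find_bulk_size(simulated_rate)
print(best_size, best_rate, history)
```

In a real run, `measure` would index a fixed number of documents at the given bulk size and return the observed rate, ideally repeated for a few different worker counts as well.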

Yes, going with the smaller batch sizes seems like a good choice.

Elasticsearch 2.x does a lot of optimisations behind the scenes, so there are fewer parameters that need tuning compared to earlier versions. In the benchmarks I did for my talk at Elastic{ON}, I tested with a varying number of shards, which can make a difference.

60k/sec seems to be a good number, but this always depends a lot on the number of CPUs, disk performance, type and size of data, as well as the mappings used. Sustaining the maximum indexing rate does, however, leave very few resources for querying. I therefore always recommend benchmarking with a combined realistic indexing and query load to find the practical limit for a cluster.
