Slow Indexing rate

Ramky · August 11, 2015, 4:41pm

I am trying to index JSON events with 16 string fields each, and the maximum rate I am able to reach is 5k EPS. The JSON events are fed to Elasticsearch by iterating over a list of strings loaded from a file. The Elasticsearch cluster consists of 3 nodes with 32 GB RAM and an 8-core CPU each.

We are performing bulk indexing with the following settings:

index.number_of_shards=5
index.number_of_replicas=1
index.translog.flush_threshold_ops=50000
index.refresh_interval=30
indices.memory.index_buffer_size=50%
indices.fielddata.cache.size=20%
indices.fielddata.cache.expire=1h
indices.cache.filter.size=20%
indices.cache.filter.expire=1h
bulkprocessor.BulkActions=25000
bulkprocessor.BulkSize=15
bulkprocessor.FlushInterval=15
bulkprocessor.ConcurrentRequests=10
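As I understand the Java BulkProcessor, BulkActions, BulkSize and FlushInterval are independent triggers: a bulk request is sent when the buffered action count or byte size reaches its limit, or when the interval elapses, whichever comes first. The units are not stated above; BulkSize is assumed to be MB and FlushInterval seconds. The class below, `BulkBatcher`, is a hypothetical, stdlib-only sketch of the count and size triggers only (no timer, no network); it is not the Elasticsearch API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of count/size-triggered batching; not the Elasticsearch BulkProcessor. */
public class BulkBatcher {
    private final int maxActions;   // flush after this many documents (cf. BulkActions)
    private final long maxBytes;    // flush once buffered payload reaches this size (cf. BulkSize)
    private final List<String> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    private final List<Integer> flushedBatchSizes = new ArrayList<>();

    public BulkBatcher(int maxActions, long maxBytes) {
        this.maxActions = maxActions;
        this.maxBytes = maxBytes;
    }

    /** Buffers one JSON document and flushes if either threshold is reached. */
    public void add(String json) {
        buffer.add(json);
        bufferedBytes += json.getBytes(StandardCharsets.UTF_8).length;
        if (buffer.size() >= maxActions || bufferedBytes >= maxBytes) {
            flush();
        }
    }

    /** "Sends" the current batch; here it only records the batch size instead of calling a cluster. */
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flushedBatchSizes.add(buffer.size());
        buffer.clear();
        bufferedBytes = 0;
    }

    public List<Integer> flushedBatchSizes() {
        return flushedBatchSizes;
    }

    public static void main(String[] args) {
        BulkBatcher b = new BulkBatcher(3, 1_000_000);
        for (int i = 0; i < 7; i++) {
            b.add("{\"f\":" + i + "}");
        }
        b.flush(); // flush the remainder, as a flush interval or close would
        System.out.println(b.flushedBatchSizes()); // [3, 3, 1]
    }
}
```

With BulkActions=25000, a 15 MB size limit is reached first only if the average request entry exceeds roughly 600 bytes (15,000,000 / 25,000), so for small 16-field events the action count is likely the effective trigger.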

Please help me increase the indexing speed.

Regards
Rama Krishna P

jpountz · August 11, 2015, 5:49pm

When the bulk test is running, is Elasticsearch maxing out CPU and/or I/O? If not, then maybe you just need to send data from more threads?
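One way to follow this suggestion is to split the in-memory events into batches and hand them to a fixed-size thread pool. The sketch below is stdlib-only; `ParallelSender` and its `sendBatch` method are hypothetical stand-ins, and `sendBatch` only counts documents where a real client would issue a bulk request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class ParallelSender {
    /** Hypothetical stand-in for "send one bulk request"; here it only counts documents. */
    static void sendBatch(List<String> batch, AtomicLong sentCounter) {
        sentCounter.addAndGet(batch.size());
    }

    /** Splits events into batches of batchSize and sends them from a pool of worker threads. */
    public static long sendAll(List<String> events, int threads, int batchSize) {
        AtomicLong sent = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int from = 0; from < events.size(); from += batchSize) {
                List<String> batch = events.subList(from, Math.min(from + batchSize, events.size()));
                futures.add(pool.submit(() -> sendBatch(batch, sent)));
            }
            for (Future<?> f : futures) {
                f.get(); // surfaces any exception thrown by a worker
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sending", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return sent.get();
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            events.add("{\"id\":" + i + "}");
        }
        System.out.println(sendAll(events, 4, 100)); // 1000
    }
}
```

The events list is only read by the workers, so sharing `subList` views across threads is safe here; a real sender would also need error handling per batch.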

Ramky · August 12, 2015, 10:04am

Thanks for the reply.
When indexing 1 million events with a single thread, the rate is around 7k EPS (events per second), but with 4 threads it drops to around 1.7k EPS.

Memory consumption is 28% (nearly 9 GB), the CPU is 96% idle, and peak CPU usage is 10%.
Of the 1 million events, the first 700k are indexed at around 40k EPS, but the overall rate is only around 7k EPS.
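Assuming both figures are averages over wall-clock time, they imply a much slower tail. The hypothetical helper below, `EpsBreakdown`, reproduces the arithmetic: 700k events at 40k EPS take 17.5 s, 1M events at 7k EPS take about 142.9 s, so the last 300k events took about 125.4 s.

```java
public class EpsBreakdown {
    /** Throughput of the events after the early phase, given total and early-phase averages. */
    static double tailEps(long totalEvents, double overallEps, long earlyEvents, double earlyEps) {
        double totalSeconds = totalEvents / overallEps; // 1,000,000 / 7,000 ≈ 142.9 s
        double earlySeconds = earlyEvents / earlyEps;   // 700,000 / 40,000 = 17.5 s
        return (totalEvents - earlyEvents) / (totalSeconds - earlySeconds);
    }

    public static void main(String[] args) {
        System.out.printf("%.0f%n", tailEps(1_000_000, 7_000, 700_000, 40_000)); // ≈ 2393
    }
}
```

So the last 30% of the run proceeds at roughly 2.4k EPS, which suggests the slowdown is concentrated at the end of the run rather than spread evenly across it.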

Below is the iostat output from one machine.
Before:
avg-cpu:   %user   %nice %system %iowait  %steal   %idle
            0.54    0.00    0.13    1.70    0.00   97.63

Device:      tps  Blk_read/s  Blk_wrtn/s   Blk_read   Blk_wrtn
sda        11.45      458.74      199.38     722502     314016
dm-0       25.07      434.87      105.50     684914     166160
dm-1        0.20        1.64        0.00       2576          0
dm-2        9.32       18.01       72.04      28370     113456

During indexing:
avg-cpu:   %user   %nice %system %iowait  %steal   %idle
            1.27    0.00    0.14    3.81    0.00   94.78

Device:      tps  Blk_read/s  Blk_wrtn/s   Blk_read   Blk_wrtn
sda        10.81      358.03     1294.19     722886    2613096
dm-0       19.67      339.36       83.17     685194     167928
dm-1        0.16        1.28        0.00       2576          0
dm-2      149.50       14.09     1193.99      28458    2410768
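The cumulative Blk_read and Blk_wrtn columns suggest these are plain iostat reports taken without an interval argument, which average over the time since boot (for sda, 722502 / 458.74 and 314016 / 199.38 both give about 1575 s). Under that assumption, the actual write rate during indexing is the difference between the two snapshots divided by the elapsed time. The sketch below (hypothetical class `IostatDelta`) does this for sda, taking iostat's Blk_* units as 512-byte sectors.

```java
public class IostatDelta {
    static final double SECTOR_BYTES = 512.0; // iostat's Blk_* columns count 512-byte sectors

    /** Seconds covered by a since-boot iostat line: cumulative total / average rate. */
    static double coveredSeconds(double cumulativeBlocks, double blocksPerSec) {
        return cumulativeBlocks / blocksPerSec;
    }

    /** Average MiB/s written between two cumulative snapshots of the same device. */
    static double deltaMiBPerSec(double blkBefore, double rateBefore, double blkDuring, double rateDuring) {
        double seconds = coveredSeconds(blkDuring, rateDuring) - coveredSeconds(blkBefore, rateBefore);
        return (blkDuring - blkBefore) * SECTOR_BYTES / seconds / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // sda Blk_wrtn and Blk_wrtn/s from the "Before" and "During indexing" reports
        System.out.printf("%.2f MiB/s%n", deltaMiBPerSec(314016, 199.38, 2613096, 1294.19)); // about 2.53 MiB/s
    }
}
```

That is roughly 2.5 MiB/s of writes over about 444 s, a light write load that is consistent with the idle CPU figures.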

jpountz · August 12, 2015, 4:45pm

So your system looks essentially idle. There must be an issue with the benchmark somehow. Maybe the bottleneck is reading data from your file?

Also, for the record, setting a time-based expiry on the fielddata and filter caches is almost always wrong: fielddata entries are very expensive to regenerate, and filter cache entries cannot go stale because they are cached per segment.

Ramky · August 13, 2015, 8:29am

Thanks for the reply.

The file is loaded only once and kept in memory, and stats are collected after the file has been loaded. So I don't think reading data from the file is the bottleneck.
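With ConcurrentRequests greater than zero, BulkProcessor sends bulk requests asynchronously, so a loop that only hands documents to it can finish before the cluster has indexed them. A fair EPS figure should therefore stop the clock only after in-flight requests complete. The sketch below assumes the events are already in memory; the `send` callback and `awaitCompletion` hook are hypothetical stand-ins for the real client calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class IndexingTimer {
    /**
     * Measures events per second over events that are already in memory.
     * The clock covers both handing off every event and waiting for completion.
     */
    static double measureEps(List<String> events, Consumer<String> send, Runnable awaitCompletion) {
        long start = System.nanoTime();
        for (String event : events) {
            send.accept(event);        // hypothetical: hand one event to the client
        }
        awaitCompletion.run();         // hypothetical: block until outstanding requests finish
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        return events.size() / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>(); // loaded once, before timing starts
        for (int i = 0; i < 100_000; i++) {
            events.add("{\"id\":" + i + "}");
        }
        List<String> sink = new ArrayList<>();
        double eps = measureEps(events, sink::add, () -> { });
        System.out.printf("%d events, %.0f EPS%n", sink.size(), eps);
    }
}
```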

Even after disabling the expiry settings on the fielddata and filter caches, there is no change in the indexing rate.

jpountz · August 13, 2015, 10:32am

Then I'm very confused why you can't max out either I/O or CPU on your server. There must be something wrong somewhere... Are you using bulk or separate indexing requests to load data? Do you have good network connectivity between your server and the machine that runs the benchmark?

Right, I didn't expect it to change the indexing rate; this was more of a side note.

Ramky · August 13, 2015, 11:38am

I created a single TransportClient for the 3-node cluster and send the JSON events, held in a java.util.List loaded from the file, to the cluster using the bulk API.
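With the TransportClient the bulk request is assembled through the Java API, but the REST `_bulk` endpoint shows what each request carries: one action line followed by one source line per document, each terminated by a newline. The sketch below builds such a body; the index name `events` and type `event` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class BulkBody {
    /** Builds a newline-delimited _bulk body: one action line plus one source line per event. */
    static String build(String index, String type, List<String> jsonEvents) {
        String action = "{\"index\":{\"_index\":\"" + index + "\",\"_type\":\"" + type + "\"}}";
        StringBuilder sb = new StringBuilder();
        for (String event : jsonEvents) {
            sb.append(action).append('\n');
            sb.append(event).append('\n'); // each source document must fit on a single line
        }
        return sb.toString(); // the body must end with a newline
    }

    public static void main(String[] args) {
        System.out.print(build("events", "event", Arrays.asList("{\"f1\":\"a\"}", "{\"f1\":\"b\"}")));
    }
}
```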

These are the parameters used for the bulk API:
bulkprocessor.BulkActions=10000
bulkprocessor.BulkSize=10
bulkprocessor.FlushInterval=20
bulkprocessor.ConcurrentRequests=10
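ConcurrentRequests=10 allows up to 10 bulk requests to be in flight while new documents keep being buffered. The sketch below models that limit with a `java.util.concurrent.Semaphore`, blocking the caller once the limit is reached. `BoundedInFlight` and its demo workload (a 5 ms sleep per "request") are hypothetical and only illustrate the concurrency cap, not BulkProcessor's internals.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedInFlight {
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    final AtomicInteger maxObserved = new AtomicInteger();
    final AtomicInteger completed = new AtomicInteger();

    BoundedInFlight(int concurrentRequests) {
        permits = new Semaphore(concurrentRequests);
    }

    /** Runs one hypothetical bulk request on the pool, blocking the caller while the limit is reached. */
    void submit(ExecutorService pool, Runnable request) throws InterruptedException {
        permits.acquire();
        pool.execute(() -> {
            maxObserved.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            try {
                request.run();
            } finally {
                inFlight.decrementAndGet();
                completed.incrementAndGet();
                permits.release();
            }
        });
    }

    /** Demo: runs `requests` 5 ms sleeps with at most `limit` at once; returns {maxObserved, completed}. */
    static int[] runDemo(int limit, int requests) {
        BoundedInFlight limiter = new BoundedInFlight(limit);
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            for (int i = 0; i < requests; i++) {
                limiter.submit(pool, () -> {
                    try {
                        Thread.sleep(5); // stand-in for network and indexing time
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow(); // no-op if already terminated
        }
        return new int[] {limiter.maxObserved.get(), limiter.completed.get()};
    }

    public static void main(String[] args) {
        int[] r = runDemo(10, 100);
        System.out.println("max in flight: " + r[0] + ", completed: " + r[1]);
    }
}
```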

I tried several approaches but was unable to increase CPU or memory utilization, or the indexing rate.

Moreover, all machines are on the same 1 Gbps LAN.
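A quick bandwidth budget shows whether a 1 Gbps link could plausibly cap indexing at 5-7k EPS. The average event size is not given in the thread, so the 500 bytes below is a hypothetical figure, and the estimate ignores protocol overhead and replica traffic.

```java
public class LinkBudget {
    /** Upper bound on events/s a link can carry, ignoring protocol overhead and replica traffic. */
    static double maxEps(double linkBitsPerSec, double bytesPerEvent) {
        return linkBitsPerSec / 8.0 / bytesPerEvent;
    }

    public static void main(String[] args) {
        double assumedEventBytes = 500; // hypothetical average size of one 16-field JSON event
        System.out.printf("%.0f events/s%n", maxEps(1e9, assumedEventBytes)); // 250000 events/s
    }
}
```

Under that assumption, 7k EPS uses only about 3.5 MB/s of a link that carries about 125 MB/s, so the network is unlikely to be the limit unless events are far larger.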

jpountz · August 14, 2015, 2:58pm

Sorry, I don't have any more ideas about what is wrong, but something certainly is.

Ramky · August 17, 2015, 4:53am

Thanks for your help in trying to address the issue. After some trial and error, I was able to reach 50k EPS.

Chang_Oh_Heo · September 12, 2016, 4:52pm

Hi, I have a similar problem.
Could you let me know how you solved it?

Chen_Jian · March 8, 2017, 5:31am

I have a similar issue: the ES data node CPU is idle and the indexing rate is extremely slow. Any suggestions?

Christian_Dahlqvist · March 8, 2017, 1:30pm

Please open a new thread for your question and provide more details around your setup and achieved performance.
