Elasticsearch is taking too long between one BulkImport and the next

andreatera · November 29, 2018, 9:31am

Hi all,

we have a microservice application that reads data from Kafka and imports it into Elasticsearch using BulkImport operations.
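
The post does not show the import code itself. As a minimal sketch only, assuming documents arrive as dicts with hypothetical "id" and "source" keys, this is how messages can be grouped into batches and turned into the newline-delimited body that the _bulk endpoint expects. The index and type names are placeholders; Elasticsearch 2.x still requires a _type in each action line.

```python
import json
from typing import Iterable, Iterator, List

# Hypothetical index/type names; Elasticsearch 2.x requires _type per action.
INDEX_NAME = "my-index"
DOC_TYPE = "doc"


def bulk_body(docs: Iterable[dict]) -> str:
    """Build an NDJSON body for _bulk: one action line followed by one
    source line per document. Assumes each doc has "id" and "source" keys."""
    lines: List[str] = []
    for doc in docs:
        action = {"index": {"_index": INDEX_NAME, "_type": DOC_TYPE, "_id": doc["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc["source"]))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def batches(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group incoming messages (e.g. consumed from Kafka) into fixed-size batches."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the last, possibly smaller, batch
        yield batch
```

Each resulting body would be sent as a POST to /_bulk. Batch size is the main knob here: it is usually tuned by measuring throughput at a few sizes rather than picked up front.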

The microservice application runs in Docker via docker-compose and is scaled out for parallel, multi-threaded processing.
Elasticsearch (2.4.1) also runs in Docker via docker-compose, with the following configuration: 1 master node (4 GB Java heap), 1 client node (4 GB Java heap), 5 data nodes (8 GB Java heap each), 24 shards, and 1 index (~7.89 GB).
The VM has 256 GB of RAM, 24 CPUs (24 cores) and 500 GB of ext4 disk space.

We noticed that the application pauses for about 20 s between some BulkImport requests before continuing with the next ones. In the end, importing a full index of 7.49 GB (6.2 million hits) takes 4h40m, which is not what we expected.
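
For a sense of scale, the figures above can be turned into an average throughput. This is plain arithmetic on the numbers in the post, assuming decimal gigabytes and treating the 7.49 GB index size as the data volume (the on-disk index size is not the same as the raw source size).

```python
# Figures taken from the post: 6.2 million documents, 7.49 GB, 4h40m.
docs = 6_200_000
size_gb = 7.49
elapsed_s = 4 * 3600 + 40 * 60  # 16,800 seconds

docs_per_s = docs / elapsed_s
mb_per_s = size_gb * 1000 / elapsed_s  # assumes decimal GB -> MB

print(f"{docs_per_s:.0f} docs/s, {mb_per_s:.2f} MB/s")
```

This prints `369 docs/s, 0.45 MB/s`, a rate low enough that the pauses between bulk requests, rather than raw indexing work, are a plausible place to look first.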

We have already tried:

- Disabling refresh and replicas for the initial load
- Raising the ulimits
- Tuning the thread pool settings
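
The first item above usually amounts to two index settings updates, one before and one after the load. A minimal sketch of the request bodies, assuming they are sent with PUT /<index>/_settings; the restore values shown are the Elasticsearch defaults (1s refresh, 1 replica), not necessarily what this cluster used before.

```python
import json

# Applied before the initial load: refresh disabled, no replicas.
INITIAL_LOAD_SETTINGS = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def restore_settings(refresh_interval: str = "1s", replicas: int = 1) -> dict:
    """Settings to re-apply once the load has finished. The defaults are the
    Elasticsearch defaults; pass the cluster's previous values if they differed."""
    return {"index": {"refresh_interval": refresh_interval, "number_of_replicas": replicas}}


# Each body is sent with: PUT /<index>/_settings
print(json.dumps(INITIAL_LOAD_SETTINGS))
print(json.dumps(restore_settings()))
```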

No luck.
Do you have any suggestions for increasing indexing speed?

Christian_Dahlqvist · November 29, 2018, 9:39am

I would recommend looking at this guide. You may also want to optimise your mappings to reduce work required at indexing time.
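
As an illustration of "optimising mappings" only: the field names below are hypothetical, and the syntax follows Elasticsearch 2.x (the "string" type with "index": "not_analyzed" was replaced by "keyword" in later versions). The idea is to skip work that is never needed at search time: the catch-all _all field, analysis of exact-match values, and indexing of fields that are only stored.

```python
import json

# Hypothetical fields; Elasticsearch 2.x mapping syntax.
mapping = {
    "mappings": {
        "doc": {
            # Skip building the catch-all _all field at index time.
            "_all": {"enabled": False},
            "properties": {
                # Exact-match identifier: no analysis needed.
                "customer_id": {"type": "string", "index": "not_analyzed"},
                # Free text that is searched: keep it analyzed.
                "description": {"type": "string"},
                # Kept in _source but never searched: not indexed at all.
                "raw_payload": {"type": "string", "index": "no"},
                "created_at": {"type": "date"},
            },
        }
    }
}

# Sent as the body of: PUT /<index>  (when creating the index)
print(json.dumps(mapping, indent=2))
```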

Having a single master-eligible node makes it a single point of failure and is not recommended. Also make sure you are sending data directly to the data nodes so the client node does not become a bottleneck.
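
Sending data directly to the data nodes can be as simple as rotating bulk requests over their addresses. A minimal sketch, assuming the five data nodes are reachable over HTTP under hypothetical hostnames such as docker-compose service names:

```python
from itertools import cycle, islice
from typing import Iterator, List

# Hypothetical addresses of the five data nodes; adjust to the real hosts/ports.
DATA_NODES: List[str] = [f"http://es-data-{i}:9200" for i in range(1, 6)]


def bulk_targets(nodes: List[str]) -> Iterator[str]:
    """Endlessly yield _bulk URLs, rotating over the data nodes so that no
    single node has to coordinate every bulk request."""
    for node in cycle(nodes):
        yield f"{node}/_bulk"


# The first six requests: nodes 1-5, then back to node 1.
for url in islice(bulk_targets(DATA_NODES), 6):
    print(url)
```

A generator like this is not safe to share between threads; in a multi-threaded importer each worker could hold its own iterator, for example started at a different offset.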

andreatera · November 29, 2018, 2:33pm

Thanks for the fast reply @Christian_Dahlqvist. This environment is just a prototype to gather metrics for production, where we have 3 client, 3 master and 12 data nodes.
Anyway, good point!
Do you know whether the client node limits the number or rate of requests?

Christian_Dahlqvist · November 29, 2018, 3:00pm

Have a look at the Elasticsearch nodes and see if you can identify what is limiting throughput. Elasticsearch is often very disk I/O intensive, so slow storage is a common bottleneck, but it could also be CPU or GC.

andreatera · November 29, 2018, 3:27pm

It is definitely not CPU or GC; I have already monitored those and they are fine. Regarding disk I/O, we are using volumes with the default Docker driver.

Christian_Dahlqvist · November 29, 2018, 3:32pm

What kind of storage do you have?

andreatera · November 29, 2018, 4:28pm

The VM runs in VMware vCloud Director on the so-called "Fast Storage with Snapshot Site B2".

Christian_Dahlqvist · November 29, 2018, 4:37pm

I have no idea what that means or corresponds to.

andreatera · November 29, 2018, 4:41pm

Sorry, me neither. I have no more detailed information about this cloud storage.

Christian_Dahlqvist · November 29, 2018, 4:53pm

Look at iostat on the VM while it is under load, and see if that gives any indication.
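
One way to act on this is to capture `iostat -x 5` while a bulk load is running and look at per-device utilisation. A minimal parsing sketch, assuming English-locale output (e.g. run with LC_ALL=C so decimals use a dot); because column order differs between sysstat versions, the %util column is located by name. Later reports overwrite earlier ones, so the since-boot first report is replaced by live samples. A %util near 100% together with high await during the slow periods would suggest storage is the bottleneck, although %util is less reliable on devices that serve many requests in parallel, such as virtual or RAID-backed disks.

```python
from typing import Dict


def device_utilization(iostat_output: str) -> Dict[str, float]:
    """Extract the %util value per device from `iostat -x` output.

    The header line (starting with "Device" or "Device:") is used to find
    the %util column; a blank line ends the current device block."""
    util: Dict[str, float] = {}
    util_col = None
    for line in iostat_output.splitlines():
        fields = line.split()
        if not fields:
            util_col = None  # blank line: end of a device block
            continue
        if fields[0].rstrip(":") == "Device":
            util_col = fields.index("%util")
            continue
        if util_col is not None and len(fields) > util_col:
            # Later reports overwrite earlier ones for the same device.
            util[fields[0]] = float(fields[util_col])
    return util
```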

system · December 27, 2018, 4:54pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
