Bulk index with Java REST client

Hello,

I have a cluster with 2 nodes (one with an 8 GB heap and the other with a 16 GB heap, both with 4 CPU cores), and I'm trying to index documents using the bulk API with the RestHighLevelClient.

I'm creating bulks of 30000 docs (about 3 MB each), and I'm getting "listener timeout after waiting for [30000] ms".
I also tried bulks of 10000 docs, but I get the same exception.
The only way it worked was to create bulks of 1000 docs, but this is not an option for us. I am in the process of migrating from ES 2.3.4 to ES 6.1.1, and the same indexing process (bulks of 30k docs) runs pretty fast on ES 2.3.4 with a transport node.
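The thread doesn't show how the 30000-document requests are assembled. As a hedged sketch, the JDK-only helper below splits documents into batches capped by both a document count and an approximate byte size, so the two limits can be tuned independently (for example, trying sizes between the 1000 that worked and the 10000 that timed out). `BulkBatcher` and `partition` are hypothetical names, not part of any Elasticsearch client; the ES 6.x `BulkProcessor` offers similar thresholds through `setBulkActions` and `setBulkSize`, which may be worth checking before rolling your own.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of the Elasticsearch client): splits JSON
 * documents into batches bounded by a maximum document count and an
 * approximate maximum payload size in bytes.
 */
public final class BulkBatcher {

    private BulkBatcher() {}

    public static List<List<String>> partition(List<String> docs, int maxDocs, long maxBytes) {
        if (maxDocs <= 0 || maxBytes <= 0) {
            throw new IllegalArgumentException("maxDocs and maxBytes must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String doc : docs) {
            // Approximation: counts only the UTF-8 size of the document source;
            // a real bulk body also carries an action/metadata line per document.
            long docBytes = doc.getBytes(StandardCharsets.UTF_8).length;
            boolean full = current.size() >= maxDocs || currentBytes + docBytes > maxBytes;
            if (full && !current.isEmpty()) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            // A single document larger than maxBytes still ends up alone in its own batch.
            current.add(doc);
            currentBytes += docBytes;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Each inner list would then be turned into one `BulkRequest` and sent on its own.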

I've also tried indexing the bulk with the ES 6.1.1 Transport Client. This worked fine, but since the Transport Client will be deprecated in ES 7 and removed in ES 8, I thought I'd use the RestHighLevelClient from now on, as much as I can.

What do you think about this?

Hi,

I have a few questions:

  • Are both nodes data nodes? Did you try sending the 30k, 10k and 1k bulks to the same node with the TransportClient as well?
  • Do you see any errors in the logs?
  • Are the bulk items only create operations, or do you have updates and deletes too?
  • Do you use dynamic mappings, or are the mappings already created (with all possible fields) before executing the bulks?

I'm also curious how long one of your 30k bulks takes on average.

Hello,

Both nodes are data nodes, and queries work fine; only bulk index requests are slow (I haven't tried other types of bulk requests, just index).
I see no errors in the Elasticsearch logs; the only exception I get is inside my app:
Caused by: java.io.IOException: listener timeout after waiting for [30000] ms
at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:663)
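As the stack trace shows, the synchronous call ends up in the low-level `RestClient`, which (as far as I understand the 6.x client) sends the request asynchronously and then blocks on a listener for at most its max retry timeout, 30000 ms by default; that matches the `[30000]` in the message. The JDK-only model below illustrates the consequence: the caller gives up once the bound expires, even if the cluster later finishes the bulk. `SyncListenerModel` is a hypothetical name and a simplification, not the client's actual code. In the 6.x `RestClientBuilder` this bound is set with `setMaxRetryTimeoutMillis`, and the socket timeout through `setRequestConfigCallback`, so raising both may be one thing to try.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical, simplified model (not the Elasticsearch client source) of a
 * synchronous call that waits a bounded time for an asynchronous response.
 */
public final class SyncListenerModel<T> {
    private final CountDownLatch latch = new CountDownLatch(1);
    private final AtomicReference<T> response = new AtomicReference<>();
    private final long timeoutMillis;

    public SyncListenerModel(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called from the I/O thread once the response has arrived. */
    public void onSuccess(T value) {
        response.set(value);
        latch.countDown();
    }

    /** Called from the caller's thread; gives up after timeoutMillis. */
    public T get() throws IOException {
        boolean completed;
        try {
            completed = latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            throw new InterruptedIOException("interrupted while waiting for the response");
        }
        if (!completed) {
            // Only the caller stops waiting; the request itself is not cancelled here.
            // Message format mirrors the exception quoted in this thread.
            throw new IOException("listener timeout after waiting for [" + timeoutMillis + "] ms");
        }
        return response.get();
    }
}
```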

I have only create operations.
The index and the mapping are created before executing the bulks, using the TransportClient (I couldn't find how to create indices with the RestHighLevelClient; I'm not sure it's possible yet).

Measuring the time in Java, by comparing the current time before and after running client.bulk(bulkRequest), I get about 10 seconds.

Looking at bulkResponse.getTook(), I get at most 1 second.
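The gap between roughly 10 seconds measured around `client.bulk(bulkRequest)` and at most 1 second from `getTook()` suggests most of the time is spent outside the cluster's own processing of the bulk (for example serialization, network transfer or queueing), though the thread doesn't confirm where. A small JDK-only sketch for measuring this consistently: it times the call with the monotonic `System.nanoTime()` rather than wall-clock `currentTimeMillis()`, and computes the difference from a server-reported time. `WallClock`, `IoCall`, `time` and `overheadMillis` are hypothetical names.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: measures client-side elapsed time of a call. */
public final class WallClock {

    private WallClock() {}

    /** A call that may throw IOException, as RestHighLevelClient.bulk does in 6.x. */
    @FunctionalInterface
    public interface IoCall<T> {
        T call() throws IOException;
    }

    public static final class Timed<T> {
        public final T value;
        public final long elapsedMillis;

        Timed(T value, long elapsedMillis) {
            this.value = value;
            this.elapsedMillis = elapsedMillis;
        }
    }

    public static <T> Timed<T> time(IoCall<T> call) throws IOException {
        long start = System.nanoTime(); // monotonic, unaffected by system clock changes
        T value = call.call();
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return new Timed<>(value, elapsed);
    }

    /** Client-observed time not covered by the server-reported "took" value. */
    public static long overheadMillis(long wallMillis, long tookMillis) {
        return Math.max(0L, wallMillis - tookMillis);
    }
}
```

Assuming a `client` and `bulkRequest` as in the thread, usage would look like `WallClock.Timed<BulkResponse> r = WallClock.time(() -> client.bulk(bulkRequest));` followed by `WallClock.overheadMillis(r.elapsedMillis, r.value.getTook().getMillis())`.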
