Java Bulk indexing API performance


(cnotin) #1

Hello there,

I'm getting into the Java API, which is pretty hard for me (I find it lacks documentation and examples).
Based on what I read, I wrote a little application which tries to index many randomly generated documents (the mapping is always the same but the values are random) as fast as possible.
You can get the interesting part of the code here: https://gist.github.com/fdd620f6026e02c081e0

The performance isn't very good (I'm using 5 shards and 1 replica with 2 computers). Am I doing it right?

Notes:

  • I tried to tweak the settings with setRefresh and setReplicationType when calling bulkRequest.add() and bulkRequest.execute(), but without any noticeable improvement (and I'm not sure what they mean for each of the two API calls).
  • I tried to use bulkRequest.execute() with and without actionGet() and bulkResponse. I'm not sure what this means either, but I think that when I use actionGet() my app waits for the indexing to finish, whereas without it the application ends sooner while the nodes are still working. Please correct me if I'm wrong.

I would appreciate some help.

Best regards,
Clément.


(cnotin) #2

I read http://elasticsearch-users.115913.n3.nabble.com/Java-Bulk-API-amp-creating-index-td2571056.html and I have one more question.

You can also do brb.add(Requests.indexRequest(...)). What's the difference between this and client.prepareIndex()?

(cnotin) #3

I can answer one of my own questions, the one about actionGet(). It seems that execute() returns an ActionFuture, which has actionGet() (cf. http://www.thelastcitadel.com/lab/elasticsearch/javadoc/org/elasticsearch/action/support/AdapterActionFuture.html#actionGet%28%29). This is related to Java's Future, whose get() (http://download.oracle.com/javase/1,5.0/docs/api/java/util/concurrent/Future.html#get%28%29) is used to wait for the asynchronous operation to finish.
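
To make the Future behaviour concrete, here is a minimal sketch that uses only java.util.concurrent and no Elasticsearch classes. The names FutureWaitDemo, submitBulk and waitFor are made up for illustration: submitBulk only plays the role of execute() (start the work, hand back a Future), and waitFor plays the role of actionGet() (block until the result is there). It is an analogy under those assumptions, not the actual client code.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a bulk request; this is NOT the Elasticsearch API.
class FutureWaitDemo {

    // Plays the role of execute(): starts the work on another thread and
    // returns immediately with a Future for the eventual result.
    static Future<Integer> submitBulk(ExecutorService pool, int docCount, long delayMillis) {
        return pool.submit(() -> {
            Thread.sleep(delayMillis); // simulated indexing time
            return docCount;           // number of documents "indexed"
        });
    }

    // Plays the role of actionGet(): blocks until the result is available
    // and rethrows failures as unchecked exceptions.
    static int waitFor(Future<Integer> future) {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> future = submitBulk(pool, 1000, 200);
            // Right after submission the work is normally still in progress.
            System.out.println("done right after submit? " + future.isDone());
            int indexed = waitFor(future); // blocks until the simulated work ends
            System.out.println("indexed " + indexed + ", done? " + future.isDone());
        } finally {
            pool.shutdown();
        }
    }
}
```

If the call to waitFor is dropped, main gets past the submission without waiting for the result, which matches the impression that the application is done sooner while the work is still running elsewhere.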

