Async replication deprecated


#1

Hello Elasticsearch experts,

It would be great if anyone could confirm whether async replication has been deprecated, as this issue suggests:

https://github.com/elastic/elasticsearch/issues/10114?

thanks in advance,
Lin


(Mark Walkom) #2

Yes, it has been.


#3

Thanks Mark,

Then what is the new suggested method?

BTW, I want to improve bulk insert performance, and I thought async replication would be a good solution?

regards,
Lin


(Jörg Prante) #4

Asynchronous replication is unrelated to performance, and you do not need it. It is good that it is deprecated, and it was never enabled by default. Many people confuse it with faster replication or with write consistency, but all it does is return the response early, without waiting for the nodes that hold replica shards, so their answers are ignored when the API call is evaluated.

It is simpler to disable replicas by setting the number of replicas to 0 at the beginning of bulk indexing and raising it again afterwards with an index settings update, and to set index.refresh_interval to -1 while bulk indexing. This gives far better performance than replicating at indexing time.
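
As a rough sketch of the settings change described above, the two request bodies could be built like this. The refresh setting is spelled `index.refresh_interval` in the settings API. The index name `my_index` and the restore values (1 replica, `1s` refresh) are assumptions for illustration only; use whatever your index had before the load.

```python
import json


def bulk_load_settings(replicas_after=1, refresh_after="1s"):
    """Return (before, after) JSON bodies for an index settings update.

    `replicas_after` and `refresh_after` are assumed restore values,
    not something prescribed in this thread.
    """
    # Applied before bulk indexing: no replicas, refresh disabled.
    before = {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}
    # Applied after bulk indexing: bring replicas and refresh back.
    after = {
        "index": {
            "number_of_replicas": replicas_after,
            "refresh_interval": refresh_after,
        }
    }
    return json.dumps(before), json.dumps(after)


before_body, after_body = bulk_load_settings()
# `my_index` is a placeholder index name.
print("PUT /my_index/_settings", before_body)
print("PUT /my_index/_settings", after_body)
```

Each body would be sent to the index settings endpoint, the first before the load starts and the second once it has finished.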

For best performance, i.e. dynamically matching the power of the servers in your cluster, use the Java API's BulkProcessor and tune the number of actions per request and the concurrency level. Do not forget to evaluate the bulk responses, and continue only if the bulk requests succeeded.
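
To make the two tuning knobs concrete, here is a minimal Python sketch of the same idea. It is not the Java BulkProcessor, only an illustration of batching by action count, sending with a fixed concurrency, and continuing only while every bulk response reports success. `send_bulk` is a hypothetical callable you would supply; it is assumed to return a dict shaped like a bulk response, with a top-level `errors` flag.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(actions, size):
    """Yield lists of at most `size` actions."""
    it = iter(actions)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def run_bulk(actions, send_bulk, actions_per_request=1000, concurrency=2):
    """Send batches `concurrency` at a time; stop on the first failed batch.

    `send_bulk(batch)` is a hypothetical sender that returns a dict
    with an "errors" boolean, like a bulk API response.
    """
    sent = 0
    batches = chunked(actions, actions_per_request)
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        while True:
            # Take up to `concurrency` batches and send them in parallel.
            window = list(islice(batches, concurrency))
            if not window:
                break
            for batch, response in zip(window, pool.map(send_bulk, window)):
                # Evaluate every response before moving on.
                if response.get("errors"):
                    raise RuntimeError(
                        f"bulk request failed after {sent} actions"
                    )
                sent += len(batch)
    return sent


def fake_send(batch):
    """Stand-in sender for the demo; always reports success."""
    return {"errors": False}


print(run_bulk(range(2500), fake_send, actions_per_request=1000, concurrency=2))  # 2500
```

In practice you would vary `actions_per_request` and `concurrency` and measure throughput against your own cluster.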


#5

Thanks Jörg,

  1. Does setting index.refresh_interval to -1 mean that new documents are never indexed? And do I need to set it back after the bulk insert?
  2. By "tune action number per request and concurrency level", do you mean how many documents I send in each bulk insert call, and how many threads run concurrently on the CLIENT machine doing the bulk insert (not on the Elasticsearch server side)?

regards,
Lin


(Mark Walkom) #6

1 - Yep!
2 - Also yes :) You need to test to see how many threads and what bulk request size work best for you.


#7

Thank you Mark, have a good weekend.

