Hi,
Is there a way to delay writing to replicas while indexing?
When indexing data continuously, can we reply to the client after writing only to the primary shard, and have the replicas written a while later?
Things to note:
Replicas cannot be reduced to zero, since data is coming in continuously (this is not an initial data population).
I've looked into setting index.translog.durability: async and increasing index.translog.sync_interval.
Another question: does increasing index.translog.sync_interval mean that data is kept in the cache of the primary shard's node and flushed to disk for both primary and replica after this interval, or is the data kept in the cache of both the primary and replica nodes?
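For reference, here is a minimal sketch of the settings body for the two translog settings mentioned above. It only builds the JSON you would send to the index settings endpoint and does not talk to a cluster. The function name and the 30s interval are illustrative assumptions, not recommendations. As far as I understand the translog documentation, each shard copy (primary and replica) keeps its own translog, so with async durability the fsync happens in the background on every copy, and writes acknowledged since the last fsync can be lost on failure.

```python
import json


def translog_settings(durability="async", sync_interval="30s"):
    """Build a settings body for the translog options discussed above.

    The function name and default values are hypothetical; only the two
    setting names come from the thread.
    """
    if durability not in ("request", "async"):
        raise ValueError("durability must be 'request' or 'async'")
    return {
        "index": {
            "translog": {
                "durability": durability,
                "sync_interval": sync_interval,
            }
        }
    }


# Body for something like PUT /<your-index>/_settings
print(json.dumps(translog_settings(), indent=2))
```

Note that this changes when the translog is fsynced to disk, not whether the write is sent to the replicas.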
That is exactly the default behavior of Elasticsearch on index operations.
See the documentation:
By default, write operations only wait for the primary shards to be active before proceeding (i.e. wait_for_active_shards=1). This default can be overridden in the index settings dynamically by setting index.write.wait_for_active_shards. To alter this behavior per operation, the wait_for_active_shards request parameter can be used.
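To make the two knobs in that quote concrete, here is a rough sketch that builds the requests without sending them. The host localhost:9200, the index name my-index, and both function names are assumptions for illustration. One caveat as I understand the docs: wait_for_active_shards is checked before the write starts, so it controls how many shard copies must be available, not when the replicas receive the data.

```python
import json
from urllib.parse import urlencode

ES_URL = "http://localhost:9200"  # assumed local node
INDEX = "my-index"                # hypothetical index name


def index_doc_request(doc, wait_for_active_shards=None):
    """Return (method, url, body) for indexing one document.

    When wait_for_active_shards is given, it is passed as a
    per-operation request parameter.
    """
    url = f"{ES_URL}/{INDEX}/_doc"
    if wait_for_active_shards is not None:
        query = urlencode({"wait_for_active_shards": wait_for_active_shards})
        url = f"{url}?{query}"
    return "POST", url, json.dumps(doc)


def default_wait_request(value):
    """Return (method, url, body) that changes the index-level default."""
    url = f"{ES_URL}/{INDEX}/_settings"
    body = {"index.write.wait_for_active_shards": str(value)}
    return "PUT", url, json.dumps(body)


print(index_doc_request({"message": "hello"}, wait_for_active_shards=1))
print(default_wait_request(2))
```

The returned tuples can be handed to whatever HTTP client you already use.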
Now, that IS different from setting replicas to 0 while indexing...
The difference is that resources will still be used to replicate the data to the replica shards.
That is why sometimes it makes sense to set replicas to 0 when indexing... perhaps that is what you are thinking of...
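If you did want to try the replicas-to-0 approach for a bulk load, a minimal sketch of the two settings bodies might look like this. The helper name and the restore value of 1 are assumptions; use whatever replica count your index normally has. Nothing here contacts a cluster.

```python
import json


def replica_settings(count):
    """Build a settings body that sets number_of_replicas (hypothetical helper)."""
    if count < 0:
        raise ValueError("replica count cannot be negative")
    return {"index": {"number_of_replicas": count}}


# Before a heavy load: drop replicas so no replication work is done.
before_load = replica_settings(0)
# After the load: restore the replicas (1 is an assumed value).
after_load = replica_settings(1)

print(json.dumps(before_load))
print(json.dumps(after_load))
```

While replicas are at 0 there is no redundant copy of the data, so this only makes sense when the load can be replayed.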
But then you say that replicas cannot be reduced to zero because data is coming in continuously...
Which I do not understand... data coming in continuously does not by itself require having replicas...