I'm doing some tests here, and before running a bulk index on my cluster
(2 nodes, replicas = 1) I set index.refresh_interval = -1.
I then connect to one of the nodes using the transport client pointed at
host:9300.
I was expecting that only that node would have the data inserted (since
I've set the interval to -1). Isn't this the case?
I'm trying to come up with a good way to index a lot of data in a live
environment without penalizing the live nodes. The idea is to sync all
the data to one node first, and then let it replicate to the others. Any
advice on this?
But explicit routing is the exception. Think of Elasticsearch as a
multi-node system that was designed to distribute the load over your nodes
automatically. No manual intervention is required. You don't need to worry
about indexing penalizing search. If your load rises, add more
nodes. It's that easy.
Only in exceptional situations might you consider shard placement control
and other techniques, such as moving parts of an index or avoiding hot spots.
For baseline ingesting, just create the index with primary shards only (no
replicas) for faster data loading. After ingesting is done, add a replica
level to the shards.
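
As a rough sketch of that advice, the two settings changes could be sent to the index settings endpoint over HTTP. The base URL (port 9200 rather than the transport port 9300), the index name, and the helper function names below are assumptions for illustration, and the "1s" refresh interval restored afterwards is simply the usual default, not something stated in this thread.

```python
import json
import urllib.request


def bulk_load_settings():
    # Primary shards only and no periodic refresh while loading.
    return {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}


def post_load_settings(replicas=1, refresh_interval="1s"):
    # Restore replicas and periodic refresh once ingesting is done.
    # The defaults here are assumptions; use whatever your index needs.
    return {
        "index": {
            "number_of_replicas": replicas,
            "refresh_interval": refresh_interval,
        }
    }


def build_settings_request(base_url, index_name, settings):
    # Builds a PUT request for <base_url>/<index_name>/_settings.
    url = f"{base_url.rstrip('/')}/{index_name}/_settings"
    body = json.dumps(settings).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def apply_settings(base_url, index_name, settings):
    # Sends the request; needs a reachable cluster, so it is not called here.
    request = build_settings_request(base_url, index_name, settings)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Usage would be along the lines of calling apply_settings("http://localhost:9200", "myindex", bulk_load_settings()) before the bulk run and apply_settings("http://localhost:9200", "myindex", post_load_settings()) after it, where "myindex" is a hypothetical index name. Note that the refresh interval only controls when new documents become searchable; it does not decide which node holds the data.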