I started testing adding and removing nodes from my cluster and measuring the
performance, but it looks like the initial data sync is single threaded for
much of the time. Most of the time I see 1 core at 100%, and every now and
then the other cores get some usage, but it's clear that 1 core is
working 3 to 4 times harder than the other cores.
There is no single-threaded data sync. What you see is index recovery,
downloading shards from the gateway. You can modify the recovery settings
in the configuration to minimize the recovery phases. By default, recovery
runs in the background so that it does not disturb ongoing search/index
operations too much.
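
To make the "modify the recovery settings" advice a bit more concrete, here is a minimal sketch in Python that only builds the JSON body one could send to the cluster update settings endpoint (`PUT /_cluster/settings`). The two setting names, `indices.recovery.max_bytes_per_sec` and `cluster.routing.allocation.node_concurrent_recoveries`, are assumptions about which settings apply: they exist in many Elasticsearch releases, but names and availability vary by version, so check the documentation for the version you run. The values and the helper name `recovery_settings_body` are purely illustrative.

```python
import json


def recovery_settings_body(max_bytes_per_sec="100mb", node_concurrent_recoveries=4):
    """Build a request body for PUT /_cluster/settings (hypothetical helper).

    The setting names are assumptions and depend on the Elasticsearch
    version; the default values here are illustrative, not recommendations.
    """
    return {
        "transient": {
            # Upper bound on recovery traffic per node.
            "indices.recovery.max_bytes_per_sec": max_bytes_per_sec,
            # How many shard recoveries a node may run at the same time.
            "cluster.routing.allocation.node_concurrent_recoveries": node_concurrent_recoveries,
        }
    }


body = recovery_settings_body()
print(json.dumps(body, indent=2))
```

Raising these limits lets recovery use more resources, which is the trade-off mentioned above: faster recovery at the cost of more interference with ongoing search and indexing.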
Recovery is not per node, but per index. When a node starts up, it receives
messages telling it which shards it should host. These shards are then either
activated if they are present locally, or received from another node.
Once all shards of an index have been recovered, the recovery of that index
is complete.
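
The per-index logic described above can be sketched as a toy model in Python. This is not Elasticsearch code: the function names (`plan_shard_recovery`, `index_recovered`) and the state labels are hypothetical and only illustrate the decision "activate locally or copy from another node" and the rule that an index is recovered once all of its shards are.

```python
def plan_shard_recovery(assigned_shards, local_shards):
    """Decide, for each shard assigned to this node, how it would be recovered.

    Toy model: a shard already on local disk is activated in place,
    anything else has to be copied from another node.
    """
    return {
        shard: "activate_local" if shard in local_shards else "copy_from_peer"
        for shard in assigned_shards
    }


def index_recovered(shard_states):
    """An index counts as recovered only when every one of its shards has started."""
    return all(state == "started" for state in shard_states.values())


# A node is told to host shards 0, 1 and 2, and already has 0 and 2 on disk.
plan = plan_shard_recovery(assigned_shards=[0, 1, 2], local_shards={0, 2})
print(plan)

# Shard 1 is still being copied, so the index is not yet recovered.
print(index_recovered({0: "started", 1: "recovering", 2: "started"}))
print(index_recovered({0: "started", 1: "started", 2: "started"}))
```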