We have an ES cluster set up with 5 nodes. There is one index divided into 5
shards, and each shard has 4 replicas. Hence all 5 nodes hold 5 shards
each, and every node has the whole index.
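
To make the shard arithmetic above explicit, here is a small sketch. The function name is made up for illustration, and it assumes the cluster balances shard copies evenly across nodes.

```python
def shard_copies_per_node(primaries: int, replicas_per_shard: int, nodes: int) -> float:
    """Average number of shard copies per node, assuming even balancing."""
    # Each shard exists once as a primary plus once per replica.
    total_copies = primaries * (1 + replicas_per_shard)
    return total_copies / nodes


# The setup described above: 5 primaries, 4 replicas each, 5 nodes.
print(shard_copies_per_node(5, 4, 5))  # 5.0
```

Since Elasticsearch does not put two copies of the same shard on one node, 5 copies of each shard on 5 nodes means every node holds one copy of every shard, i.e. the whole index.
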
From what I have seen, indexing is a CPU-intensive operation. I would like
the indexing to happen on only one machine (which I would not include behind
my production load balancer to serve read queries), and then have the index
replicated to the other machines.
Is this possible? Can I limit indexing to just one machine and specify
which machine that should be?
You need to have all the primaries on the same node for this to happen.
I'd also suggest you reduce your replica count. Replicas are handled in
parallel, but each one still adds overhead, and you have an excessive
number of them.
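
As a rough sketch of what lowering the replica count could look like, the snippet below builds (but does not send) a request to the index settings endpoint using only the Python standard library. The host, the index name "my_index", and the target of 2 replicas are placeholders, not values from this thread.

```python
import json
import urllib.request


def build_replica_update(base_url: str, index: str, replicas: int) -> urllib.request.Request:
    """Build a PUT <index>/_settings request that changes number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder host, index name, and replica count; adjust for your cluster.
req = build_replica_update("http://localhost:9200", "my_index", 2)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually apply it: urllib.request.urlopen(req)
```

number_of_replicas is a dynamic setting, so it can be changed on a live index, and raised again later if read latency suffers.
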
Is there a way to have all primaries on one machine?
As for having too many replicas, I currently have a setup where all read
queries can be served by any machine, without having to go to any other
machine to look for missing shards. My index size is 4 GB and stays in RAM.
My data size on disk is 1.1 GB. Do you still recommend reducing replicas?
Would that not add the overhead of read queries needing to wait for network
IO to fetch records from other machines from time to time?
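
On the question of where the primaries sit, it may help to look at the current layout first. The sketch below counts primaries per node from rows shaped like the output of GET _cat/shards?format=json. The "prirep" and "node" field names are the default columns as I remember them, so check them against your version. The sample rows and node names are invented.

```python
from collections import Counter


def primaries_per_node(cat_shards_rows):
    """Count primary shards per node from _cat/shards JSON rows."""
    return Counter(
        row["node"]
        for row in cat_shards_rows
        # "p" marks a primary, "r" a replica; unassigned shards have no node.
        if row.get("prirep") == "p" and row.get("node")
    )


# Invented sample data, not taken from a real cluster.
sample_rows = [
    {"index": "my_index", "shard": "0", "prirep": "p", "state": "STARTED", "node": "node-1"},
    {"index": "my_index", "shard": "0", "prirep": "r", "state": "STARTED", "node": "node-2"},
    {"index": "my_index", "shard": "1", "prirep": "p", "state": "STARTED", "node": "node-2"},
    {"index": "my_index", "shard": "2", "prirep": "p", "state": "STARTED", "node": "node-1"},
]

for node, count in sorted(primaries_per_node(sample_rows).items()):
    print(node, count)
```

If all primaries were on one machine, the result would contain a single node whose count equals the number of shards.
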