I have been trying out ES for a few weeks now and I am pretty convinced that
ES is what I want to use for my project.
As I explored further, a few questions came up; it would be great if you
could help me with them:
- I understand that the number of shards is defined at index creation time,
while replicas can be added later. So if I start a cluster with 2 nodes and
an index with 3 shards and 1 replica, and later add more nodes, the 6 shard
copies would get evenly distributed across 6 nodes (1 shard per node). So
basically a 7th node is useless unless I increase the number of replicas.
Is my understanding correct?
- If I get into a situation where I need more shards, I can handle that for
the indices I create from then on by increasing the shard count
programmatically; say, from now on all new indices will have 5 shards and
1 replica. Is it a fair assumption that, if I add nodes beyond the 6th, the
shards of the new indices will get distributed across them while the old
indices stay as they are on the initial 6 nodes?
- How can a new cluster help here? When should I use more than one cluster?
- I read that shard distribution is coordinated by the primary (master) node,
and it looks like that node gets selected based on startup order. If the
primary node fails, does one of the existing nodes take over its
responsibilities?
- Is there any benefit to running more than one node on a multi-CPU
machine? Is it worth running more than one node, with 5 shards and 1
replica, on an EC2 large instance?
- If I don't use a shared file system for persistent storage, and instead
back up the files under the /data folder on local storage, will I be able
to recover from a node failure using only the backed-up data?
- I plan to create a new index after every 1 million docs. Is there a limit
on how many indices I can create? Is there a limit on how many indices I can
search in a single request without degrading performance (the API clearly
accepts an array of indices)?
Once again, hats off to Shay for creating such an awesome product and
supporting it all alone!!
Thanks a lot