I have been trying out ES for a few weeks now and I am pretty convinced
that ES is what I want to use for my project.
As I explored further, a few questions came up; it would be great if you
could help me with them:
I understand that the number of shards is fixed at index creation time,
while replicas can be added later. So if I start a cluster with 2 nodes,
with 3 shards and 1 replica, and later add more nodes, the shards would get
evenly distributed across 6 nodes (1 shard per node). So basically a 7th
node is meaningless unless I increase the replica count. Is my
understanding correct?
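To make the arithmetic in this question concrete, here is a minimal sketch.
It assumes an idealized even spread of shard copies over nodes; the function
names are made up for illustration and are not part of any ES API, and the
real allocator applies further rules (for example, it keeps a primary and
its replica on different nodes).

```python
from typing import List


def shard_copies(primaries: int, replicas: int) -> int:
    """Total shard copies of one index: each primary plus its replicas."""
    return primaries * (1 + replicas)


def spread(primaries: int, replicas: int, nodes: int) -> List[int]:
    """Idealized even spread: number of shard copies held by each node."""
    total = shard_copies(primaries, replicas)
    base, extra = divmod(total, nodes)
    # The first `extra` nodes get one more copy than the rest.
    return [base + (1 if i < extra else 0) for i in range(nodes)]


for node_count in (2, 6, 7):
    print(node_count, "nodes:", spread(3, 1, node_count))
```

With 3 shards and 1 replica there are 6 shard copies in total, so under this
model a 7th node holds nothing for that index until the replica count goes
up.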
If I get into a situation where I need more shards, I can raise the shard
count programmatically for the indices I create from then on. Say from now
on all new indices will have 5 shards and 1 replica. Is it fair to assume
that if I add nodes beyond the 6th one, the shards of the new indices will
be distributed across them, while the old indices stay as they are on the
initial 6 nodes?
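As a rough sketch of what "programmatically" could look like, the snippet
below only builds the JSON bodies and sends nothing. It assumes the REST
API's `number_of_shards` and `number_of_replicas` settings; the helper names
are hypothetical, and the exact request shape may differ between ES
versions.

```python
import json


def create_index_body(shards: int, replicas: int) -> dict:
    """Settings body for creating a new index (sent with PUT /<index>)."""
    return {
        "settings": {
            "number_of_shards": shards,      # fixed once the index exists
            "number_of_replicas": replicas,  # can be changed later
        }
    }


def update_replicas_body(replicas: int) -> dict:
    """Body for raising replicas later (sent with PUT /<index>/_settings)."""
    return {"index": {"number_of_replicas": replicas}}


print(json.dumps(create_index_body(5, 1)))
print(json.dumps(update_replicas_body(2)))
```

Only the replica count appears in the update body, which mirrors the point
above: the shard count is chosen per index at creation time.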
How can a new cluster help here? When should I use more than one cluster?
I read that the distribution happens through a primary node, and it looks
like the primary node gets selected based on startup sequence. If the
primary node fails, does one of the existing nodes take over the
responsibilities of the primary node?
Is there any benefit to running more than one node on a multi-CPU
machine? Is it worth running more than one node with 5 shards and 1
replica on an EC2 large instance?
If I don't use a shared file system for permanent storage, but instead back
up the files under the /data folder from local storage, will I be able to
recover from a node failure just by using the backed-up data?
I plan to create a new index after every 1 million docs. Is there a limit
on how many indices I can create? Is there a limit on how many indices I
can search in one request without degrading performance (from the API it's
evident that it takes an array of indices)?
Once again, hats off to Shay for creating such an awesome product and
supporting it all alone!!
Questions 1 and 2 -> have a look at the video on the homepage (below)
How can a new cluster help here? When should I use more than one cluster?
For example, if you have a lot of servers in one network and you don't want
them to join each other, e.g. for different customers or to separate
development from production.
Does one of the existing nodes take over the responsibilities of the primary node?
Yes.
Is there any benefit to running more than one node on a multi-CPU machine?
Yes, you could run e.g. two 32-bit instances, which reduces the RAM
consumption of each instance a bit, BUT I don't think it will reduce
overall RAM usage.
Is it worth running more than one node with 5 shards and 1 replica on an EC2 large instance?
I don't think so.
Will I be able to recover just by using the backed-up data?
Have a look, and do not forget to flush.
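A minimal sketch of "flush, then copy" is below. It is not a definitive
backup procedure. Assumptions: the node listens on `http://localhost:9200`,
the index is called `myindex`, and nothing is being indexed while the files
are copied; both helper names are made up. The flush request is only built
here, not sent.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def flush_request(base_url: str, index: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the index's _flush endpoint."""
    return urllib.request.Request(
        f"{base_url}/{index}/_flush", data=b"", method="POST"
    )


def backup_data_dir(data_dir: Path, backup_dir: Path) -> Path:
    """Copy the node's data directory; backup_dir must not exist yet."""
    return Path(shutil.copytree(data_dir, backup_dir))


# Demo on a throwaway directory standing in for the real data folder.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data"
    data.mkdir()
    (data / "segment.bin").write_bytes(b"example")
    req = flush_request("http://localhost:9200", "myindex")  # hypothetical
    copied = backup_data_dir(data, Path(tmp) / "backup")
    print(req.get_method(), req.full_url)
    print(sorted(p.name for p in copied.iterdir()))
```

In a real run the request would be sent (e.g. with
`urllib.request.urlopen(req)`) before the copy starts, so that what is in
memory has been written to disk first.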
Is there a limit on how many indices I can search in one request without degrading performance?
It's used successfully by others with thousands of indices/shards, but
it is not intended for e.g. one index per user ...
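For the "new index every 1 million docs" plan, a small sketch of the naming
and of a multi-index search path follows. The `docs-` prefix and both
function names are hypothetical; the only ES behaviour assumed is that a
search URL accepts a comma-separated list of index names.

```python
from typing import Iterable


def index_name_for(doc_number: int,
                   docs_per_index: int = 1_000_000,
                   prefix: str = "docs") -> str:
    """Name of the index that the doc with this running number lands in."""
    return f"{prefix}-{doc_number // docs_per_index:06d}"


def search_path(index_names: Iterable[str]) -> str:
    """URL path for one search request across several indices."""
    return "/" + ",".join(index_names) + "/_search"


names = sorted({index_name_for(n) for n in (0, 999_999, 1_000_000, 2_500_000)})
print(names)
print(search_path(names))
```

Every index in the list adds its shards to the request, so the cost of one
search grows with the number of indices it names.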