A few questions on setting up a high-performing, scalable ES setup

Hi,

I have been trying out ES for a few weeks now and I am pretty convinced that
ES is what I want to use for my project.
As I explored further, a few questions came up; it would be great if you could
help me here:

  1. I understand that the number of shards is fixed at index creation time and
    replicas can be added later. So if I start a cluster with 2 nodes and an
    index with 3 shards and 1 replica, and later add more nodes, the shards
    would get evenly distributed over 6 nodes (1 shard per node). So basically
    a 7th node is meaningless unless I increase the number of replicas. Is my
    understanding correct?
  2. If I get into a situation where I need more shards, I can do that for
    the indices I create from then on by increasing the shard count
    programmatically; say, from now on all new indices will have 5 shards and
    1 replica. Is it a fair assumption that if I add nodes beyond the 6th one,
    the shards of the new indices will get distributed across them while the
    old indices stay as they are on the initial 6 nodes?
  3. How can a new cluster help here? When should I use more than one cluster?
  4. I read that the distribution happens through the primary node, and it
    looks like the primary node gets selected based on startup sequence. If
    the primary node fails, does one of the existing nodes take over its
    responsibilities?
  5. Is there any benefit to running more than one node on a multi-CPU
    machine? Is it worth running more than one node with 5 shards and 1
    replica on an EC2 large instance?
  6. If I don't use a shared file system for permanent storage, but instead
    back up the files under the /data folder from local storage, will I be
    able to recover from a node failure just by using the backed-up data?
  7. I plan to create a new index after every 1 million docs. Is there a
    limit on how many indices I can create? Is there a limit on how many
    indices I can search in one request without degrading performance
    (from the API it's evident that it takes an array of indices)?
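The arithmetic behind questions 1 and 2 can be sketched in a few lines. This is only a toy model, assuming one index and at most one shard copy per node; the function names are made up for illustration and are not part of any Elasticsearch API.

```python
def total_shard_copies(primaries: int, replicas: int) -> int:
    """Each primary shard has `replicas` additional copies."""
    return primaries * (1 + replicas)


def idle_nodes(nodes: int, primaries: int, replicas: int) -> int:
    """Nodes left without any shard copy of this single index."""
    return max(0, nodes - total_shard_copies(primaries, replicas))


print(total_shard_copies(3, 1))  # 6
print(idle_nodes(7, 3, 1))       # 1
print(idle_nodes(7, 3, 2))       # 0
print(total_shard_copies(5, 1))  # 10
```

With 3 shards and 1 replica there are 6 shard copies, so a 7th node holds nothing for that index until the replica count goes up; a new index with 5 shards and 1 replica has 10 copies to spread.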

Once again, hats off to Shay for creating such an awesome product and
supporting it all alone!!

Thanks a lot
Rahul

Hi Rahul,

1+2 -> have a look at the video on the homepage (below)

  3. How can a new cluster help here? When should I use more than one cluster?

For example, if you have a lot of servers in one network and you don't want
them to join each other, e.g. for different customers or to separate
development from production.
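As a rough sketch of the idea: nodes only join other nodes that share the same `cluster.name` setting, so giving each group its own name keeps them apart. The snippet below is a toy model of that grouping, not Elasticsearch code; the node and cluster names are made up.

```python
from collections import defaultdict


def group_by_cluster(nodes):
    """Group (node_name, cluster_name) pairs by cluster name."""
    clusters = defaultdict(list)
    for node_name, cluster_name in nodes:
        clusters[cluster_name].append(node_name)
    return dict(clusters)


# Hypothetical nodes on one network, each with its own cluster.name value.
nodes = [
    ("node-1", "production"),
    ("node-2", "production"),
    ("node-3", "development"),
]
print(group_by_cluster(nodes))
```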

does one of the existing nodes take up the responsibilities of a primary node?

yes

Is there any benefit of running more than one node in a multi CPU machine?

yes, you could run e.g. two 32-bit instances, which reduces the RAM
consumption of each instance a bit, BUT I don't think it will reduce
overall RAM usage.

Is it worth running more than one node with 5 shards and 1 replica in an EC2 large instance?

I don't think so

just by using the backed up data?

have a look at "Jetslide uses ElasticSearch as Database" (Karussell), and do not forget to flush
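A minimal sketch of what flushing before a file-level backup could look like, using the index `_flush` endpoint. The host, port and index name are assumptions for illustration, and the request is only built here, not sent.

```python
import urllib.request


def build_flush_request(base_url: str, index: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the index's _flush endpoint."""
    url = f"{base_url.rstrip('/')}/{index}/_flush"
    return urllib.request.Request(url, method="POST")


# Assumed local node and a hypothetical index name.
req = build_flush_request("http://localhost:9200", "docs-0001")
print(req.get_method(), req.full_url)

# To send it against a running node before copying the data folder:
# urllib.request.urlopen(req)
```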

Is there any limitation on how many indices I can do a search on without degrading performance in one request

It's successfully used by others with thousands of indices/shards, but
it is not intended for e.g. one index per user ...
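To make the "new index every 1 million docs" plan concrete, here is a small sketch that picks an index name from a running document counter and builds a search path over several indices (comma-separated index names in the URL). The `docs-` prefix, the zero-based counter and the helper names are assumptions for illustration.

```python
def index_for(doc_number: int, docs_per_index: int = 1_000_000,
              prefix: str = "docs") -> str:
    """Map a zero-based document counter to the index it belongs in."""
    return f"{prefix}-{doc_number // docs_per_index:04d}"


def search_path(indices) -> str:
    """Build a search path that targets several indices in one request."""
    return "/" + ",".join(indices) + "/_search"


print(index_for(999_999))    # docs-0000
print(index_for(1_000_000))  # docs-0001
print(search_path(["docs-0000", "docs-0001"]))  # /docs-0000,docs-0001/_search
```

Every index searched adds its shards to the request, so the fewer indices a query has to touch, the cheaper it is.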

Regards,
Peter.


Thanks Peter for the answers!!
