Few queries on setting up a high performing and scalable ES setup

Rahul_Sharma · November 17, 2011, 2:49pm

Hi,

I have been trying out ES for a few weeks now and I am pretty convinced that
ES is what I want to use for my project.
As I explored further, a few questions came up; it would be great if you
could help me here:

1. I understand that the number of shards is fixed at index creation time
   and that replicas can be added later. So if I start a cluster with 2
   nodes, 3 shards and 1 replica, and later add more nodes, the shards would
   get evenly distributed across 6 nodes (1 shard per node). So basically a
   7th node is meaningless unless I increase the number of replicas. Is my
   understanding correct?
2. If I get into a situation where I need more shards, I can do that for
   the indices created from then on by raising the shard count
   programmatically, say all new indices from now on have 5 shards and 1
   replica. Is it a fair assumption that, if I add nodes beyond the 6th one,
   the shards of the new indices will get distributed across them while the
   old indices stay as they are on the initial 6 nodes?
3. How can a new cluster help here? When should I use more than one cluster?
4. I read that the distribution happens through a primary node, and it looks
   like the primary node gets selected based on startup sequence. If the
   primary node fails, does one of the existing nodes take over the
   responsibilities of the primary node?
5. Is there any benefit to running more than one node on a multi-CPU
   machine? Is it worth running more than one node with 5 shards and 1
   replica on an EC2 large instance?
6. If I don't use a shared file system for permanent storage, but rather
   back up the files under the /data folder from local storage, will I be
   able to recover from a node failure just by using the backed-up data?
7. I plan to create a new index after every 1 million docs. Is there a limit
   on how many indices I can create? Is there a limit on how many indices I
   can search in one request without degrading performance (from the API it
   is evident that it takes an array of indices)?
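
The arithmetic behind the first question can be checked with a small sketch. It assumes a simplified model in which each shard copy (primary or replica) of an index sits on at most one node and copies are spread as widely as possible; the real allocator has more rules, and the function names here are made up for illustration.

```python
def shard_copies(primaries: int, replicas: int) -> int:
    """Total shard copies of one index: each primary plus its replicas."""
    return primaries * (1 + replicas)


def busy_nodes(primaries: int, replicas: int, nodes: int) -> int:
    """Nodes that can hold at least one copy of the index (simplified model)."""
    return min(nodes, shard_copies(primaries, replicas))


if __name__ == "__main__":
    # 3 shards with 1 replica, on clusters of 2, 6 and 7 nodes.
    for nodes in (2, 6, 7):
        used = busy_nodes(3, 1, nodes)
        print(f"{nodes} nodes: {used} hold a copy, {nodes - used} idle for this index")
```

Under this model a 7th node stays idle for a 3-shard, 1-replica index, while raising the replica count to 2 gives 9 copies and lets it take part.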

Once again, hats off to Shay for creating such an awesome product and
supporting it all alone!!

Thanks a lot
Rahul

Karussell1 · November 17, 2011, 8:44pm

Hi Rahul,

1+2 -> have a look into the video on the homepage (below)
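
For questions 1 and 2, the two settings involved can be sketched as request bodies. This is a minimal sketch, assuming the JSON shape used by the create-index call (PUT /<index>) and the update-settings call (PUT /<index>/_settings); the helper names are hypothetical and the exact shape may differ between ES versions.

```python
import json


def create_index_body(shards: int, replicas: int) -> str:
    """JSON body for creating an index; the shard count is fixed from here on."""
    return json.dumps({
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    })


def update_replicas_body(replicas: int) -> str:
    """JSON body for the update-settings call; only the replica count changes."""
    return json.dumps({"index": {"number_of_replicas": replicas}})


if __name__ == "__main__":
    print(create_index_body(5, 1))   # for indices created from now on
    print(update_replicas_body(2))   # for an index that already exists
```

The first body only applies to new indices; an existing index keeps its shard count and can only have its replica count changed.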

> How can a new cluster help here? When should I use more than one cluster?

e.g. if you have a lot of servers in one network and you don't want
them to join each other, e.g. for different customers or for
development/production separation

> does one of the existing nodes take over the responsibilities of the primary node?

yes

> Is there any benefit to running more than one node on a multi-CPU machine?

yes, you could run e.g. two 32-bit instances, which reduces the RAM
consumption of each one a bit, BUT I don't think that it will reduce
overall RAM usage.

> Is it worth running more than one node with 5 shards and 1 replica on an EC2 large instance?

I don't think so

> just by using the backed-up data?

have a look, and do not forget to flush:

Jetslide uses ElasticSearch as Database | Karussell
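
The flush step before copying the data folder could look roughly like this. It is only a sketch: it assumes a node reachable over HTTP at http://localhost:9200 and a made-up index name, and it builds the request for the index's _flush endpoint without sending it.

```python
import urllib.request


def flush_request(index: str, base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build (but do not send) a POST to the index's _flush endpoint."""
    return urllib.request.Request(f"{base_url}/{index}/_flush", data=b"", method="POST")


if __name__ == "__main__":
    req = flush_request("docs-000000")  # hypothetical index name
    print(req.get_method(), req.full_url)
    # To actually flush before a backup: urllib.request.urlopen(req)
```

Sending the request is left commented out so the sketch runs without a cluster.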

> Is there a limit on how many indices I can search in one request without degrading performance

It is successfully used by others with thousands of indices/shards, but
it is not intended for e.g. one index per user ...
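
A rough sketch of the "new index every 1 million docs" plan, for illustration only. The naming scheme (prefix plus a zero-padded counter, doc numbers starting at 0) is an assumption, as are the helper names; the search path relies on ES accepting several comma-separated index names in one search URL.

```python
from typing import Sequence


def index_for_doc(doc_number: int, docs_per_index: int = 1_000_000, prefix: str = "docs") -> str:
    """Name of the index a document lands in, rolling over every docs_per_index docs."""
    return f"{prefix}-{doc_number // docs_per_index:06d}"


def search_path(indices: Sequence[str]) -> str:
    """URL path for one search request spanning several indices."""
    return "/" + ",".join(indices) + "/_search"


if __name__ == "__main__":
    names = sorted({index_for_doc(n) for n in (0, 999_999, 1_000_000, 2_500_000)})
    print(search_path(names))
```

Every index named in the path adds its shards to the request, so the number of indices per search grows the fan-out accordingly.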

Regards,
Peter.

Rahul_Sharma · November 21, 2011, 9:42pm

Thanks Peter for the answers!!
