Scaling up


(anghelutar) #1

Hi everybody,

We have a fairly big cluster, which keeps growing every day.
My question is very simple: how do we know when to add a new node to
the cluster?

Thank you,
roxana


(anghelutar) #2

I realize the question is a bit vague. To elaborate: we have
multiple indexes that keep growing, and new indexes are also being
added to the cluster.

Does anyone have experience with this?

roxana



(Otis Gospodnetić) #3

Roxana,

The original question was how to know when to add a new node.

Here are some signs:

  • When performance (e.g. query latency or throughput) starts to suffer
  • When you don't have enough disk space
  • When disk IO on existing nodes is at 100% and is slowing down indexing or
    searching
  • When shard(s) on a given node become too big for that node and you start
    running out of memory

You can see pretty much all these signs using a tool like SPM for
ES: http://sematext.com/spm/index.html

HTH,
Otis



(anghelutar) #4

Thank you, Otis!

I was wondering actually whether it's possible to anticipate any of
these signs and to have an alerting system beforehand. The reason is
that it takes time to rebalance the cluster, during which things don't
work out smoothly.

I am specifically interested in avoiding OOM errors. Is there a
formula to estimate when we will hit them, as a function of the types
of queries, number of nodes, number of indexes, and number of shards
per index?

Finally, what would be the best architecture for a system with
relatively few, but quite complicated, searches?

roxana



(Otis Gospodnetić) #5

Hello,

On Thursday, April 26, 2012 7:17:52 AM UTC-4, anghelutar wrote:

Thank you, Otis!

I was wondering actually whether it's possible to anticipate any of
these signs and to have an alerting system beforehand. The reason is
that it takes time to rebalance the cluster, during which things don't
work out smoothly.

It is. Have a look at SPM for ES. Alerts are built in, though currently
hidden in the UI. They let you set various alert rules/thresholds and be
notified when those thresholds are reached, which in turn is the sign to
start thinking about expansion.

I am specifically interested in avoiding the OOM errors, is there a
formula to estimate when we will hit them, function of the types of
queries, number of nodes, number of indexes and number
of shards per index?

There is nothing that I know of that is actually accurate. Lots of
variables.

Finally, what would be the best architecture for a system with
relatively few searches, but quite complicated?

Uh, that's hard to answer without knowing the details. The only thing I
could say with certainty is that you wouldn't need many replicas because of
the low query load, and that, if the queries are really complex, you would
probably want small shards to maximize parallelization and minimize the
latency of individual shard queries.
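
As a rough illustration of the "few replicas, more and smaller shards" idea, here is a minimal sketch of an index-creation body. The helper name and the numbers 12 and 1 are placeholders chosen for the example, not recommendations from this thread; the right values depend on data size and hardware.

```python
import json


def index_settings(shards, replicas):
    """Build the request body for creating an index with explicit
    shard and replica counts (hypothetical helper for this example)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,      # more shards -> smaller shards
                "number_of_replicas": replicas,  # low query load -> few replicas
            }
        }
    }


# Illustrative values only.
body = json.dumps(index_settings(shards=12, replicas=1), indent=2)
print(body)
```

The resulting JSON would be sent as the body of the index-creation request. Note that the number of shards is fixed when the index is created, while the number of replicas can be changed later.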

Otis



(Shay Banon) #6

It's not easy to estimate the required size, but the first thing you want to
keep an eye on is JVM heap usage (exhausting the heap is what causes OOM).
Since you have a slowly evolving system, you can monitor it using the node
stats API, and if it reaches ~85-90% of the heap and stays there for an hour
or so, it's time to add a node. You will have enough buffer time for
relocation to happen while the system is still OK.
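
The rule of thumb above could be sketched roughly as follows. This is only an illustration under assumptions: the endpoint path `/_nodes/stats/jvm` and the field names `heap_used_in_bytes` / `heap_max_in_bytes` follow recent Elasticsearch releases and may differ in the version discussed in this thread; the class and function names are made up for the example.

```python
import json
import urllib.request

HEAP_THRESHOLD = 0.85     # lower end of the ~85-90% range mentioned above
SUSTAINED_SECONDS = 3600  # "an hour or so"


def heap_ratios(stats):
    """Map node name -> heap used / heap max from a node stats response.
    Field names are assumed; check them against your ES version."""
    ratios = {}
    for node_id, node in stats.get("nodes", {}).items():
        mem = node["jvm"]["mem"]
        name = node.get("name", node_id)
        ratios[name] = mem["heap_used_in_bytes"] / mem["heap_max_in_bytes"]
    return ratios


class SustainedHeapAlert:
    """Reports nodes whose heap ratio has stayed at or above the
    threshold for at least `window` seconds."""

    def __init__(self, threshold=HEAP_THRESHOLD, window=SUSTAINED_SECONDS):
        self.threshold = threshold
        self.window = window
        self._above_since = {}  # node name -> time of first high sample

    def update(self, ratios, now):
        """Record one sample taken at time `now` (seconds)."""
        alerts = []
        for name, ratio in ratios.items():
            if ratio < self.threshold:
                # Dropped below the threshold: restart the clock.
                self._above_since.pop(name, None)
                continue
            since = self._above_since.setdefault(name, now)
            if now - since >= self.window:
                alerts.append(name)
        return sorted(alerts)


def fetch_node_stats(base_url="http://localhost:9200"):
    """Fetch JVM stats for all nodes (assumed endpoint path)."""
    with urllib.request.urlopen(base_url + "/_nodes/stats/jvm") as resp:
        return json.load(resp)
```

A polling loop would call `fetch_node_stats()` every minute or so, pass the result through `heap_ratios`, and feed it to `SustainedHeapAlert.update` with the current time; any node name returned is a hint that it may be time to add a node.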



(anghelutar) #7

When issuing a query, is there any way to tell ES to limit searches to
only one server for each shard, so that memory is consumed only on
that server? (So basically the other server would be used only when
the first one is down.)

Thanks a lot,
roxana



(Shay Banon) #8

You could set the preference in the search request to "_primary"; see
http://www.elasticsearch.org/guide/reference/api/search/preference.html
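
For illustration, a minimal sketch of passing the preference as a query-string parameter on the search URL. The host, the index name `logs`, and the helper name are assumptions made for the example. Also note that `_primary` was available in the Elasticsearch versions of this thread's era but was removed in later major releases, so check the preference documentation for the version you run.

```python
from urllib.parse import quote, urlencode


def search_url(base_url, index, preference="_primary"):
    """Build a search URL that asks ES to run the query on the
    shard copies selected by `preference` (hypothetical helper)."""
    query = urlencode({"preference": preference})
    return f"{base_url}/{quote(index)}/_search?{query}"


url = search_url("http://localhost:9200", "logs")
print(url)
```

The search body itself is unchanged; only the `preference` parameter steers which shard copy executes the query.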


