Shard Initialization slow down

Paul_5 · May 13, 2014, 9:59am

We are seeing a slowdown in shard initialization speed as the number of
shards/indices grows in our cluster.

With anywhere from zero to a few hundred indices/shards in the cluster,
creating new indices in bulk, up to hundreds at a time, is fine: we see them
pass through the states and get a green cluster in a reasonable amount of time.

As the total cluster size grows to 1000+ indices (3000+ shards), we begin to
notice that the first rounds of initialization take longer to process. It
seems to speed up after the first few batches, but this slowdown leads to
"failed to process cluster event (create-index [index_1112], cause [auto(bulk
api)]) within 30s" type messages in the master logs. The indices are
eventually created.

Has anyone else experienced this? (did you find the cause / way to fix?)

Is this somewhat expected behaviour? - are we approaching something
incorrectly? (there are 3 data nodes involved, with 3 shards per index)

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f34157df-b34e-4d69-a8bd-d8cffb2e5667%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

warkolm · May 13, 2014, 10:02am

Sounds like the inevitable "add more nodes" situation.

How much RAM on each node, how big is your data set?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

Paul_5 · May 13, 2014, 10:16am

In testing and replicating the issue, this slow down has been seen
occurring with empty indices.

The running cluster is at present ~100 GB across 2,200 Indices with a total
of 13,500 shards and ~430,000,000 documents.

We have 7GB RAM and 5GB heap on the data nodes - haven't looked overly
carefully but don't think the heap is maxing out on any of the nodes when
this occurs.

Thanks,

Paul.

warkolm · May 13, 2014, 10:24am

Empty or not, there is still metadata that ES needs to maintain in the
cluster state. So the more indexes you have open, the bigger that state is
and the more resources are required to track it.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

Paul_5 · May 13, 2014, 10:29am

Ok, do you know if there are clear indicators when limits are being reached?

We don't see errors in the logs (apart from the 30s timeout) but if there
are system or ES provided metrics that we can track to know when we need to
scale it would be really useful.

Thanks,

Paul.

warkolm · May 13, 2014, 10:34am

You will want to obtain Marvel, then wait until you have some history
and start digging.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

Mark_Harwood_2 · May 13, 2014, 10:38am

This API should give an indication on any backlog in processing the cluster
state:
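
The link to the API did not survive in this archive. One endpoint that does report queued cluster-state updates is the pending cluster tasks API, `GET /_cluster/pending_tasks`; assuming that is the one meant here, a minimal sketch for summarizing its response might look like the following. The helper name `summarize_pending_tasks` and the sample payload are made up for illustration; the field names `tasks`, `source` and `time_in_queue_millis` are taken from that endpoint's documented response.

```python
import json


def summarize_pending_tasks(payload):
    """Summarize a parsed pending-tasks response.

    Returns (task_count, longest_wait_ms, create_index_count).
    """
    tasks = payload.get("tasks", [])
    # Longest time any task has been waiting in the master's queue.
    longest_wait_ms = max(
        (task.get("time_in_queue_millis", 0) for task in tasks), default=0
    )
    # How many of the queued tasks are index creations.
    create_index_count = sum(
        1 for task in tasks if task.get("source", "").startswith("create-index")
    )
    return len(tasks), longest_wait_ms, create_index_count


# Hypothetical sample payload, shaped like a pending-tasks response.
SAMPLE = json.loads("""
{
  "tasks": [
    {"insert_order": 101, "priority": "URGENT",
     "source": "create-index [index_1112], cause [auto(bulk api)]",
     "time_in_queue_millis": 31250},
    {"insert_order": 102, "priority": "HIGH",
     "source": "shard-started", "time_in_queue_millis": 120}
  ]
}
""")

if __name__ == "__main__":
    count, longest_ms, creates = summarize_pending_tasks(SAMPLE)
    print(f"{count} queued, longest wait {longest_ms} ms, {creates} create-index")
```

Polling this while a bulk creation is running and watching whether the longest wait creeps toward the 30s timeout would be one way to spot the backlog before it shows up in the master logs.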

Paul_5 · May 13, 2014, 10:38am

Thanks Mark, we'll have a look at the available metrics.

Paul_5 · May 13, 2014, 10:45am

This looks very interesting, thanks.

jprante · May 13, 2014, 11:13am

You should create the indices before bulk indexing. First, bulk indexing
works much better if all indices and their mappings are already present:
the operations run faster and without conflicts, and cluster state updates
are less frequent, which reduces noise and hiccups. Second, setting the
index refresh interval to -1 and the replica count to 0 while in bulk
indexing mode helps performance a lot.
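
A minimal sketch of the two suggestions above, assuming a test node at http://localhost:9200 and index names that follow the index_NNNN pattern from the log message. The helper names (create_index_request, restore_settings_request, send) are made up for illustration. The settings used are index.number_of_shards, index.number_of_replicas and index.refresh_interval; the values restored afterwards (1 replica, 1s refresh) are the Elasticsearch defaults and may not match your setup.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local test node


def create_index_request(name, shards=3):
    """Build the request that creates one index up front, tuned for bulk loading."""
    body = {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": 0,   # no replicas while loading
                "refresh_interval": "-1",  # disable periodic refresh
            }
        }
    }
    return ("PUT", "/" + name, body)


def restore_settings_request(name, replicas=1, refresh="1s"):
    """Build the request that puts the dynamic settings back after the load."""
    body = {"index": {"number_of_replicas": replicas, "refresh_interval": refresh}}
    return ("PUT", "/" + name + "/_settings", body)


def send(request, base_url=ES_URL):
    """Send one (method, path, body) tuple to the cluster and return the parsed JSON."""
    method, path, body = request
    req = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Build the plan first; nothing is sent until send() is called.
names = ["index_%d" % i for i in range(1110, 1113)]
create_plan = [create_index_request(n) for n in names]
restore_plan = [restore_settings_request(n) for n in names]
```

To use it, call send() on each entry of create_plan before the bulk run, wait for the cluster to go green, index the data, then call send() on each entry of restore_plan. Creating the indices in small groups rather than hundreds at once keeps each cluster state update short.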

If you create 1000+ shards per node, you seem to be exceeding the limit of
your system. Do not expect admin operations like index creation to work in
O(1) time; they are O(n/c), with n = the number of affected shards and c =
the threadpool size for the operation (the total node count also matters,
but I neglect it here). So yes, it is expected that index creation takes
longer as your nodes reach their limit, but there can be plenty of reasons
for that (increasing shard count is just one of them). And yes, it is
expected that you see the 30s cluster action timeout in these cases.
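
To make the O(n/c) estimate concrete, here is a toy calculation. It is only a back-of-the-envelope model, not how Elasticsearch schedules work internally: the pool size of 4 and the 0.5 s per round are invented numbers, and only the 100 indices, 3 shards per index and 30 s timeout come from this thread.

```python
import math


def creation_rounds(affected_shards, pool_size):
    """Sequential rounds needed when pool_size shards are handled at a time."""
    return math.ceil(affected_shards / pool_size)


def estimated_seconds(affected_shards, pool_size, seconds_per_round):
    """Total time under the toy model: rounds times the cost of one round."""
    return creation_rounds(affected_shards, pool_size) * seconds_per_round


# 100 new indices x 3 shards each (figures from this thread),
# with a hypothetical pool of 4 workers and 0.5 s per round.
shards = 100 * 3
rounds = creation_rounds(shards, 4)        # 75 rounds
total = estimated_seconds(shards, 4, 0.5)  # 37.5 seconds
exceeds_timeout = total > 30               # True: past the 30s event timeout
```

In this model the per-round cost is the part that would grow as the cluster state gets larger, which is one way the same batch of 100 indices could be fine on a small cluster and hit the timeout on a big one.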

There is no strictly predictable resource limit for a node. It all depends
heavily on factors outside Elasticsearch (JVM, CPU, memory, disk I/O, your
indexing/search workload), so it is up to you to calibrate your node
capacity. After adding nodes, you will observe that ES scales well and can
handle more shards.

Jörg

Paul_5 · May 13, 2014, 12:47pm

Thanks Jörg. We've heard of others pre-creating indices; we were seeing it
as a workaround rather than a regular practice, but what you say makes it
seem like something we should work with.
