Shard Initialization slow down


(Paul-5) #1

We are seeing a slowdown in shard initialization speed as the number of
shards/indices grows in our cluster.

With anywhere from zero to a few hundred indices/shards in the cluster,
bulk creation of new indices, up to hundreds at a time, is fine: we see
them pass through the states and get a green cluster in a reasonable amount
of time.

As the total cluster size grows to 1000+ indices (3000+ shards), we begin to
notice that the first rounds of initialization take longer to process. It
seems to speed up after the first few batches, but this slowdown leads to
"failed to process cluster event (create-index [index_1112], cause
[auto(bulk api)]) within 30s" type messages in the master logs. The indices
are eventually created.

Has anyone else experienced this, and if so, did you find the cause or a
way to fix it?

Is this somewhat expected behaviour, or are we approaching something
incorrectly? (There are 3 data nodes involved, with 3 shards per index.)

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f34157df-b34e-4d69-a8bd-d8cffb2e5667%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

Sounds like the inevitable "add more nodes" situation.

How much RAM is on each node, and how big is your data set?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


(Paul-5) #3

In testing and replicating the issue, we have seen this slowdown occur
even with empty indices.

The running cluster currently holds ~100 GB across 2,200 indices, with a
total of 13,500 shards and ~430,000,000 documents.

We have 7 GB RAM and a 5 GB heap on the data nodes. We haven't looked too
carefully, but we don't think the heap is maxing out on any of the nodes
when this occurs.
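
A rough way to check the heap observation, sketched under assumptions: the
node stats API (GET /_nodes/stats/jvm) reports heap_used_percent per node.
The base URL http://localhost:9200, the helper names, and the 75% threshold
below are illustrative choices, not something taken from this cluster.

```python
import json
import urllib.request


def fetch_jvm_stats(base_url="http://localhost:9200"):
    """Fetch per-node JVM stats (base_url is an assumed address)."""
    with urllib.request.urlopen(base_url + "/_nodes/stats/jvm", timeout=10) as resp:
        return json.load(resp)


def heap_used_percent_by_node(stats):
    """Map node name -> heap_used_percent from a node stats response."""
    return {
        node.get("name", node_id): node["jvm"]["mem"]["heap_used_percent"]
        for node_id, node in stats.get("nodes", {}).items()
    }


def nodes_over(stats, threshold=75):
    """Names of nodes at or above the (assumed) heap usage threshold."""
    usage = heap_used_percent_by_node(stats)
    return sorted(name for name, pct in usage.items() if pct >= threshold)
```

Calling nodes_over(fetch_jvm_stats()) while a bulk creation is running would
list any data nodes that are close to their heap limit at that moment.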

Thanks,

Paul.


(Mark Walkom) #4

Empty or not, there is still metadata that ES needs to maintain in the
cluster state. So the more indices you have open, the bigger the cluster
state becomes and the more resources are required to track it.
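
To get a feel for how large the cluster state is, one option is to measure
the response of the cluster state API (GET /_cluster/state). This is only a
sketch: the base URL and function names are assumptions, and it relies on
the index metadata being listed under "metadata" -> "indices" in the
response.

```python
import json
import urllib.request


def fetch_cluster_state_raw(base_url="http://localhost:9200"):
    """Return the raw bytes of the cluster state (base_url is assumed)."""
    with urllib.request.urlopen(base_url + "/_cluster/state", timeout=60) as resp:
        return resp.read()


def cluster_state_footprint(raw):
    """Report the serialized size and the number of indices it describes."""
    state = json.loads(raw)
    indices = state.get("metadata", {}).get("indices", {})
    return {"bytes": len(raw), "indices": len(indices)}
```

Recording cluster_state_footprint(fetch_cluster_state_raw()) before and
after a batch of index creations shows how the state grows with each index,
empty or not.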

Regards,
Mark Walkom


(Paul-5) #5

OK, do you know if there are clear indicators when limits are being reached?

We don't see errors in the logs (apart from the 30s timeout), but if there
are system or ES-provided metrics that we can track to know when we need to
scale, that would be really useful.

Thanks,

Paul.


(Mark Walkom) #6

You will want to obtain Marvel
(http://www.elasticsearch.org/guide/en/marvel/current/), then wait until
you have some history and start digging.

Regards,
Mark Walkom


(Mark Harwood-2) #7

This API should give an indication of any backlog in processing the cluster
state:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-pending.html
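
A minimal sketch of polling that API, assuming a node reachable at
http://localhost:9200 (an assumed address) and the documented response
shape: a "tasks" list whose entries carry "source" and
"time_in_queue_millis". The function names are hypothetical.

```python
import json
import urllib.request


def fetch_pending_tasks(base_url="http://localhost:9200"):
    """Fetch the queue of pending cluster-state tasks (base_url is assumed)."""
    url = base_url + "/_cluster/pending_tasks"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp).get("tasks", [])


def summarize_backlog(tasks):
    """Return (queue length, longest wait in milliseconds)."""
    waits = [task.get("time_in_queue_millis", 0) for task in tasks]
    return len(tasks), max(waits, default=0)


def slow_tasks(tasks, limit_ms=30000):
    """Sources of tasks that have waited at least limit_ms in the queue."""
    return [t.get("source", "") for t in tasks
            if t.get("time_in_queue_millis", 0) >= limit_ms]
```

Logging summarize_backlog(fetch_pending_tasks()) every few seconds during a
bulk creation gives a simple history of queue depth; the 30000 ms default
in slow_tasks mirrors the 30s timeout mentioned in this thread.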


(Paul-5) #8

Thanks Mark, we'll have a look at the available metrics.


(Paul-5) #9

This looks very interesting, thanks.


(Jörg Prante) #10

You should create indices before bulk indexing. First, bulk indexing works
much better if all indices and their mappings are already present: the
operations run faster and without conflicts, and cluster state updates
are less frequent, which reduces noise and hiccups. Second, setting the
index refresh interval to -1 and the replica count to 0 while in bulk
indexing mode helps performance a lot.
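
As a rough sketch of the two points above (not an official client), the following Python builds the HTTP requests for pre-creating an index with replicas set to 0 and refresh disabled, and for restoring normal settings afterwards. The node address, the helper names, and the restored values (1 replica, 1s refresh) are assumptions for illustration; the shard count of 3 comes from this thread.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local node on the default HTTP port


def create_index_request(index, shards=3):
    """Build a PUT request that creates `index` with bulk-friendly settings."""
    body = {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": 0,   # no replicas while bulk loading
                "refresh_interval": "-1",  # disable periodic refresh
            }
        }
    }
    return urllib.request.Request(
        f"{ES_URL}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def restore_settings_request(index, replicas=1, refresh="1s"):
    """Build a PUT request that restores normal settings after the bulk load."""
    body = {"index": {"number_of_replicas": replicas, "refresh_interval": refresh}}
    return urllib.request.Request(
        f"{ES_URL}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = create_index_request("index_1112")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

The requests are only built here, not sent, so the sketch can be inspected without a running cluster.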

If you create 1000+ shards per node, you seem to be exceeding the limit of
your system. Do not expect admin operations like index creation to work in
O(1) time; they are O(n/c), with n = number of affected shards and c = the
threadpool size for the operation (the total node count also matters, but I
neglect it here). So yes, it is expected that index creation operations
take longer as they approach the limit of your nodes, but there can be plenty
of reasons for that (increasing shard count is just one of them). And yes, it
is expected that you see the 30s cluster action timeout in these cases.
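
To make the O(n/c) estimate concrete, here is a toy model in Python. It only illustrates the scaling argument, it does not describe how Elasticsearch schedules work internally, and the pool size of 4 is a made-up value, not an Elasticsearch default.

```python
import math


def estimated_rounds(affected_shards, pool_size):
    """Rounds of work if `pool_size` shards are handled at a time (O(n/c))."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    return math.ceil(affected_shards / pool_size)


if __name__ == "__main__":
    pool_size = 4  # hypothetical value, not an Elasticsearch default
    for shards in (300, 3000):
        print(shards, "shards ->", estimated_rounds(shards, pool_size), "rounds")
```

In this model ten times as many affected shards means ten times as many rounds for the same pool size, which fits with operations that were fine at a few hundred shards starting to hit a fixed 30s timeout at a few thousand.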

There is no strictly predictable resource limit for a node. It all
depends heavily on factors outside of Elasticsearch (JVM, CPU, memory,
disk I/O, your indexing/search workload), so it is up to you to
calibrate your node capacity. After adding nodes, you will observe that ES
scales well and can handle more shards.
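
As a back-of-the-envelope aid for that calibration, this small Python sketch computes the average number of shards per data node using the figures from this thread (1000 indices, 3 shards per index, 3 data nodes). The replica count is not stated in the thread, so 0 is assumed, and the larger node counts are hypothetical.

```python
import math


def shards_per_node(indices, shards_per_index, data_nodes, replicas=0):
    """Average number of shard copies each data node holds, rounded up."""
    total = indices * shards_per_index * (1 + replicas)
    return math.ceil(total / data_nodes)


if __name__ == "__main__":
    # 3 nodes is the setup from this thread; 6 and 12 are hypothetical.
    for nodes in (3, 6, 12):
        print(nodes, "nodes ->", shards_per_node(1000, 3, nodes), "shards per node")
```

With 3 nodes this gives the 1000 shards per node mentioned above; doubling the node count halves it.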

Jörg



(Paul-5) #11

Thanks Jörg. We've heard of others pre-creating indices; we had seen it
as a workaround rather than a regular practice, but what you say makes it
seem like something we should adopt.



(system) #12