Index Creation Failure


(nicktgr15) #1

Hello,

We are having an issue with Elasticsearch index creation. In our cluster we
create new indices every day at midnight, and at the moment we create about
150 new indices each time.
Lately we have started seeing log lines like the following during index
creation:

[2014-05-05 00:00:35,596][DEBUG][action.admin.indices.create] [Amber Hunt] [indexname-2014-05-05] failed to create
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (create-index [indexname-2014-05-05], cause [auto(bulk api)]) within 30s
    at org.elasticsearch.cluster.service.InternalClusterService$2$1.run(InternalClusterService.java:248)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

It's not clear what this error means or how we can stop it from appearing in
our logs. Is there a config parameter for the timeout applied to these
cluster events / bulk-triggered index creation (if that's the problem)?
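One workaround we're considering, in case it helps others: pre-create the next day's indices ahead of midnight and spaced out, instead of relying on auto-create from the first bulk request, so the create-index cluster events don't all queue up at 00:00. A rough sketch (host, prefixes, and pacing are placeholders, not our actual setup):

```python
# Sketch: pre-create tomorrow's daily indices one at a time, ahead of
# midnight, so create-index cluster events arrive gradually.
# Host and index prefixes below are placeholders for illustration.
import datetime
import json
import time
import urllib.request

def index_names_for(date, prefixes):
    """Build the daily index names, e.g. 'indexname-2014-05-05'."""
    return ["%s-%s" % (p, date.strftime("%Y-%m-%d")) for p in prefixes]

def precreate(host, names, pause_secs=2.0):
    """PUT each index individually, pausing between requests so the
    master's cluster-state queue is never flooded all at once."""
    for name in names:
        req = urllib.request.Request("http://%s/%s" % (host, name),
                                     data=json.dumps({}).encode(),
                                     method="PUT")
        urllib.request.urlopen(req)  # raises on HTTP errors
        time.sleep(pause_secs)

if __name__ == "__main__":
    tomorrow = datetime.date.today() + datetime.timedelta(days=1)
    names = index_names_for(tomorrow, ["indexname", "otherindex"])
    # precreate("localhost:9200", names)  # uncomment against a real cluster
    print(names)
```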

Our cluster at the moment has the following stats and we are using
elasticsearch 1.1.1:

12 Nodes (3 master, 6 data, 3 search)
13,140 Total Shards
13,140 Successful Shards
2,196 Indices
434,248,844 Documents
194.8GB Size
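For context, a quick back-of-the-envelope on those numbers (assuming shards live only on the 6 data nodes):

```python
# Rough arithmetic on the cluster stats quoted above.
total_shards = 13140
data_nodes = 6
indices = 2196

shards_per_node = total_shards / data_nodes   # ~2190 shards per data node
shards_per_index = total_shards / indices     # ~6 shards per index on average

print(shards_per_node, round(shards_per_index, 2))
```

So every create-index event is a cluster-state update the master has to publish on top of an already very large state, and ~150 of them land in the queue at the same moment.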

We have noticed that around the same time we see the above "failed to
create" message, the flume elasticsearch-sink (used on the client side) stops
working, so we are trying to understand whether the two events (index
creation failure, flume elasticsearch-sink failure) are correlated.

Any help/suggestions would be appreciated!

Regards,
Nick

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/79eaa859-5583-40ea-8af7-9f5e5878d298%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

That's a massive number of indices to create at one time, and to have
on your cluster, so it's no surprise it's timing out.

How big are your nodes? Can you add a few more or collapse your index count?
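If you can't drop whole indices, cutting the shards per new daily index also collapses the total shard count. One way is an index template matched by your daily names; a sketch (template name, pattern, and shard counts are just an example, and the 1.x default is 5 primaries + 1 replica):

```python
# Sketch: register a 1.x-style index template so new daily indices get
# 1 primary + 1 replica instead of the default 5 + 1.
# Host, template name, and pattern are placeholders for illustration.
import json
import urllib.request

def template_body(pattern, primaries=1, replicas=1):
    """Build the template document for indices matching `pattern`."""
    return {
        "template": pattern,  # 1.x name pattern, e.g. "indexname-*"
        "settings": {
            "number_of_shards": primaries,
            "number_of_replicas": replicas,
        },
    }

def put_template(host, name, body):
    """PUT the template under /_template/<name>."""
    req = urllib.request.Request("http://%s/_template/%s" % (host, name),
                                 data=json.dumps(body).encode(),
                                 method="PUT")
    return urllib.request.urlopen(req)

if __name__ == "__main__":
    body = template_body("indexname-*")
    # put_template("localhost:9200", "daily-small", body)  # against a real cluster
    print(json.dumps(body))
```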

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(nicktgr15) #3

Thanks for the response, Mark. Our 6 data nodes are m1.large EC2 instances (https://aws.amazon.com/ec2/previous-generation/).

We could add more data nodes and see if there is any improvement, but the
interesting part is that we only create this high number of new indices at
midnight. From the performance/load stats we've seen so far, I don't think
we need more data nodes during the day, so scaling out just to absorb the
midnight spike feels like a waste of resources.

I'm also adding some log lines related to the error messages we see on the
flume elasticsearch-sink side.

05 May 2014 00:00:12,850 INFO [elasticsearch[Aldebron][generic][T#1]] (org.elasticsearch.common.logging.log4j.Log4jESLogger.internalInfo:119) - [Aldebron] failed to get node info for [#transport#-1][ip-10-0-235-53.eu-west-1.compute.internal][inet[/10.0.238.47:9300]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[/10.0.238.47:9300]][cluster/nodes/info] request_id [107512] timed out after [5000ms]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
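If it's useful for diagnosis: this flume-side timeout looks consistent with the cluster being saturated at midnight, since it's the transport client's periodic nodes-info request timing out after 5000ms. We're planning to watch the master's pending cluster-state task queue (the /_cluster/pending_tasks API) at that time to see whether create-index events are piling up. A rough sketch (host is a placeholder, and the sample response at the bottom is made up for illustration):

```python
# Sketch: summarize /_cluster/pending_tasks output. Long time_in_queue
# values for create-index tasks at midnight would suggest the master's
# cluster-state queue is the bottleneck. Host below is a placeholder.
import json
import urllib.request

def summarize_pending(payload):
    """Return the pending-task count plus the source and queue time (ms)
    of the longest-waiting task."""
    tasks = payload.get("tasks", [])
    if not tasks:
        return {"count": 0}
    worst = max(tasks, key=lambda t: t.get("time_in_queue_millis", 0))
    return {"count": len(tasks),
            "worst_source": worst.get("source"),
            "worst_wait_ms": worst.get("time_in_queue_millis")}

if __name__ == "__main__":
    # Against a real cluster:
    # raw = urllib.request.urlopen("http://localhost:9200/_cluster/pending_tasks").read()
    # print(summarize_pending(json.loads(raw)))
    sample = {"tasks": [{"source": "create-index [indexname-2014-05-05]",
                         "time_in_queue_millis": 28000}]}
    print(summarize_pending(sample))
```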



(system) #4