We are having an issue with Elasticsearch index creation. In our cluster we create new indices every day at midnight, and at the moment we create about 150 new indices each time.
Lately we have started getting log lines like the following during index
creation:
[2014-05-05 00:00:35,596][DEBUG][action.admin.indices.create] [Amber Hunt] [indexname-2014-05-05] failed to create
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (create-index [indexname-2014-05-05], cause [auto(bulk api)]) within 30s
    at org.elasticsearch.cluster.service.InternalClusterService$2$1.run(InternalClusterService.java:248)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
It's not clear what this error means or how we can stop it from appearing in our logs. Is there a config parameter that controls this timeout for bulk requests (if that's what is causing the problem)?
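In case it helps make the question concrete, the sketch below is roughly what we would hope to be able to do if the 30s is just a per-request timeout: create the day's index explicitly, with a longer timeout, before the bulk traffic arrives. The timeout/master_timeout query parameters are our assumption; we haven't confirmed that they control this 30s limit in 1.1.1.

# Hypothetical sketch: create tomorrow's index explicitly with a longer timeout,
# instead of relying on auto-create from the bulk API at midnight.
# The timeout/master_timeout parameter names are an assumption, not verified on 1.1.1.
import datetime
import requests

ES_URL = "http://localhost:9200"  # any client-facing node
index_name = "indexname-%s" % (datetime.date.today() + datetime.timedelta(days=1))

resp = requests.put(
    "%s/%s" % (ES_URL, index_name),
    params={"timeout": "5m", "master_timeout": "5m"},  # assumed parameter names
)
print(resp.status_code, resp.text)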
Our cluster currently has the following stats, and we are running Elasticsearch 1.1.1:
12 Nodes (3 master, 6 data, 3 search)
13,140 Total Shards
13,140 Successful Shards
2,196 Indices
434,248,844 Documents
194.8GB Size
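As a rough back-of-envelope from the numbers above (our own arithmetic, nothing measured): 13,140 shards across 6 data nodes is about 2,190 shards per data node, and 13,140 shards over 2,196 indices works out to roughly 6 shards (primaries plus replicas) per index.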
We have noticed that around the same time we see the above "failed to create" message, the Flume elasticsearch-sink (used on the client side) stops working, so we are trying to understand if there is any correlation between the two events (index creation failure, Flume elasticsearch-sink failure).
We could probably add more data nodes and see if there is any improvement; however, it's worth noting that we create this high number of new indices only at midnight.
Generally, based on the performance/load stats we've seen so far, I don't think we need more data nodes during the day. Increasing the number of data nodes just to address this issue feels like a waste of resources.
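To make the midnight spike point concrete, here is a rough, untested sketch of pre-creating the next day's indices ahead of midnight and spacing out the create requests; the prefix list and the pause between requests are placeholders, not our real configuration.

# Rough, untested sketch: pre-create the next day's indices before the midnight
# bulk traffic and space the create-index requests out, instead of letting ~150
# indices be auto-created at the same moment. Prefixes and pacing are placeholders.
import datetime
import time
import requests

ES_URL = "http://localhost:9200"
PREFIXES = ["indexname"]  # placeholder: in reality ~150 different prefixes
tomorrow = datetime.date.today() + datetime.timedelta(days=1)

for prefix in PREFIXES:
    index_name = "%s-%s" % (prefix, tomorrow)
    resp = requests.put("%s/%s" % (ES_URL, index_name))
    if resp.status_code != 200:
        print("failed to create %s: %s" % (index_name, resp.text))
    time.sleep(5)  # pace the cluster-state updates on the master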
I'm also adding some log lines related to the error messages we see on the
flume elasticsearch-sink side.
05 May 2014 00:00:12,850 INFO [elasticsearch[Aldebron][generic][T#1]] (org.elasticsearch.common.logging.log4j.Log4jESLogger.internalInfo:119) - [Aldebron] failed to get node info for [#transport#-1][ip-10-0-235-53.eu-west-1.compute.internal][inet[/10.0.238.47:9300]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [inet[/10.0.238.47:9300]][cluster/nodes/info] request_id [107512] timed out after [5000ms]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
On Tuesday, May 6, 2014 11:07:59 AM UTC+1, Mark Walkom wrote:
That's a massive number of indices to create at one time, and to have on your cluster, so it's no surprise it's timing out.
How big are your nodes? Can you add a few more, or collapse your index count?
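Either way, the goal is fewer shards added per midnight batch; one concrete lever, sketched below, is an index template that caps shards and replicas for the daily indices. The template name, pattern, and numbers are purely illustrative, not a recommendation for your specific cluster.

# Illustrative only: an index template capping shards/replicas for the daily
# indices, so each midnight batch adds fewer shards to the cluster.
# Template name, pattern and numbers are examples.
import json
import requests

template = {
    "template": "indexname-*",      # example pattern matching the daily indices
    "settings": {
        "number_of_shards": 1,      # example values only
        "number_of_replicas": 1,
    },
}
resp = requests.put(
    "http://localhost:9200/_template/daily_indices",
    data=json.dumps(template),
)
print(resp.status_code, resp.text)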