Hi everybody,
I have an Elasticsearch cluster with 3 data nodes, holding around
2 million documents (1.5 GB).
The cluster runs on EC2 instances, and each node has 6 GB of RAM committed
to Elasticsearch.
I am using S3 as the index gateway.
It had been working fine for the last 28 days, and now I am suddenly getting
exceptions: the log files on all the data nodes are flooded with the
exception message shown at the end of this mail.
What I have understood so far, and my questions:
- The indices/shards in the S3 bucket are corrupted (if I create a new
  Elasticsearch data node, it is not able to recover from S3 and logs the
  same error message).
- Is there any way I could recover the indices in S3?
- I still have the indices on my hard drive. How could I push them to S3,
  so that a new Elasticsearch data node can recover the indices from S3?
- What could have caused the indices in S3 to get corrupted, so that I can
  prevent it in future? (My assumption was that, although a remote gateway
  like S3 carries a performance hit compared to a local one, choosing S3 as
  the gateway meant it would always hold a good state of the indices and a
  new Elasticsearch data node could recover from it.)
[2012-09-26 06:48:42,678][WARN ][cluster.action.shard ]
[pgossamerv01_slave3] sending failed shard for
[pblueprint3221423402385730][4], node[rlqethc0Rr6NRVW-6Mj1gw], [P],
s[INITIALIZING], reason [Failed to start shard, message
[IndexShardGatewayRecoveryException[[pblueprint3221423402385730][4] No
commit point data is available in gateway]]]
[2012-09-26 06:48:42,693][WARN ][index.gateway.s3 ]
[pgossamerv01_slave3] [pblueprint3693375325864359][3] listed commit_point
[commit-f]/[15], but not all files exists, ignoring
[2012-09-26 06:48:42,693][WARN ][indices.cluster ]
[pgossamerv01_slave3] [pblueprint3693375325864359][3] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException:
[pblueprint3693375325864359][3] No commit point data is available in gateway
    at org.elasticsearch.index.gateway.blobstore.BlobStoreIndexShardGateway.recover(BlobStoreIndexShardGateway.java:427)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
[2012-09-26 06:48:42,694][WARN ][cluster.action.shard ]
[pgossamerv01_slave3] sending failed shard for
[pblueprint3693375325864359][3], node[rlqethc0Rr6NRVW-6Mj1gw], [P],
s[INITIALIZING], reason [Failed to start shard, message
[IndexShardGatewayRecoveryException[[pblueprint3693375325864359][3] No
commit point data is available in gateway]]]
ubuntu@ip-10-68-70-193:/var/log/elasticsearch$ ls
pgossamerv01_index_search_slowlog.log pgossamerv01.log
ubuntu@ip-10-68-70-193:/var/log/elasticsearch$ tail pgossamerv01.log
[2012-09-26 06:56:29,796][WARN ][cluster.action.shard ]
[pgossamerv01_slave3] sending failed shard for
[pblueprint3221423402385730][4], node[rlqethc0Rr6NRVW-6Mj1gw], [P],
s[INITIALIZING], reason [Failed to start shard, message
[IndexShardGatewayRecoveryException[[pblueprint3221423402385730][4] No
commit point data is available in gateway]]]
[2012-09-26 06:56:30,412][WARN ][index.gateway.s3 ]
[pgossamerv01_slave3] [pblueprint3221423402385730][2] listed commit_point
[commit-4]/[4], but not all files exists, ignoring
[2012-09-26 06:56:30,413][WARN ][indices.cluster ]
[pgossamerv01_slave3] [pblueprint3221423402385730][2] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException:
[pblueprint3221423402385730][2] No commit point data is available in gateway
    at org.elasticsearch.index.gateway.blobstore.BlobStoreIndexShardGateway.recover(BlobStoreIndexShardGateway.java:427)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
[2012-09-26 06:56:30,414][WARN ][cluster.action.shard ]
[pgossamerv01_slave3] sending failed shard for
[pblueprint3221423402385730][2], node[rlqethc0Rr6NRVW-6Mj1gw], [P],
s[INITIALIZING], reason [Failed to start shard, message
[IndexShardGatewayRecoveryException[[pblueprint3221423402385730][2] No
commit point data is available in gateway]]]
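
Since the same warnings repeat endlessly, it may help to reduce the log to
the set of affected shards. The following is only a minimal sketch in Python,
not anything provided by Elasticsearch: the function name summarize and both
regular expressions are my own assumptions, written against the two message
formats visible in the log above ("No commit point data is available in
gateway" and "listed commit_point [...]/[...], but not all files exists").

```python
import re
from collections import Counter

# Assumed pattern: "[index][shard] No commit point data ..." as seen in the log.
FAILED = re.compile(r"\[(\w+)\]\[(\d+)\]\s+No\s+commit\s+point\s+data")

# Assumed pattern: "[index][shard] listed commit_point [name]/[version]".
INCOMPLETE = re.compile(
    r"\[(\w+)\]\[(\d+)\]\s+listed\s+commit_point\s+\[([^\]]+)\]/\[(\d+)\]"
)


def summarize(log_text):
    """Return (failed, incomplete) for a chunk of Elasticsearch log text.

    failed:     Counter mapping (index, shard) to the number of times the
                "No commit point data" message was seen for that shard.
    incomplete: dict mapping (index, shard) to the last (commit name, version)
                reported as listed but missing files.
    """
    # Collapse all whitespace so that messages wrapped over several lines
    # (as in a pasted mail) still match.
    text = " ".join(log_text.split())
    failed = Counter((idx, int(shard)) for idx, shard in FAILED.findall(text))
    incomplete = {
        (idx, int(shard)): (name, int(version))
        for idx, shard, name, version in INCOMPLETE.findall(text)
    }
    return failed, incomplete
```

Calling summarize on the contents of pgossamerv01.log should list each
index/shard pair that fails to recover, together with the commit point name
and version that the gateway reported as incomplete for it.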
--
Sujan