Indexing/shard failure


(Adam Estrada) #1

All,

There are a few other posts related to this same thing, but I still can't
get my index to recover correctly. I flushed and refreshed the index and
deleted the translog from shard #3. Still no dice. I am running ES 0.18.7
on Windows Server 2008. Any thoughts on how to recover the data I've lost?

[2012-03-30 08:41:21,334][WARN ][indices.memory ] [Geoglobaldomination] failed to set shard [tweets][3] index buffer to [10.1mb]
[2012-03-30 08:41:21,741][WARN ][cluster.action.shard ] [Geoglobaldomination] sending failed shard for [tweets][3], node[YsQ2QjHdTcG184MVUcXUVg], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[tweets][3] failed recovery]; nested: EngineCreationFailureException[[tweets][3] Failed to open reader on writer]; nested: FileNotFoundException[_1orh.prx]; ]]
[2012-03-30 08:41:21,741][WARN ][cluster.action.shard ] [Geoglobaldomination] received shard failed for [tweets][3], node[YsQ2QjHdTcG184MVUcXUVg], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[tweets][3] failed recovery]; nested: EngineCreationFailureException[[tweets][3] Failed to open reader on writer]; nested: FileNotFoundException[_1orh.prx]; ]]
[2012-03-30 08:41:24,881][WARN ][indices.cluster ] [Geoglobaldomination] [tweets][3] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [tweets][3] failed recovery
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:229)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: [tweets][3] Failed to open reader on writer
    at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:284)
    at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryPrepareForTranslog(InternalIndexShard.java:535)
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:164)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:179)
    ... 3 more
Caused by: java.io.FileNotFoundException: _1orh.prx
    at org.elasticsearch.index.store.Store$StoreDirectory.openInput(Store.java:450)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:89)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:115)
    at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:705)
    at org.apache.lucene.index.IndexWriter$ReaderPool.getReadOnlyClone(IndexWriter.java:663)
    at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:157)
    at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:38)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:453)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:401)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:345)
    at org.elasticsearch.index.engine.robin.RobinEngine.buildNrtResource(RobinEngine.java:1365)
    at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:268)
    ... 6 more
[2012-03-30 08:41:24,881][WARN ][cluster.action.shard ] [Geoglobaldomination] sending failed shard for [tweets][3], node[YsQ2QjHdTcG184MVUcXUVg], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[tweets][3] failed recovery]; nested: EngineCreationFailureException[[tweets][3] Failed to open reader on writer]; nested: FileNotFoundException[_1orh.prx]; ]]
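The trace bottoms out in Lucene failing to open `_1orh.prx`, the positions file of segment `_1orh`. One quick way to see whether the rest of that segment is still on disk is to group the shard's index files by segment name. The Java sketch below does that using only the standard library. The set of extensions it treats as "expected" is an illustrative assumption for a non-compound Lucene 3.x segment (the authoritative file list for each commit is recorded in the `segments_N` file), the class name is hypothetical, and the directory argument is whatever the shard's Lucene `index` directory is on your own node.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Groups Lucene index files by segment name ("_1orh" in "_1orh.prx") and reports
// segments lacking an "expected" file. EXPECTED is an illustrative assumption for a
// non-compound Lucene 3.x segment, not Lucene's authoritative rule.
class SegmentFileReport {

    static final Set<String> EXPECTED =
            new TreeSet<>(Arrays.asList("fnm", "frq", "prx", "tii", "tis"));

    // Maps segment name -> extensions present, e.g. "_1orh" -> [del, fnm, frq, ...].
    static Map<String, Set<String>> groupBySegment(Collection<String> fileNames) {
        Map<String, Set<String>> bySegment = new TreeMap<>();
        for (String name : fileNames) {
            int dot = name.indexOf('.');
            // Segment files look like "_<seg>.<ext>"; skip segments_N, segments.gen, write.lock.
            if (!name.startsWith("_") || dot < 0) continue;
            String segment = name.substring(0, dot);
            // Generation-suffixed files such as "_1orh_1.del" belong to segment "_1orh".
            int gen = segment.indexOf('_', 1);
            if (gen > 0) segment = segment.substring(0, gen);
            bySegment.computeIfAbsent(segment, k -> new TreeSet<>()).add(name.substring(dot + 1));
        }
        return bySegment;
    }

    // Expected extensions a segment lacks; compound (.cfs) segments are not checked.
    static Set<String> missing(Set<String> present) {
        Set<String> gaps = new TreeSet<>();
        if (present.contains("cfs")) return gaps;
        for (String ext : EXPECTED) {
            if (!present.contains(ext)) gaps.add(ext);
        }
        return gaps;
    }

    public static void main(String[] args) throws IOException {
        // Pass the shard's Lucene "index" directory as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) names.add(p.getFileName().toString());
        }
        groupBySegment(names).forEach((segment, exts) -> {
            Set<String> gaps = missing(exts);
            if (!gaps.isEmpty()) System.out.println(segment + " is missing: " + gaps);
        });
    }
}
```

If `_1orh` shows up with its other files but no `prx`, only that one file is gone, which is the situation discussed further down the thread.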


(Shay Banon) #2

It seems like it's missing a specific file for the shard index. How many
nodes do you have in the cluster?
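The node count matters because elasticsearch never assigns a replica to the same node as its primary, so on a single node each shard has exactly one assigned copy no matter how many replicas are configured. A minimal Java model of that rule follows; it ignores allocation filtering, awareness and disk limits, and the class and method names are hypothetical.

```java
// Simplified model: a replica is never assigned to the same node as its primary,
// so at most one copy of a shard can be assigned per node. Allocation filtering,
// awareness and disk limits are ignored. Names are hypothetical.
final class ShardCopies {

    private ShardCopies() {}

    // Copies (primary + replicas) of one shard that can actually be assigned.
    static int assignedCopies(int nodes, int configuredReplicas) {
        if (nodes < 1) return 0;
        return Math.min(1 + configuredReplicas, nodes);
    }

    // True when another assigned copy exists that a failed primary could recover from.
    static boolean hasReplicaToRecoverFrom(int nodes, int configuredReplicas) {
        return assignedCopies(nodes, configuredReplicas) > 1;
    }

    public static void main(String[] args) {
        int replicas = 1; // the default number_of_replicas
        for (int nodes = 1; nodes <= 3; nodes++) {
            System.out.printf("nodes=%d replicas=%d -> assigned copies=%d, replica available=%b%n",
                    nodes, replicas, assignedCopies(nodes, replicas),
                    hasReplicaToRecoverFrom(nodes, replicas));
        }
    }
}
```

On one node the configured replica simply stays unassigned (the cluster reports yellow health), so a damaged primary such as `[tweets][3]` has no second copy to be rebuilt from.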


(Adam Estrada) #3

Shay,

This was running on a single node.

Adam


(Shay Banon) #4

I see, so it has no replica to recover from. The strange thing is that
nothing in elasticsearch or Lucene would delete this single missing file on
its own: either all of the files get deleted (and then you get a different
message) or none of them are. Maybe it somehow got deleted by mistake?
Which version are you using?
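Shay's point can be illustrated with a toy model of commit-point reference counting: a commit references whole segments, and a file is removed only when no remaining commit references it. Because every file of a segment is referenced by the same commits, those files reach zero references together, so a segment losing just one file is not an outcome of this bookkeeping. This is a simplified sketch, not Lucene's actual `IndexFileDeleter`; `CommitRefCounter` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of commit-point reference counting (not Lucene's IndexFileDeleter):
// each commit references whole segments, and a file is deleted only when no
// remaining commit references it. Class and method names are hypothetical.
final class CommitRefCounter {
    private final Map<String, Integer> refCounts = new HashMap<>();
    private final Set<String> onDisk = new TreeSet<>();

    // All files of one segment, e.g. segmentFiles("_1orh", "frq", "prx") -> [_1orh.frq, _1orh.prx].
    static List<String> segmentFiles(String segment, String... extensions) {
        List<String> files = new ArrayList<>();
        for (String ext : extensions) files.add(segment + "." + ext);
        return files;
    }

    // A new commit references the given files (the files of every segment it contains).
    void addCommit(Collection<String> files) {
        for (String f : files) {
            refCounts.merge(f, 1, Integer::sum);
            onDisk.add(f);
        }
    }

    // Dropping a commit decrements each file; a file left with zero references is deleted.
    void dropCommit(Collection<String> files) {
        for (String f : files) {
            Integer count = refCounts.get(f);
            if (count == null) continue; // not tracked by this model
            if (count == 1) {
                refCounts.remove(f);
                onDisk.remove(f); // stands in for deleting the file from the index directory
            } else {
                refCounts.put(f, count - 1);
            }
        }
    }

    Set<String> filesOnDisk() {
        return Collections.unmodifiableSet(onDisk);
    }

    public static void main(String[] args) {
        List<String> seg1 = segmentFiles("_1orh", "frq", "prx", "tis");
        List<String> seg2 = segmentFiles("_2", "frq", "prx", "tis");
        List<String> older = new ArrayList<>(seg1);
        List<String> newer = new ArrayList<>(seg1);
        newer.addAll(seg2);

        CommitRefCounter counter = new CommitRefCounter();
        counter.addCommit(older);
        counter.addCommit(newer);
        counter.dropCommit(older); // _1orh is still referenced by the newer commit
        System.out.println(counter.filesOnDisk()); // prints [_1orh.frq, _1orh.prx, _1orh.tis, _2.frq, _2.prx, _2.tis]
        counter.dropCommit(newer); // both segments go now, every file at once
        System.out.println(counter.filesOnDisk()); // prints []
    }
}
```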


(Adam Estrada) #5

Shay,

I was using 0.18.7, I think, but have since switched to the latest version
of ES. I did find it odd that the single index file was missing. I searched
through the recycle bin and all that, and it just doesn't exist anymore.
No worries... Data was lost, but it's not the end of the world ;)

Adam
