Losing data after Elasticsearch restart

rohit_jaiswal · June 9, 2014, 2:15am

Hello Everyone,
We lost data after restarting Elasticsearch
cluster. Restarting is a part of deploying our software stack.

                     We have a 20-node cluster running 0.90.2 and we

have Splunk configured to index ES logs.

                     Looking at the Splunk logs, we could find the

following error a day before the deployment (restart) -

            [cluster.action.shard     ] [Rictor] sending failed shard for [c0a71ddaa70b463a9a179c36c7fc26e3][2], node[nJvnclczRNaLbETunjlcWw], [R], s[STARTED], reason
            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]

            [cluster.action.shard     ] [Kiss] received shard failed for [c0a71ddaa70b463a9a179c36c7fc26e3][2], node[nJvnclczRNaLbETunjlcWw], [R], s[STARTED], reason 
            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]  



                      Further,* a day after the deploy,* we see the

same errors on another node -

            [cluster.action.shard     ] [Contrary] received shard failed for [a58f9413315048ecb0abea48f5f6aae7][1], node[3UbHwVCkQvO3XroIl-awPw], [R], s[STARTED], reason
            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]

       
         *Immediately next, the following error is seen*. This error is seen repeatedly on a couple of other nodes as well - 

             failed to start shard

             [cluster.action.shard     ] [Copperhead] sending failed shard for [a58f9413315048ecb0abea48f5f6aae7][0], node[EuRzr3MLQiSS6lzTZJbiKw], [R], s[INITIALIZING],
             reason [Failed to start shard, message [RecoveryFailedException[[a58f9413315048ecb0abea48f5f6aae7][0]: Recovery failed from [Frank Castle][dlv2mPypQaOxLPQhHQ67Fw]
             [inet[/10.2.136.81:9300]] into [Copperhead][EuRzr3MLQiSS6lzTZJbiKw][inet[/10.3.207.55:9300]]]; nested: RemoteTransportException[[Frank Castle]
             [inet[/10.2.136.81:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[a58f9413315048ecb0abea48f5f6aae7][0] Phase[2] Execution failed]; 
             nested: RemoteTransportException[[Copperhead][inet[/10.3.207.55:9300]][index/shard/recovery/translogOps]]; nested: InvalidAliasNameException[[a58f9413315048ecb0abea48f5f6aae7]

    Invalid alias name [fbf1e55418a2327d308e7632911f9bb8bfed58059dd7f1e4abd3467c5f8519c3], Unknown alias name was passed to alias Filter]; ]]*

During this time, we could not access previously indexed documents.
I looked up the alias error, looks like it is related to https://github.com/elasticsearch/elasticsearch/issues/1198 (Delete By Query wrongly persisted to translog # 1198),
but this should be fixed in ES 0.18.0 and, we are using 0.90.2, so why is ES encountering this issue?

         What do we need to do to set this right and get back lost data? Please help.

Thanks.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/00e54753-ab89-4f63-a39e-0931e8f7e2f0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

spinscale · June 16, 2014, 1:13pm

Hey,

without stack traces it is pretty hard to see the actual problem, do you
have them around (on one node this exception has happened, so it should
have been logged into the elasticsearch logfile as well). Also, you should
really upgrade if possible, as releases after 0.90.2 have seen many many
improvements.

--Alex

On Mon, Jun 9, 2014 at 4:15 AM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:

Hello Everyone,
We lost data after restarting Elasticsearch
cluster. Restarting is a part of deploying our software stack.
                     We have a 20-node cluster running 0.90.2 and we
have Splunk configured to index ES logs.
                     Looking at the Splunk logs, we could find the
following error a day before the deployment (restart) -
            [cluster.action.shard     ] [Rictor] sending failed shard for [c0a71ddaa70b463a9a179c36c7fc26e3][2], node[nJvnclczRNaLbETunjlcWw], [R], s[STARTED], reason
            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]

            [cluster.action.shard     ] [Kiss] received shard failed for [c0a71ddaa70b463a9a179c36c7fc26e3][2], node[nJvnclczRNaLbETunjlcWw], [R], s[STARTED], reason
            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]



                      Further,* a day after the deploy,* we see the
same errors on another node -
            [cluster.action.shard     ] [Contrary] received shard failed for [a58f9413315048ecb0abea48f5f6aae7][1], node[3UbHwVCkQvO3XroIl-awPw], [R], s[STARTED], reason
            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]


         *Immediately next, the following error is seen*. This error is seen repeatedly on a couple of other nodes as well -

             failed to start shard

             [cluster.action.shard     ] [Copperhead] sending failed shard for [a58f9413315048ecb0abea48f5f6aae7][0], node[EuRzr3MLQiSS6lzTZJbiKw], [R], s[INITIALIZING],
             reason [Failed to start shard, message [RecoveryFailedException[[a58f9413315048ecb0abea48f5f6aae7][0]: Recovery failed from [Frank Castle][dlv2mPypQaOxLPQhHQ67Fw]
             [inet[/10.2.136.81:9300]] into [Copperhead][EuRzr3MLQiSS6lzTZJbiKw][inet[/10.3.207.55:9300]]]; nested: RemoteTransportException[[Frank Castle]
             [inet[/10.2.136.81:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[a58f9413315048ecb0abea48f5f6aae7][0] Phase[2] Execution failed];
             nested: RemoteTransportException[[Copperhead][inet[/10.3.207.55:9300]][index/shard/recovery/translogOps]]; nested: InvalidAliasNameException[[a58f9413315048ecb0abea48f5f6aae7]
    Invalid alias name [fbf1e55418a2327d308e7632911f9bb8bfed58059dd7f1e4abd3467c5f8519c3], Unknown alias name was passed to alias Filter]; ]]*
During this time, we could not access previously indexed documents.
I looked up the alias error, looks like it is related to Delete By Query wrongly persisted to translog · Issue #1198 · elastic/elasticsearch · GitHub (Delete By Query wrongly persisted to translog # 1198),
but this should be fixed in ES 0.18.0 and, we are using 0.90.2, so why is ES encountering this issue?
         What do we need to do to set this right and get back lost data? Please help.
Thanks.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/00e54753-ab89-4f63-a39e-0931e8f7e2f0%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/00e54753-ab89-4f63-a39e-0931e8f7e2f0%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAGCwEM8yrprZNCpzNqOiDzaoFwqh6Dth23OSc1byZe81P7Ba9w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

rohit_jaiswal · June 16, 2014, 5:25pm

Hi Alexander,
Thanks for your reply. We plan to upgrade in the
long run, however we need to fix the data loss problem on 0.90.2 in the
immediate term.

Here is the stack trace -

10:09:37.783 PM

[22:09:37,783][WARN ][indices.cluster ] [Storm]
[b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException:
[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey
Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into
[Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
at
org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
at
org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
at
org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException: [Jeffrey
Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException:
[b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
at
org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
at
org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
at
org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
at
org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:328)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:314)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException:
[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]
Caused by: org.elasticsearch.indices.InvalidAliasNameException:
[b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter
at
org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:99)
at
org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:382)
at
org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:628)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:447)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:416)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
[22:09:37,799][WARN ][cluster.action.shard ] [Storm] sending failed
shard for [b7a76aa06cfd4048987d1117f3e0433a][0],
node[FiW6mbR5ThqqSii5Wc28lQ], [R], s[INITIALIZING], reason [Failed to start
shard, message
[RecoveryFailedException[[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery
failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]]
into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]]; nested:
RemoteTransportException[[Jeffrey
Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]];
nested: RecoveryEngineException[[b7a76aa06cfd4048987d1117f3e0433a][0]
Phase[2] Execution failed]; nested:
RemoteTransportException[[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]];
nested: InvalidAliasNameException[[b7a76aa06cfd4048987d1117f3e0433a]
Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter]; ]]
[22:09:38,025][WARN ][indices.cluster ] [Storm]
[b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException:
[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey
Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into
[Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
at
org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
at
org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
at
org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException: [Jeffrey
Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException:
[b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
at
org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
at
org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
at
org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
at
org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:328)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:314)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException:
[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]
Caused by: org.elasticsearch.indices.InvalidAliasNameException:
[b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter
at
org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:99)
at
org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:382)
at
org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:628)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:447)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:416)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

[22:09:38,042][WARN ][cluster.action.shard ] [Storm] sending failed
shard for [b7a76aa06cfd4048987d1117f3e0433a][0],
node[FiW6mbR5ThqqSii5Wc28lQ], [R], s[INITIALIZING], reason [Failed to start
shard, message
[RecoveryFailedException[[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery
failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]]
into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]]; nested:
RemoteTransportException[[Jeffrey
Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]];
nested: RecoveryEngineException[[b7a76aa06cfd4048987d1117f3e0433a][0]
Phase[2] Execution failed]; nested:
RemoteTransportException[[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]];
nested: InvalidAliasNameException[[b7a76aa06cfd4048987d1117f3e0433a]
Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter]; ]]

Let us know..

Thanks,
Rohit

On Mon, Jun 16, 2014 at 6:13 AM, Alexander Reelsen alr@spinscale.de wrote:

Hey,

without stack traces it is pretty hard to see the actual problem, do you
have them around (on one node this exception has happened, so it should
have been logged into the elasticsearch logfile as well). Also, you should
really upgrade if possible, as releases after 0.90.2 have seen many many
improvements.

--Alex

On Mon, Jun 9, 2014 at 4:15 AM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:
Hello Everyone,
We lost data after restarting Elasticsearch
cluster. Restarting is a part of deploying our software stack.
                     We have a 20-node cluster running 0.90.2 and we
have Splunk configured to index ES logs.
                     Looking at the Splunk logs, we could find the
following error a day before the deployment (restart) -
            [cluster.action.shard     ] [Rictor] sending failed shard for [c0a71ddaa70b463a9a179c36c7fc26e3][2], node[nJvnclczRNaLbETunjlcWw], [R], s[STARTED], reason
            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]

            [cluster.action.shard     ] [Kiss] received shard failed for [c0a71ddaa70b463a9a179c36c7fc26e3][2], node[nJvnclczRNaLbETunjlcWw], [R], s[STARTED], reason

            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]



                      Further,* a day after the deploy,* we see the
same errors on another node -
            [cluster.action.shard     ] [Contrary] received shard failed for [a58f9413315048ecb0abea48f5f6aae7][1], node[3UbHwVCkQvO3XroIl-awPw], [R], s[STARTED], reason

            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]


         *Immediately next, the following error is seen*. This error is seen repeatedly on a couple of other nodes as well -

             failed to start shard


             [cluster.action.shard     ] [Copperhead] sending failed shard for [a58f9413315048ecb0abea48f5f6aae7][0], node[EuRzr3MLQiSS6lzTZJbiKw], [R], s[INITIALIZING],
             reason [Failed to start shard, message [RecoveryFailedException[[a58f9413315048ecb0abea48f5f6aae7][0]: Recovery failed from [Frank Castle][dlv2mPypQaOxLPQhHQ67Fw]

             [inet[/10.2.136.81:9300]] into [Copperhead][EuRzr3MLQiSS6lzTZJbiKw][inet[/10.3.207.55:9300]]]; nested: RemoteTransportException[[Frank Castle]
             [inet[/10.2.136.81:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[a58f9413315048ecb0abea48f5f6aae7][0] Phase[2] Execution failed];

             nested: RemoteTransportException[[Copperhead][inet[/10.3.207.55:9300]][index/shard/recovery/translogOps]]; nested: InvalidAliasNameException[[a58f9413315048ecb0abea48f5f6aae7]
    Invalid alias name [fbf1e55418a2327d308e7632911f9bb8bfed58059dd7f1e4abd3467c5f8519c3], Unknown alias name was passed to alias Filter]; ]]
During this time, we could not access previously indexed documents.
I looked up the alias error, looks like it is related to Delete By Query wrongly persisted to translog · Issue #1198 · elastic/elasticsearch · GitHub (Delete By Query wrongly persisted to translog # 1198),
         but this should be fixed in ES 0.18.0 and, we are using 0.90.2, so why is ES encountering this issue?

         What do we need to do to set this right and get back lost data? Please help.
Thanks.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.

To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/00e54753-ab89-4f63-a39e-0931e8f7e2f0%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/00e54753-ab89-4f63-a39e-0931e8f7e2f0%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/2wUHvnd_lU4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAGCwEM8yrprZNCpzNqOiDzaoFwqh6Dth23OSc1byZe81P7Ba9w%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAGCwEM8yrprZNCpzNqOiDzaoFwqh6Dth23OSc1byZe81P7Ba9w%40mail.gmail.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAP_rV8H8gaEf6hKY199j5h9i07%2Bj9o%2BSXsNxPe-7h9kPS-vxsA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

rohit_jaiswal · June 19, 2014, 3:13am

Hi Alexander,
We sent you the stack trace. Can you please enlighten us on
this?

Thanks,
Rohit

On Mon, Jun 16, 2014 at 10:25 AM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:

Hi Alexander,
Thanks for your reply. We plan to upgrade in the
long run, however we need to fix the data loss problem on 0.90.2 in the
immediate term.

Here is the stack trace -

10:09:37.783 PM

[22:09:37,783][WARN ][indices.cluster ] [Storm]
[b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException:
[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey
Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into
[Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
at
org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
at
org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
at
org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException: [Jeffrey
Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException:
[b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
at
org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
at
org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
at
org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
at
org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:328)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:314)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException:
[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]
Caused by: org.elasticsearch.indices.InvalidAliasNameException:
[b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter
at
org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:99)
at
org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:382)
at
org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:628)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:447)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:416)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
[22:09:37,799][WARN ][cluster.action.shard ] [Storm] sending failed
shard for [b7a76aa06cfd4048987d1117f3e0433a][0],
node[FiW6mbR5ThqqSii5Wc28lQ], [R], s[INITIALIZING], reason [Failed to start
shard, message
[RecoveryFailedException[[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery
failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]]
into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]]; nested:
RemoteTransportException[[Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]];
nested: RecoveryEngineException[[b7a76aa06cfd4048987d1117f3e0433a][0]
Phase[2] Execution failed]; nested:
RemoteTransportException[[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]];
nested: InvalidAliasNameException[[b7a76aa06cfd4048987d1117f3e0433a]
Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter]; ]]
[22:09:38,025][WARN ][indices.cluster ] [Storm]
[b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException:
[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey
Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into
[Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
at
org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
at
org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
at
org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException: [Jeffrey
Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException:
[b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
at
org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
at
org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
at
org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
at
org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:328)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:314)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException:
[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]
Caused by: org.elasticsearch.indices.InvalidAliasNameException:
[b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter
at
org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:99)
at
org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:382)
at
org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:628)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:447)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:416)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

[22:09:38,042][WARN ][cluster.action.shard ] [Storm] sending failed
shard for [b7a76aa06cfd4048987d1117f3e0433a][0],
node[FiW6mbR5ThqqSii5Wc28lQ], [R], s[INITIALIZING], reason [Failed to start
shard, message
[RecoveryFailedException[[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery
failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]]
into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]]; nested:
RemoteTransportException[[Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]];
nested: RecoveryEngineException[[b7a76aa06cfd4048987d1117f3e0433a][0]
Phase[2] Execution failed]; nested:
RemoteTransportException[[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]];
nested: InvalidAliasNameException[[b7a76aa06cfd4048987d1117f3e0433a]
Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter]; ]]

Let us know..

Thanks,
Rohit

On Mon, Jun 16, 2014 at 6:13 AM, Alexander Reelsen alr@spinscale.de
wrote:
Hey,

without stack traces it is pretty hard to see the actual problem, do you
have them around (on one node this exception has happened, so it should
have been logged into the elasticsearch logfile as well). Also, you should
really upgrade if possible, as releases after 0.90.2 have seen many many
improvements.

--Alex

On Mon, Jun 9, 2014 at 4:15 AM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:
Hello Everyone,
We lost data after restarting Elasticsearch
cluster. Restarting is a part of deploying our software stack.
                     We have a 20-node cluster running 0.90.2 and we
have Splunk configured to index ES logs.
                     Looking at the Splunk logs, we could find the
following error a day before the deployment (restart) -
            [cluster.action.shard     ] [Rictor] sending failed shard for [c0a71ddaa70b463a9a179c36c7fc26e3][2], node[nJvnclczRNaLbETunjlcWw], [R], s[STARTED], reason

            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]


            [cluster.action.shard     ] [Kiss] received shard failed for [c0a71ddaa70b463a9a179c36c7fc26e3][2], node[nJvnclczRNaLbETunjlcWw], [R], s[STARTED], reason


            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]



                      Further,* a day after the deploy,* we see the
same errors on another node -
            [cluster.action.shard     ] [Contrary] received shard failed for [a58f9413315048ecb0abea48f5f6aae7][1], node[3UbHwVCkQvO3XroIl-awPw], [R], s[STARTED], reason


            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]



         *Immediately next, the following error is seen*. This error is seen repeatedly on a couple of other nodes as well -

             failed to start shard



             [cluster.action.shard     ] [Copperhead] sending failed shard for [a58f9413315048ecb0abea48f5f6aae7][0], node[EuRzr3MLQiSS6lzTZJbiKw], [R], s[INITIALIZING],
             reason [Failed to start shard, message [RecoveryFailedException[[a58f9413315048ecb0abea48f5f6aae7][0]: Recovery failed from [Frank Castle][dlv2mPypQaOxLPQhHQ67Fw]


             [inet[/10.2.136.81:9300]] into [Copperhead][EuRzr3MLQiSS6lzTZJbiKw][inet[/10.3.207.55:9300]]]; nested: RemoteTransportException[[Frank Castle]
             [inet[/10.2.136.81:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[a58f9413315048ecb0abea48f5f6aae7][0] Phase[2] Execution failed];


             nested: RemoteTransportException[[Copperhead][inet[/10.3.207.55:9300]][index/shard/recovery/translogOps]]; nested: InvalidAliasNameException[[a58f9413315048ecb0abea48f5f6aae7]
    Invalid alias name [fbf1e55418a2327d308e7632911f9bb8bfed58059dd7f1e4abd3467c5f8519c3], Unknown alias name was passed to alias Filter]; ]]
During this time, we could not access previously indexed documents.
I looked up the alias error, looks like it is related to Delete By Query wrongly persisted to translog · Issue #1198 · elastic/elasticsearch · GitHub (Delete By Query wrongly persisted to translog # 1198),
         but this should be fixed in ES 0.18.0 and, we are using 0.90.2, so why is ES encountering this issue?

         What do we need to do to set this right and get back lost data? Please help.
Thanks.

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearch+unsubscribe@googlegroups.com.

To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/00e54753-ab89-4f63-a39e-0931e8f7e2f0%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/00e54753-ab89-4f63-a39e-0931e8f7e2f0%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/2wUHvnd_lU4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAGCwEM8yrprZNCpzNqOiDzaoFwqh6Dth23OSc1byZe81P7Ba9w%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAGCwEM8yrprZNCpzNqOiDzaoFwqh6Dth23OSc1byZe81P7Ba9w%40mail.gmail.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAP_rV8FrhSb%2BuDQdb26t3WwUOykB1HEY0q0pkchtKb-6_hboMA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

spinscale · June 20, 2014, 7:02am

Hey,

the exception you showed, can possibly happen, when you remove an alias.
However you mentioned NullPointerException in your first post, which is not
contained in the stacktrace, so it seems, that one is still missing.

Also, please retry with a newer version of Elasticsearch.

--Alex

On Thu, Jun 19, 2014 at 5:13 AM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:

Hi Alexander,
We sent you the stack trace. Can you please enlighten us on
this?

Thanks,
Rohit

On Mon, Jun 16, 2014 at 10:25 AM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:
Hi Alexander,
Thanks for your reply. We plan to upgrade in the
long run, however we need to fix the data loss problem on 0.90.2 in the
immediate term.

Here is the stack trace -

10:09:37.783 PM

[22:09:37,783][WARN ][indices.cluster ] [Storm]
[b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException:
[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey
Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into
[Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
at
org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
at
org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
at
org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException: [Jeffrey
Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException:
[b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
at
org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
at
org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
at
org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
at
org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:328)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:314)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException:
[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]
Caused by: org.elasticsearch.indices.InvalidAliasNameException:
[b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter
at
org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:99)
at
org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:382)
at
org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:628)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:447)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:416)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
[22:09:37,799][WARN ][cluster.action.shard ] [Storm] sending failed
shard for [b7a76aa06cfd4048987d1117f3e0433a][0],
node[FiW6mbR5ThqqSii5Wc28lQ], [R], s[INITIALIZING], reason [Failed to start
shard, message
[RecoveryFailedException[[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery
failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]]
into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]]; nested:
RemoteTransportException[[Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]];
nested: RecoveryEngineException[[b7a76aa06cfd4048987d1117f3e0433a][0]
Phase[2] Execution failed]; nested:
RemoteTransportException[[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]];
nested: InvalidAliasNameException[[b7a76aa06cfd4048987d1117f3e0433a]
Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter]; ]]
[22:09:38,025][WARN ][indices.cluster ] [Storm]
[b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException:
[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey
Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into
[Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
at
org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
at
org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
at
org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException: [Jeffrey
Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException:
[b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
at
org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
at
org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
at
org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
at
org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:328)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:314)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException:
[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]
Caused by: org.elasticsearch.indices.InvalidAliasNameException:
[b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter
at
org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:99)
at
org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:382)
at
org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:628)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:447)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:416)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

[22:09:38,042][WARN ][cluster.action.shard ] [Storm] sending failed
shard for [b7a76aa06cfd4048987d1117f3e0433a][0],
node[FiW6mbR5ThqqSii5Wc28lQ], [R], s[INITIALIZING], reason [Failed to start
shard, message
[RecoveryFailedException[[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery
failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]]
into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]]; nested:
RemoteTransportException[[Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]];
nested: RecoveryEngineException[[b7a76aa06cfd4048987d1117f3e0433a][0]
Phase[2] Execution failed]; nested:
RemoteTransportException[[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]];
nested: InvalidAliasNameException[[b7a76aa06cfd4048987d1117f3e0433a]
Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter]; ]]

Let us know..

Thanks,
Rohit

On Mon, Jun 16, 2014 at 6:13 AM, Alexander Reelsen alr@spinscale.de
wrote:
Hey,

without stack traces it is pretty hard to see the actual problem, do you
have them around (on one node this exception has happened, so it should
have been logged into the elasticsearch logfile as well). Also, you should
really upgrade if possible, as releases after 0.90.2 have seen many many
improvements.

--Alex

On Mon, Jun 9, 2014 at 4:15 AM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:
Hello Everyone,
We lost data after restarting Elasticsearch
cluster. Restarting is a part of deploying our software stack.
                     We have a 20-node cluster running 0.90.2 and
we have Splunk configured to index ES logs.
                     Looking at the Splunk logs, we could find the
following error a day before the deployment (restart) -
            [cluster.action.shard     ] [Rictor] sending failed shard for [c0a71ddaa70b463a9a179c36c7fc26e3][2], node[nJvnclczRNaLbETunjlcWw], [R], s[STARTED], reason


            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]



            [cluster.action.shard     ] [Kiss] received shard failed for [c0a71ddaa70b463a9a179c36c7fc26e3][2], node[nJvnclczRNaLbETunjlcWw], [R], s[STARTED], reason



            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]



                      Further,* a day after the deploy,* we see
the same errors on another node -
            [cluster.action.shard     ] [Contrary] received shard failed for [a58f9413315048ecb0abea48f5f6aae7][1], node[3UbHwVCkQvO3XroIl-awPw], [R], s[STARTED], reason



            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]




         *Immediately next, the following error is seen*. This error is seen repeatedly on a couple of other nodes as well -

             failed to start shard




             [cluster.action.shard     ] [Copperhead] sending failed shard for [a58f9413315048ecb0abea48f5f6aae7][0], node[EuRzr3MLQiSS6lzTZJbiKw], [R], s[INITIALIZING],
             reason [Failed to start shard, message [RecoveryFailedException[[a58f9413315048ecb0abea48f5f6aae7][0]: Recovery failed from [Frank Castle][dlv2mPypQaOxLPQhHQ67Fw]



             [inet[/10.2.136.81:9300]] into [Copperhead][EuRzr3MLQiSS6lzTZJbiKw][inet[/10.3.207.55:9300]]]; nested: RemoteTransportException[[Frank Castle]
             [inet[/10.2.136.81:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[a58f9413315048ecb0abea48f5f6aae7][0] Phase[2] Execution failed];



             nested: RemoteTransportException[[Copperhead][inet[/10.3.207.55:9300]][index/shard/recovery/translogOps]]; nested: InvalidAliasNameException[[a58f9413315048ecb0abea48f5f6aae7]
    Invalid alias name [fbf1e55418a2327d308e7632911f9bb8bfed58059dd7f1e4abd3467c5f8519c3], Unknown alias name was passed to alias Filter]; ]]
During this time, we could not access previously indexed documents.
I looked up the alias error, looks like it is related to Delete By Query wrongly persisted to translog · Issue #1198 · elastic/elasticsearch · GitHub (Delete By Query wrongly persisted to translog # 1198),
         but this should be fixed in ES 0.18.0 and, we are using 0.90.2, so why is ES encountering this issue?

         What do we need to do to set this right and get back lost data? Please help.
Thanks.

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearch+unsubscribe@googlegroups.com.

To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/00e54753-ab89-4f63-a39e-0931e8f7e2f0%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/00e54753-ab89-4f63-a39e-0931e8f7e2f0%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/2wUHvnd_lU4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAGCwEM8yrprZNCpzNqOiDzaoFwqh6Dth23OSc1byZe81P7Ba9w%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAGCwEM8yrprZNCpzNqOiDzaoFwqh6Dth23OSc1byZe81P7Ba9w%40mail.gmail.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAP_rV8FrhSb%2BuDQdb26t3WwUOykB1HEY0q0pkchtKb-6_hboMA%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAP_rV8FrhSb%2BuDQdb26t3WwUOykB1HEY0q0pkchtKb-6_hboMA%40mail.gmail.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAGCwEM-FS8RtP3AfR-cE3Ok33eDK6PtbEKyiPhSXOVLg00xKZQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

rohit_jaiswal · June 20, 2014, 1:52pm

Hi Alexander,
Here is the stack trace for the NullpointerException -

[23:24:38,929][DEBUG][action.bulk ] [Rasputin, Mikhail]
[17f85dcb67b64a13bfef2be74595087e][0], node[a-eZTR9XRiWq-o0QmsM2aA], [P],
s[STARTED]: Failed to execute
[org.elasticsearch.action.bulk.BulkShardRequest@22b11bbf]
java.lang.NullPointerException
at
org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:247)
at
org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:242)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performReplicas(TransportShardReplicationOperationAction.java:607)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:533)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
[23:24:38,940][DEBUG][action.bulk ] [Rasputin, Mikhail]
[17f85dcb67b64a13bfef2be74595087e][0], node[a-eZTR9XRiWq-o0QmsM2aA], [P],
s[STARTED]: Failed to execute
[org.elasticsearch.action.bulk.BulkShardRequest@768475c4]
java.lang.NullPointerException
at
org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:247)
at
org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:242)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performReplicas(TransportShardReplicationOperationAction.java:607)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:533)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

Thanks,
Rohit

On Fri, Jun 20, 2014 at 12:02 AM, Alexander Reelsen alr@spinscale.de
wrote:

Hey,

the exception you showed, can possibly happen, when you remove an alias.
However you mentioned NullPointerException in your first post, which is not
contained in the stacktrace, so it seems, that one is still missing.

Also, please retry with a newer version of Elasticsearch.

--Alex

On Thu, Jun 19, 2014 at 5:13 AM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:
Hi Alexander,
We sent you the stack trace. Can you please enlighten us
on this?

Thanks,
Rohit

On Mon, Jun 16, 2014 at 10:25 AM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:
Hi Alexander,
Thanks for your reply. We plan to upgrade in the
long run, however we need to fix the data loss problem on 0.90.2 in the
immediate term.

Here is the stack trace -

10:09:37.783 PM

[22:09:37,783][WARN ][indices.cluster ] [Storm]
[b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException:
[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey
Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into
[Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
at
org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
at
org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
at
org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException:
[Jeffrey Mace][inet[/10.4.35.200:9300
]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException:
[b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
at
org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
at
org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
at
org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
at
org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:328)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:314)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException:
[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]
Caused by: org.elasticsearch.indices.InvalidAliasNameException:
[b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter
at
org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:99)
at
org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:382)
at
org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:628)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:447)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:416)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
[22:09:37,799][WARN ][cluster.action.shard ] [Storm] sending failed
shard for [b7a76aa06cfd4048987d1117f3e0433a][0],
node[FiW6mbR5ThqqSii5Wc28lQ], [R], s[INITIALIZING], reason [Failed to start
shard, message
[RecoveryFailedException[[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery
failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/
10.4.35.200:9300]] into
[Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]]; nested:
RemoteTransportException[[Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]];
nested: RecoveryEngineException[[b7a76aa06cfd4048987d1117f3e0433a][0]
Phase[2] Execution failed]; nested:
RemoteTransportException[[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]];
nested: InvalidAliasNameException[[b7a76aa06cfd4048987d1117f3e0433a]
Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter]; ]]
[22:09:38,025][WARN ][indices.cluster ] [Storm]
[b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException:
[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey
Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into
[Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
at
org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
at
org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
at
org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException:
[Jeffrey Mace][inet[/10.4.35.200:9300
]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException:
[b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
at
org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
at
org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
at
org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
at
org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:328)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:314)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException:
[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]
Caused by: org.elasticsearch.indices.InvalidAliasNameException:
[b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter
at
org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:99)
at
org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:382)
at
org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:628)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:447)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:416)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

[22:09:38,042][WARN ][cluster.action.shard ] [Storm] sending failed
shard for [b7a76aa06cfd4048987d1117f3e0433a][0],
node[FiW6mbR5ThqqSii5Wc28lQ], [R], s[INITIALIZING], reason [Failed to start
shard, message
[RecoveryFailedException[[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery
failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/
10.4.35.200:9300]] into
[Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]]; nested:
RemoteTransportException[[Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]];
nested: RecoveryEngineException[[b7a76aa06cfd4048987d1117f3e0433a][0]
Phase[2] Execution failed]; nested:
RemoteTransportException[[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]];
nested: InvalidAliasNameException[[b7a76aa06cfd4048987d1117f3e0433a]
Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter]; ]]

Let us know..

Thanks,
Rohit

On Mon, Jun 16, 2014 at 6:13 AM, Alexander Reelsen alr@spinscale.de
wrote:
Hey,

without stack traces it is pretty hard to see the actual problem, do
you have them around (on one node this exception has happened, so it should
have been logged into the elasticsearch logfile as well). Also, you should
really upgrade if possible, as releases after 0.90.2 have seen many many
improvements.

--Alex

On Mon, Jun 9, 2014 at 4:15 AM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:
Hello Everyone,
We lost data after restarting Elasticsearch
cluster. Restarting is a part of deploying our software stack.
                     We have a 20-node cluster running 0.90.2 and
we have Splunk configured to index ES logs.
                     Looking at the Splunk logs, we could find the
following error a day before the deployment (restart) -
            [cluster.action.shard     ] [Rictor] sending failed shard for [c0a71ddaa70b463a9a179c36c7fc26e3][2], node[nJvnclczRNaLbETunjlcWw], [R], s[STARTED], reason



            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]




            [cluster.action.shard     ] [Kiss] received shard failed for [c0a71ddaa70b463a9a179c36c7fc26e3][2], node[nJvnclczRNaLbETunjlcWw], [R], s[STARTED], reason




            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]



                      Further,* a day after the deploy,* we see
the same errors on another node -
            [cluster.action.shard     ] [Contrary] received shard failed for [a58f9413315048ecb0abea48f5f6aae7][1], node[3UbHwVCkQvO3XroIl-awPw], [R], s[STARTED], reason




            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]





         *Immediately next, the following error is seen*. This error is seen repeatedly on a couple of other nodes as well -

             failed to start shard





             [cluster.action.shard     ] [Copperhead] sending failed shard for [a58f9413315048ecb0abea48f5f6aae7][0], node[EuRzr3MLQiSS6lzTZJbiKw], [R], s[INITIALIZING],
             reason [Failed to start shard, message [RecoveryFailedException[[a58f9413315048ecb0abea48f5f6aae7][0]: Recovery failed from [Frank Castle][dlv2mPypQaOxLPQhHQ67Fw]




             [inet[/10.2.136.81:9300]] into [Copperhead][EuRzr3MLQiSS6lzTZJbiKw][inet[/10.3.207.55:9300]]]; nested: RemoteTransportException[[Frank Castle]
             [inet[/10.2.136.81:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[a58f9413315048ecb0abea48f5f6aae7][0] Phase[2] Execution failed];




             nested: RemoteTransportException[[Copperhead][inet[/10.3.207.55:9300]][index/shard/recovery/translogOps]]; nested: InvalidAliasNameException[[a58f9413315048ecb0abea48f5f6aae7]
    Invalid alias name [fbf1e55418a2327d308e7632911f9bb8bfed58059dd7f1e4abd3467c5f8519c3], Unknown alias name was passed to alias Filter]; ]]
During this time, we could not access previously indexed documents.
I looked up the alias error, looks like it is related to Delete By Query wrongly persisted to translog · Issue #1198 · elastic/elasticsearch · GitHub (Delete By Query wrongly persisted to translog # 1198),
         but this should be fixed in ES 0.18.0 and, we are using 0.90.2, so why is ES encountering this issue?

         What do we need to do to set this right and get back lost data? Please help.
Thanks.

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearch+unsubscribe@googlegroups.com.

To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/00e54753-ab89-4f63-a39e-0931e8f7e2f0%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/00e54753-ab89-4f63-a39e-0931e8f7e2f0%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/2wUHvnd_lU4/unsubscribe
.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAGCwEM8yrprZNCpzNqOiDzaoFwqh6Dth23OSc1byZe81P7Ba9w%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAGCwEM8yrprZNCpzNqOiDzaoFwqh6Dth23OSc1byZe81P7Ba9w%40mail.gmail.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAP_rV8FrhSb%2BuDQdb26t3WwUOykB1HEY0q0pkchtKb-6_hboMA%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAP_rV8FrhSb%2BuDQdb26t3WwUOykB1HEY0q0pkchtKb-6_hboMA%40mail.gmail.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/2wUHvnd_lU4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAGCwEM-FS8RtP3AfR-cE3Ok33eDK6PtbEKyiPhSXOVLg00xKZQ%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAGCwEM-FS8RtP3AfR-cE3Ok33eDK6PtbEKyiPhSXOVLg00xKZQ%40mail.gmail.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAP_rV8EGxL7s58Rgq8c4YhkPboLt7%3Dqx6jb_H5qTwd%3Duqb_imA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

rohit_jaiswal · June 21, 2014, 4:51pm

Hi Alexander,
I sent you stack trace for both the problems.

The InvalidAliasNameException was supposedly fixed in version 0.18.0 -
Delete By Query wrongly persisted to translog · Issue #1198 · elastic/elasticsearch · GitHub. We still faced
it in 0.90.2.

Yes, we might have tried to execute a delete by query on an alias that
might not exist probably because it was deleted or lost due to a bad shard.
But with the fix , we should not have faced it. Does 0.90.2 have that fix?
Please suggest.

We were able to reproduce the other error - Failed to execute bulk
shard - NullPointerException by performing concurrent updates to an
alias's data by at least 4 worker threads at the same time.

Checking the cluster state shows that one replica shard was unassigned,
cluster health being Yellow.

How do we recover the bad shard in this case?

Using the search API - curl -XGET 'http://localhost:9200/
/document/_search?q=:
, showed that those documents were still accessible. We could not reproduce
data loss like what we faced in the production cluster after getting the
NullPointerException error.

Elasticsearch is aware of this issue in 0.90.2 -

github.com/elastic/elasticsearch

Null pointer exceptions when bulk updates max out their retry on conflict

opened 10:13AM - 06 Aug 13 UTC

closed 05:02PM - 06 Aug 13 UTC

bleskes

>bug v1.0.0.Beta1 v0.90.3

Reported on the mailing list: ``` [2013-08-04 21:48:54,972][WARN ][cluster.acti…on.shard ] [es003] sending failed shard for [samples][20], node[COpFaS8rRhSulgDPAq5xxg], [R], s[STARTED], reason [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]] [2013-08-04 21:48:55,028][WARN ][action.bulk ] [es003] Failed to perform bulk/shard on replica [samples][82] org.elasticsearch.transport.RemoteTransportException Caused by: org.elasticsearch.transport.ResponseHandlerFailureTransportException Caused by: java.lang.NullPointerException at org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:247) at org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:242) at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$4.finishIfPossible(TransportShardReplicationOperationAction.java:693) at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$4.handleResponse(TransportShardReplicationOperationAction.java:679) at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$4.handleResponse(TransportShardReplicationOperationAction.java:676) at org.elasticsearch.transport.netty.MessageChannelHandler.handleResponse(MessageChannelHandler.java:153) at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:124) at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296) at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462) at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443) at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303) at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268) at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255) at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109) at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90) at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) [2013-08-04 21:48:55,029][WARN ][cluster.action.shard ] [es003] sending failed shard for [samples][82], node[aUq8CNUeT_iEIkJ7rVw02w], [R], s[STARTED], reason [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]] ```

Thanks,
Rohit

On Fri, Jun 20, 2014 at 6:52 AM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:

Hi Alexander,
Here is the stack trace for the NullpointerException -

[23:24:38,929][DEBUG][action.bulk ] [Rasputin, Mikhail]
[17f85dcb67b64a13bfef2be74595087e][0], node[a-eZTR9XRiWq-o0QmsM2aA], [P],
s[STARTED]: Failed to execute
[org.elasticsearch.action.bulk.BulkShardRequest@22b11bbf]
java.lang.NullPointerException
at
org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:247)
at
org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:242)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performReplicas(TransportShardReplicationOperationAction.java:607)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:533)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
[23:24:38,940][DEBUG][action.bulk ] [Rasputin, Mikhail]
[17f85dcb67b64a13bfef2be74595087e][0], node[a-eZTR9XRiWq-o0QmsM2aA], [P],
s[STARTED]: Failed to execute
[org.elasticsearch.action.bulk.BulkShardRequest@768475c4]
java.lang.NullPointerException
at
org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:247)
at
org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:242)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performReplicas(TransportShardReplicationOperationAction.java:607)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:533)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

Thanks,
Rohit

On Fri, Jun 20, 2014 at 12:02 AM, Alexander Reelsen alr@spinscale.de
wrote:
Hey,

the exception you showed, can possibly happen, when you remove an alias.
However you mentioned NullPointerException in your first post, which is not
contained in the stacktrace, so it seems, that one is still missing.

Also, please retry with a newer version of Elasticsearch.

--Alex

On Thu, Jun 19, 2014 at 5:13 AM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:
Hi Alexander,
We sent you the stack trace. Can you please enlighten us
on this?

Thanks,
Rohit

On Mon, Jun 16, 2014 at 10:25 AM, Rohit Jaiswal <rohit.jaiswal@gmail.com

wrote:
Hi Alexander,
Thanks for your reply. We plan to upgrade in
the long run, however we need to fix the data loss problem on 0.90.2 in the
immediate term.

Here is the stack trace -

10:09:37.783 PM

[22:09:37,783][WARN ][indices.cluster ] [Storm]
[b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException:
[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey
Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into
[Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
at
org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
at
org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
at
org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException:
[Jeffrey Mace][inet[/10.4.35.200:9300
]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException:
[b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
at
org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
at
org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
at
org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
at
org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:328)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:314)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException:
[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]
Caused by: org.elasticsearch.indices.InvalidAliasNameException:
[b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter
at
org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:99)
at
org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:382)
at
org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:628)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:447)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:416)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.lang.Thread.run(Unknown Source)
[22:09:37,799][WARN ][cluster.action.shard ] [Storm] sending failed
shard for [b7a76aa06cfd4048987d1117f3e0433a][0],
node[FiW6mbR5ThqqSii5Wc28lQ], [R], s[INITIALIZING], reason [Failed to start
shard, message
[RecoveryFailedException[[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery
failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/
10.4.35.200:9300]] into
[Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]]; nested:
RemoteTransportException[[Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]];
nested: RecoveryEngineException[[b7a76aa06cfd4048987d1117f3e0433a][0]
Phase[2] Execution failed]; nested:
RemoteTransportException[[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]];
nested: InvalidAliasNameException[[b7a76aa06cfd4048987d1117f3e0433a]
Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter]; ]]
[22:09:38,025][WARN ][indices.cluster ] [Storm]
[b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException:
[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey
Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into
[Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
at
org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
at
org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
at
org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException:
[Jeffrey Mace][inet[/10.4.35.200:9300
]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException:
[b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
at
org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
at
org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
at
org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
at
org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:328)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:314)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException:
[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]
Caused by: org.elasticsearch.indices.InvalidAliasNameException:
[b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter
at
org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:99)
at
org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:382)
at
org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:628)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:447)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:416)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.lang.Thread.run(Unknown Source)

[22:09:38,042][WARN ][cluster.action.shard ] [Storm] sending failed
shard for [b7a76aa06cfd4048987d1117f3e0433a][0],
node[FiW6mbR5ThqqSii5Wc28lQ], [R], s[INITIALIZING], reason [Failed to start
shard, message
[RecoveryFailedException[[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery
failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/
10.4.35.200:9300]] into
[Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]]; nested:
RemoteTransportException[[Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]];
nested: RecoveryEngineException[[b7a76aa06cfd4048987d1117f3e0433a][0]
Phase[2] Execution failed]; nested:
RemoteTransportException[[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]];
nested: InvalidAliasNameException[[b7a76aa06cfd4048987d1117f3e0433a]
Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter]; ]]

Let us know..

Thanks,
Rohit

On Mon, Jun 16, 2014 at 6:13 AM, Alexander Reelsen alr@spinscale.de
wrote:
Hey,

without stack traces it is pretty hard to see the actual problem, do
you have them around (on one node this exception has happened, so it should
have been logged into the elasticsearch logfile as well). Also, you should
really upgrade if possible, as releases after 0.90.2 have seen many many
improvements.

--Alex

On Mon, Jun 9, 2014 at 4:15 AM, Rohit Jaiswal <rohit.jaiswal@gmail.com

wrote:
Hello Everyone,
We lost data after restarting Elasticsearch
cluster. Restarting is a part of deploying our software stack.
                     We have a 20-node cluster running 0.90.2 and
we have Splunk configured to index ES logs.
                     Looking at the Splunk logs, we could find
the following error a day before the deployment (restart) -
            [cluster.action.shard     ] [Rictor] sending failed shard for [c0a71ddaa70b463a9a179c36c7fc26e3][2], node[nJvnclczRNaLbETunjlcWw], [R], s[STARTED], reason




            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]





            [cluster.action.shard     ] [Kiss] received shard failed for [c0a71ddaa70b463a9a179c36c7fc26e3][2], node[nJvnclczRNaLbETunjlcWw], [R], s[STARTED], reason





            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]



                      Further,* a day after the deploy,* we see
the same errors on another node -
            [cluster.action.shard     ] [Contrary] received shard failed for [a58f9413315048ecb0abea48f5f6aae7][1], node[3UbHwVCkQvO3XroIl-awPw], [R], s[STARTED], reason





            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]






         *Immediately next, the following error is seen*. This error is seen repeatedly on a couple of other nodes as well -

             failed to start shard






             [cluster.action.shard     ] [Copperhead] sending failed shard for [a58f9413315048ecb0abea48f5f6aae7][0], node[EuRzr3MLQiSS6lzTZJbiKw], [R], s[INITIALIZING],
             reason [Failed to start shard, message [RecoveryFailedException[[a58f9413315048ecb0abea48f5f6aae7][0]: Recovery failed from [Frank Castle][dlv2mPypQaOxLPQhHQ67Fw]





             [inet[/10.2.136.81:9300]] into [Copperhead][EuRzr3MLQiSS6lzTZJbiKw][inet[/10.3.207.55:9300]]]; nested: RemoteTransportException[[Frank Castle]
             [inet[/10.2.136.81:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[a58f9413315048ecb0abea48f5f6aae7][0] Phase[2] Execution failed];





             nested: RemoteTransportException[[Copperhead][inet[/10.3.207.55:9300]][index/shard/recovery/translogOps]]; nested: InvalidAliasNameException[[a58f9413315048ecb0abea48f5f6aae7]
    Invalid alias name [fbf1e55418a2327d308e7632911f9bb8bfed58059dd7f1e4abd3467c5f8519c3], Unknown alias name was passed to alias Filter]; ]]
During this time, we could not access previously indexed documents.
I looked up the alias error, looks like it is related to Delete By Query wrongly persisted to translog · Issue #1198 · elastic/elasticsearch · GitHub (Delete By Query wrongly persisted to translog # 1198),
         but this should be fixed in ES 0.18.0 and, we are using 0.90.2, so why is ES encountering this issue?

         What do we need to do to set this right and get back lost data? Please help.
Thanks.

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to elasticsearch+unsubscribe@googlegroups.com.

To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/00e54753-ab89-4f63-a39e-0931e8f7e2f0%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/00e54753-ab89-4f63-a39e-0931e8f7e2f0%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/2wUHvnd_lU4/unsubscribe
.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAGCwEM8yrprZNCpzNqOiDzaoFwqh6Dth23OSc1byZe81P7Ba9w%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAGCwEM8yrprZNCpzNqOiDzaoFwqh6Dth23OSc1byZe81P7Ba9w%40mail.gmail.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAP_rV8FrhSb%2BuDQdb26t3WwUOykB1HEY0q0pkchtKb-6_hboMA%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAP_rV8FrhSb%2BuDQdb26t3WwUOykB1HEY0q0pkchtKb-6_hboMA%40mail.gmail.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/2wUHvnd_lU4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAGCwEM-FS8RtP3AfR-cE3Ok33eDK6PtbEKyiPhSXOVLg00xKZQ%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAGCwEM-FS8RtP3AfR-cE3Ok33eDK6PtbEKyiPhSXOVLg00xKZQ%40mail.gmail.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAP_rV8F1%2BJAu1AWQREwVY%2BJMvAdeXqz7ZZHp_ejMA7Su5dHA%2Bw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

rohit_jaiswal · July 5, 2014, 12:02am

Hi Alexander,
Any updates on this?

Thanks,
Rohit

On Sat, Jun 21, 2014 at 9:51 AM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:

Hi Alexander,
I sent you stack trace for both the problems.

The InvalidAliasNameException was supposedly fixed in version 0.18.0 -
Delete By Query wrongly persisted to translog · Issue #1198 · elastic/elasticsearch · GitHub. We still
faced it in 0.90.2.

Yes, we might have tried to execute a delete by query on an alias that
might not exist probably because it was deleted or lost due to a bad shard.
But with the fix , we should not have faced it. Does 0.90.2 have that fix?
Please suggest.

We were able to reproduce the other error - Failed to execute bulk
shard - NullPointerException by performing concurrent updates to an
alias's data by at least 4 worker threads at the same time.

Checking the cluster state shows that one replica shard was unassigned,
cluster health being Yellow.

How do we recover the bad shard in this case?

Using the search API - curl -XGET 'http://localhost:9200/
/document/_search?q=:
, showed that those documents were still accessible. We could not
reproduce data loss like what we faced in the production cluster after
getting the NullPointerException error.

Elasticsearch is aware of this issue in 0.90.2 -
Null pointer exceptions when bulk updates max out their retry on conflict · Issue #3448 · elastic/elasticsearch · GitHub

Thanks,
Rohit

On Fri, Jun 20, 2014 at 6:52 AM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:
Hi Alexander,
Here is the stack trace for the NullpointerException -

[23:24:38,929][DEBUG][action.bulk ] [Rasputin, Mikhail]
[17f85dcb67b64a13bfef2be74595087e][0], node[a-eZTR9XRiWq-o0QmsM2aA], [P],
s[STARTED]: Failed to execute
[org.elasticsearch.action.bulk.BulkShardRequest@22b11bbf]
java.lang.NullPointerException
at
org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:247)
at
org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:242)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performReplicas(TransportShardReplicationOperationAction.java:607)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:533)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
[23:24:38,940][DEBUG][action.bulk ] [Rasputin, Mikhail]
[17f85dcb67b64a13bfef2be74595087e][0], node[a-eZTR9XRiWq-o0QmsM2aA], [P],
s[STARTED]: Failed to execute
[org.elasticsearch.action.bulk.BulkShardRequest@768475c4]
java.lang.NullPointerException
at
org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:247)
at
org.elasticsearch.action.bulk.TransportBulkAction$2.onResponse(TransportBulkAction.java:242)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performReplicas(TransportShardReplicationOperationAction.java:607)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:533)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

Thanks,
Rohit

On Fri, Jun 20, 2014 at 12:02 AM, Alexander Reelsen alr@spinscale.de
wrote:
Hey,

the exception you showed, can possibly happen, when you remove an alias.
However you mentioned NullPointerException in your first post, which is not
contained in the stacktrace, so it seems, that one is still missing.

Also, please retry with a newer version of Elasticsearch.

--Alex

On Thu, Jun 19, 2014 at 5:13 AM, Rohit Jaiswal rohit.jaiswal@gmail.com
wrote:
Hi Alexander,
We sent you the stack trace. Can you please enlighten us
on this?

Thanks,
Rohit

On Mon, Jun 16, 2014 at 10:25 AM, Rohit Jaiswal <
rohit.jaiswal@gmail.com> wrote:
Hi Alexander,
Thanks for your reply. We plan to upgrade in
the long run, however we need to fix the data loss problem on 0.90.2 in the
immediate term.

Here is the stack trace -

10:09:37.783 PM

[22:09:37,783][WARN ][indices.cluster ] [Storm]
[b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException:
[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey
Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into
[Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
at
org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
at
org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
at
org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException:
[Jeffrey Mace][inet[/10.4.35.200:9300
]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException:
[b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
at
org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
at
org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
at
org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
at
org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:328)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:314)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException:
[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]
Caused by: org.elasticsearch.indices.InvalidAliasNameException:
[b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter
at
org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:99)
at
org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:382)
at
org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:628)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:447)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:416)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.lang.Thread.run(Unknown Source)
[22:09:37,799][WARN ][cluster.action.shard ] [Storm] sending
failed shard for [b7a76aa06cfd4048987d1117f3e0433a][0],
node[FiW6mbR5ThqqSii5Wc28lQ], [R], s[INITIALIZING], reason [Failed to start
shard, message
[RecoveryFailedException[[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery
failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/
10.4.35.200:9300]] into
[Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]]; nested:
RemoteTransportException[[Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]];
nested: RecoveryEngineException[[b7a76aa06cfd4048987d1117f3e0433a][0]
Phase[2] Execution failed]; nested:
RemoteTransportException[[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]];
nested: InvalidAliasNameException[[b7a76aa06cfd4048987d1117f3e0433a]
Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter]; ]]
[22:09:38,025][WARN ][indices.cluster ] [Storm]
[b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException:
[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey
Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into
[Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
at
org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
at
org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
at
org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException:
[Jeffrey Mace][inet[/10.4.35.200:9300
]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException:
[b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
at
org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
at
org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
at
org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
at
org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:328)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:314)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.RemoteTransportException:
[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]
Caused by: org.elasticsearch.indices.InvalidAliasNameException:
[b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter
at
org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:99)
at
org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:382)
at
org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:628)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:447)
at
org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:416)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.lang.Thread.run(Unknown Source)

[22:09:38,042][WARN ][cluster.action.shard ] [Storm] sending
failed shard for [b7a76aa06cfd4048987d1117f3e0433a][0],
node[FiW6mbR5ThqqSii5Wc28lQ], [R], s[INITIALIZING], reason [Failed to start
shard, message
[RecoveryFailedException[[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery
failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/
10.4.35.200:9300]] into
[Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]]; nested:
RemoteTransportException[[Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]];
nested: RecoveryEngineException[[b7a76aa06cfd4048987d1117f3e0433a][0]
Phase[2] Execution failed]; nested:
RemoteTransportException[[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]];
nested: InvalidAliasNameException[[b7a76aa06cfd4048987d1117f3e0433a]
Invalid alias name
[1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
alias name was passed to alias Filter]; ]]

Let us know..

Thanks,
Rohit

On Mon, Jun 16, 2014 at 6:13 AM, Alexander Reelsen alr@spinscale.de
wrote:
Hey,

without stack traces it is pretty hard to see the actual problem, do
you have them around (on one node this exception has happened, so it should
have been logged into the elasticsearch logfile as well). Also, you should
really upgrade if possible, as releases after 0.90.2 have seen many many
improvements.

--Alex

On Mon, Jun 9, 2014 at 4:15 AM, Rohit Jaiswal <
rohit.jaiswal@gmail.com> wrote:
Hello Everyone,
We lost data after restarting Elasticsearch
cluster. Restarting is a part of deploying our software stack.
                     We have a 20-node cluster running 0.90.2
and we have Splunk configured to index ES logs.
                     Looking at the Splunk logs, we could find
the following error a day before the deployment (restart) -
            [cluster.action.shard     ] [Rictor] sending failed shard for [c0a71ddaa70b463a9a179c36c7fc26e3][2], node[nJvnclczRNaLbETunjlcWw], [R], s[STARTED], reason





            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]






            [cluster.action.shard     ] [Kiss] received shard failed for [c0a71ddaa70b463a9a179c36c7fc26e3][2], node[nJvnclczRNaLbETunjlcWw], [R], s[STARTED], reason






            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]



                      Further,* a day after the deploy,* we see
the same errors on another node -
            [cluster.action.shard     ] [Contrary] received shard failed for [a58f9413315048ecb0abea48f5f6aae7][1], node[3UbHwVCkQvO3XroIl-awPw], [R], s[STARTED], reason






            [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]







         *Immediately next, the following error is seen*. This error is seen repeatedly on a couple of other nodes as well -

             failed to start shard







             [cluster.action.shard     ] [Copperhead] sending failed shard for [a58f9413315048ecb0abea48f5f6aae7][0], node[EuRzr3MLQiSS6lzTZJbiKw], [R], s[INITIALIZING],
             reason [Failed to start shard, message [RecoveryFailedException[[a58f9413315048ecb0abea48f5f6aae7][0]: Recovery failed from [Frank Castle][dlv2mPypQaOxLPQhHQ67Fw]






             [inet[/10.2.136.81:9300]] into [Copperhead][EuRzr3MLQiSS6lzTZJbiKw][inet[/10.3.207.55:9300]]]; nested: RemoteTransportException[[Frank Castle]
             [inet[/10.2.136.81:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[a58f9413315048ecb0abea48f5f6aae7][0] Phase[2] Execution failed];






             nested: RemoteTransportException[[Copperhead][inet[/10.3.207.55:9300]][index/shard/recovery/translogOps]]; nested: InvalidAliasNameException[[a58f9413315048ecb0abea48f5f6aae7]
    Invalid alias name [fbf1e55418a2327d308e7632911f9bb8bfed58059dd7f1e4abd3467c5f8519c3], Unknown alias name was passed to alias Filter]; ]]
During this time, we could not access previously indexed documents.
I looked up the alias error, looks like it is related to Delete By Query wrongly persisted to translog · Issue #1198 · elastic/elasticsearch · GitHub (Delete By Query wrongly persisted to translog # 1198),
         but this should be fixed in ES 0.18.0 and, we are using 0.90.2, so why is ES encountering this issue?

         What do we need to do to set this right and get back lost data? Please help.
Thanks.

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to elasticsearch+unsubscribe@googlegroups.com.

To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/00e54753-ab89-4f63-a39e-0931e8f7e2f0%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/00e54753-ab89-4f63-a39e-0931e8f7e2f0%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to a topic in
the Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/2wUHvnd_lU4/unsubscribe
.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAGCwEM8yrprZNCpzNqOiDzaoFwqh6Dth23OSc1byZe81P7Ba9w%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAGCwEM8yrprZNCpzNqOiDzaoFwqh6Dth23OSc1byZe81P7Ba9w%40mail.gmail.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAP_rV8FrhSb%2BuDQdb26t3WwUOykB1HEY0q0pkchtKb-6_hboMA%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAP_rV8FrhSb%2BuDQdb26t3WwUOykB1HEY0q0pkchtKb-6_hboMA%40mail.gmail.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/2wUHvnd_lU4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAGCwEM-FS8RtP3AfR-cE3Ok33eDK6PtbEKyiPhSXOVLg00xKZQ%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAGCwEM-FS8RtP3AfR-cE3Ok33eDK6PtbEKyiPhSXOVLg00xKZQ%40mail.gmail.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAP_rV8F28ZQAV8emdrv-h7W0_j3cyGNWACD6Y%2BV-XgbzcsS63A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Topic		Replies	Views
Elasticsearch is not starting after the restart Elasticsearch	5	3270	July 5, 2017
Shard failing after a cluster restart Elasticsearch	1	963	July 5, 2017
Cluster Reroute Retry Failed Null Pointer Exception Elasticsearch	1	514	October 24, 2019
Data is lost after elasticsearch restart Elasticsearch	4	580	April 25, 2023
Index corruption on cluster restart Elasticsearch	3	1319	July 6, 2017

Losing data after Elasticsearch restart

Related topics