Getting Failed to execute phase [query], all shards failed

Hi All,

I am getting the following exception:
Failed to execute phase [query], all shards failed
org.elasticsearch.action.search.SearchPhaseExecutionException: Failed to execute phase [query], all shards failed
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:238)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:161)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:62)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:52)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:100)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75)
at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:63)
at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:51)
at org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(MessageChannelHandler.java:220)
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:114)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)

Can anyone help me out with handling this?

I am also getting the following exception while starting ES:

========================
translog corruption while reading from stream]; nested: ElasticsearchIllegalArgumentException[No version type match [114]]; ]]
[2015-08-12 04:29:04,872][WARN ][index.engine ] [Black Crow] [beta][3] failed to sync translog
[2015-08-12 04:29:04,928][WARN ][indices.cluster ] [Black Crow] [[beta][3]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [beta][3] failed to recover shard
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:290)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:112)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.index.translog.TranslogCorruptedException: translog corruption while reading from stream
at org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:72)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:260)
... 4 more
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: No version type match [115]
at org.elasticsearch.index.VersionType.fromValue(VersionType.java:307)
at org.elasticsearch.index.translog.Translog$Create.readFrom(Translog.java:376)
at org.elasticsearch.index.translog.ChecksummedTranslogStream.read(ChecksummedTranslogStream.java:68)
... 5 more

May I know the reason?

Something has gone wrong and your translog is corrupted.
You'll have to find it for that shard and probably delete it.
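
A quick way to see which shards are affected, if you haven't already (this is just a read-only check):

GET _cat/shards?v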

Hi Mark,
Thanks for your reply.

So is this not because of too much data?

Here is the information regarding shards:

index shard prirep state docs store ip node
logstash-2014.12.09 4 p STARTED 11230 2.6mb 172.16.112.25 Shooting Star
logstash-2014.12.09 4 r UNASSIGNED
logstash-2014.12.09 0 p STARTED 11410 2.6mb 172.16.112.25 Shooting Star
logstash-2014.12.09 0 r UNASSIGNED
logstash-2014.12.09 3 p STARTED 11263 2.6mb 172.16.112.25 Shooting Star
logstash-2014.12.09 3 r UNASSIGNED
logstash-2014.12.09 1 p STARTED 11113 2.5mb 172.16.112.25 Shooting Star
logstash-2014.12.09 1 r UNASSIGNED
logstash-2014.12.09 2 p STARTED 11459 2.7mb 172.16.112.25 Shooting Star
logstash-2014.12.09 2 r UNASSIGNED
logstash-2014.12.08 2 p STARTED 8959 2.1mb 172.16.112.25 Shooting Star
logstash-2014.12.08 2 r UNASSIGNED
logstash-2014.12.08 0 p STARTED 8730 2.1mb 172.16.112.25 Shooting Star
logstash-2014.12.08 0 r UNASSIGNED
logstash-2014.12.08 3 p STARTED 8861 2.1mb 172.16.112.25 Shooting Star
logstash-2014.12.08 3 r UNASSIGNED
logstash-2014.12.08 1 p STARTED 8843 2.1mb 172.16.112.25 Shooting Star
logstash-2014.12.08 1 r UNASSIGNED
logstash-2014.12.08 4 p STARTED 8915 2.1mb 172.16.112.25 Shooting Star
logstash-2014.12.08 4 r UNASSIGNED
beta 4 p STARTED 31567980 3gb 172.16.112.25 Shooting Star
beta 4 r UNASSIGNED
beta 0 p STARTED 31493375 3gb 172.16.112.25 Shooting Star
beta 0 r UNASSIGNED
beta 3 p STARTED 31568715 3gb 172.16.112.25 Shooting Star
beta 3 r UNASSIGNED
beta 1 p STARTED 31567359 3gb 172.16.112.25 Shooting Star
beta 1 r UNASSIGNED
beta 2 p STARTED 31567649 3gb 172.16.112.25 Shooting Star
beta 2 r UNASSIGNED
prod 2 p STARTED 25318076 4.3gb 172.16.112.25 Shooting Star
prod 2 r UNASSIGNED
prod 0 p UNASSIGNED
prod 0 r UNASSIGNED
prod 3 p STARTED 25302767 4.3gb 172.16.112.25 Shooting Star
prod 3 r UNASSIGNED
prod 1 p STARTED 25316553 4.3gb 172.16.112.25 Shooting Star
prod 1 r UNASSIGNED
prod 4 p STARTED 25318895 4.3gb 172.16.112.25 Shooting Star
prod 4 r UNASSIGNED

  1. Among them, the prod [0] shard is showing as unassigned. Isn't that because of high memory usage?
  2. If not, can you guide me on how to delete a shard's data, as you suggested?
  3. And does Elasticsearch manage deleting data when it takes up a huge amount of space?

It might be easier if you just reindex the data in beta; are you able to do that?
The logs don't seem to show why the prod shard is not being assigned though.

It also doesn't look like you have too much data, but then I don't know what heap you have.

You need to manage data in ES, it's up to you to figure out when things need to be deleted. Elasticsearch Curator does automate it to a degree, but you still need to make decisions.
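
For the time-based logstash-* indices, deleting old data just means deleting whole daily indices, which is what Curator automates. Done by hand it's a single call, for example against one of your older indices:

DELETE /logstash-2014.12.08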

As the corruption happened in the prod index, why do I have to re-index the beta data? And I don't have backup data for re-indexing.
Is there any way to delete only a particular shard's data?
And my machine has a heap of 40GB.

Is there any way to fix this type of issue at the startup stage?

If you have ES_HEAP_SIZE (which sets the Java min and max heap) set to 40GB, that's too much. You should keep it to 30.5GB or less.
There's no way to fix this at startup as something bad has happened and made the data corrupt. And if you delete the shard you lose the data in it.
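
If you want to double-check what heap the JVM actually started with, the node info API reports it (look at jvm.mem.heap_max_in_bytes in the response):

GET _nodes/jvm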

NOTE - the following is done entirely at your own risk. If you don't have the data elsewhere, or a backup, there is a chance you will lose data. MAKE SURE YOU HAVE A BACKUP.

Figure out where your ES data directory is (see the note after these steps).
Stop Elasticsearch and find the directory holding shard 3 of the beta index and then move the translog file to somewhere else.
Then restart ES. That will at least clear your translog corruption issue.
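
For the first step, the node stats API shows the data path(s) the node is using, so you can run this before stopping it; the shard itself then typically sits under nodes/0/indices/beta/3 inside that path on a 1.x layout (verify on disk, your layout may differ):

GET _nodes/stats/fs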

As I said though, you will likely lose whatever data is in the translog.

May I know why I have to do that, as I have data corruption in the prod [0] shard?

Also, where can I find 'ES_HEAP_SIZE' to set it? And if I set it to that size, what will happen once my data exceeds that 30.5GB limit?

Well the log you posted said shard 3 of the beta index, hence my comment.

Elasticsearch doesn't store everything in memory, it just puts certain things in there as needed (that's a gross simplification, but it seems like you are still learning).
Setting the heap size will depend on how you installed Elasticsearch to begin with.
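
As a common example (and only an example, check your own install): on a deb/rpm package install the setting usually lives in /etc/default/elasticsearch or /etc/sysconfig/elasticsearch, and with the tar.gz you export it in the shell that starts bin/elasticsearch:

ES_HEAP_SIZE=30g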

Yes, thanks Mark. The logs I got the next time mentioned prod [0], so that is why I said that.
Mark, I just copied a few shards into a new index.
But when I look at the mappings of that new index, it is not showing any mappings.

This is the mapping I created for the new index:

PUT facebook3
{
  "mappings": {
    "face": {
      "_ttl": { "enabled": true, "default": "3m" }
    }
  }
}

In order to remove the data every 3 minutes.
And I copied the shards of an index called facebook into this new index by stopping Elasticsearch; after I copied those shards I started ES again.
When I run the search query I am getting records back:
GET facebook3/_search

{
  "_index": "facebook3",
  "_type": "face",
  "_id": "1",
  "_score": 1,
  "_source": {
    "date": "2015-03-31 13:12:13",
    "name": "santosh"
  }
}

But when I look at the mappings, there are no mappings created for the type face. May I know the reason?
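
(For reference, I am checking the mappings like this, and nothing comes back for the face type:)

GET facebook3/_mapping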

Hi Mark, thanks for your support. I finally took a backup and set up the data deletion at a specific time using _ttl.
Resolved the issue... :smiley:

I'm also getting this log for my Elasticsearch.

[LiveUsedcarES-03] All shards failed for phase: [query]
org.elasticsearch.transport.RemoteTransportException:
[LiveUsedcarES-03][inet[/174.18.11.233:9201]][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.SearchParseException:
[usedcar_index_1312][5]:
query[filtered(+is_premium:0)->BooleanFilter(+cache(price:[10000 TO
*]) +cache(is_gaadi:^@^@^@^A))],from[-1],size[-1]: Parse Failure
[Failed to parse source
[{"query":{"filtered":{"filter":{"bool":{"must":[{"range":{"price":{"gte":10000}}},{"term":{"is_gaadi":"1"}}]}},"query":{"bool":{"must":[{"term":{"is_premium":0}}]}}}},"aggs":{"max_price":{"max":{"field":"price"}},"min_price":{"min":{"field":"price"}},"max_kms_run":{"max":{"field":"kms_run"}},"min_kms_run":{"min":{"field":"kms_run"}},"max_registration_year":{"max":{"field":"registration_year"}},"min_registration_year":{"min":{"field":"registration_year"}}},"post_filter":{"bool":{"must":[{"terms":{"listing_dealer_info":"dealer"}},{"term":{"is_gaadi":"1"}}],"must_not":[{"ids":{"values":["1164950","1170049","1258563","1281903","1181971","1276341","1206654","1218051","1205663","1223747","1253233","1207243","1222722","1192937","1199283","1191936","1165433","1256900","1203704","1184709","1258558","1275804","1275817","1204879","1180492","1223042","1190552","1219339","1203808","1198643","1258602","1202618","1277314","1271670","1217135","1218234","1205634","1269051","1213430","1279729","1224704","1219012","1269616","1214183","1180846","1275610","1218055","1214480","1276472","1178313","1276451","1276484","1263642","1180872","1225433","1200295","1194396","1192859","1222724","1189660","1217582","1219916","1225050","1280365","1219404","1276196","1277769","1276462","1204706","1173916","1176627","1189071","1210921","1220188","1278145","1176883","1204881","1200495","1276326","1276439","1170984","1186233","1273457","1222999","1169833","1252857","1164109","1269541","1277609","1276424","1218608","1201991","1225191","1227126","1276354","1275559","1209934","1278163","1216924","1223968","1214033","1263647","1191590","1279307","1276288","1222131","1216899","1276315","1195211","1276404","1252944","1276189","1276429","1276481","1200773","1208118","1258716","1276422","1276280","1273343","1216098","1176302","1219127","1147758","1209064","1165347","1193310","1203248","1213901","1276302","1276441","1222882","1261852","1280929","1218288","1272117","1276327","1164785","1223513","1262280","1223800","1277209","1182312","1185943","1204050","1276485","1211192","1258437","1216261","1269825","1223735","1219442","1194434","1186332","1214215","1276471","1211119","1278938","1272139","1157684","1199855","1194580","1253830","1276227","1280720","1276447","1281696","1200926","1281006","1223729","1189593","1198742","1200176","1269816","1217401","1258261","1209061","1263651","1200040","1159183","1217894"]}}]}},"size":0,"sort":{"goodness_score":{"order":"desc"}},"from":0}]]

Can you please tell me where I'm going wrong?

Please start your own thread :slight_smile: