[Elasticsearch cluster status red] : path: /.apm-agent-configuration/_search, params: {index=.apm-agent-configuration} org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed

Our two-node Elasticsearch cluster status shows red. The cluster is configured with HTTP and TLS encryption. Please find the error from the Elasticsearch logs below. Could you please let us know how the cluster status can be changed back to green?

# curl -k https://127.0.0.1:9200/_cluster/health?pretty -u username:XXXXXXXXX
{
  "cluster_name" : "em-elk-cluster",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 116,
  "active_shards" : 232,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 6,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 97.47899159663865
}
Indices that are not green:

health status index                    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
red    open   .apm-custom-link         RGR9AneMRx6nnQGHF0HqKg   1   1
red    open   .async-search            sA-j0eQBQf2po6HhP6AtMg   1   1
red    open   .apm-agent-configuration 3R2woj_jRVCf58Q8b9--Yg   1   1
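To find out why the shards of a red index are unassigned, the cluster allocation explain API can be queried. A minimal sketch, reusing the host and credentials from the health check above (adjust to your environment):

```shell
# Ask Elasticsearch why a primary shard of one of the red indices is unassigned.
# Host, credentials, and index name are taken from the examples above; adjust as needed.
curl -k -u username:XXXXXXXXX \
  -H 'Content-Type: application/json' \
  -X GET 'https://127.0.0.1:9200/_cluster/allocation/explain?pretty' \
  -d '{
    "index": ".apm-agent-configuration",
    "shard": 0,
    "primary": true
  }'
```

The response includes an `unassigned_info` reason (e.g. `NODE_LEFT` or `ALLOCATION_FAILED`) and per-node explanations of why allocation is not happening.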

Error from the Elasticsearch logs:

[2021-04-15T00:17:01,301][DEBUG][o.e.a.s.TransportSearchAction] [elekpelk02] All shards failed for phase: [query]
[2021-04-15T00:17:01,303][WARN ][r.suppressed             ] [elekpelk02] path: /.apm-agent-configuration/_search, params: {index=.apm-agent-configuration}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:551) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:309) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:582) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:393) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.lambda$performPhaseOnShard$0(AbstractSearchAsyncAction.java:223) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction$2.doRun(AbstractSearchAsyncAction.java:288) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:695) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.8.0.jar:7.8.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
    at java.lang.Thread.run(Thread.java:832) [?:?]
[2021-04-15T02:00:01,514][INFO ][o.e.c.m.MetadataCreateIndexService] [elekpelk02] [.monitoring-kibana-7-2021.04.15] creating index, cause [auto(bulk api)], templates [.monitoring-kibana], shards [1]/[0], mappings [_doc]
[2021-04-15T02:00:01,515][INFO ][o.e.c.r.a.AllocationService] [elekpelk02] updating number_of_replicas to [1] for indices [.monitoring-kibana-7-2021.04.15]
[2021-04-15T02:00:01,568][INFO ][o.e.c.m.MetadataDeleteIndexService] [elekpelk02] [logstash-2021.01.14/cedz7nZJRvKX-RxIaHO8sA] deleting index
[2021-04-15T02:00:02,288][INFO ][o.e.c.m.MetadataCreateIndexService] [elekpelk02] [.monitoring-es-7-2021.04.15] creating index, cause [auto(bulk api)], templates [.monitoring-es], shards [1]/[0], mappings [_doc]
[2021-04-15T02:00:02,289][INFO ][o.e.c.r.a.AllocationService] [elekpelk02] updating number_of_replicas to [1] for indices [.monitoring-es-7-2021.04.15]
[2021-04-15T02:00:02,650][INFO ][o.e.c.m.MetadataCreateIndexService] [elekpelk02] [logstash-2021.04.15] creating index, cause [auto(bulk api)], templates [], shards [1]/[1], mappings []
[2021-04-15T02:00:02,796][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] create_mapping [_doc]
[2021-04-15T02:00:02,907][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] update_mapping [_doc]
[2021-04-15T02:00:03,917][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] update_mapping [_doc]
[2021-04-15T02:00:06,422][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] update_mapping [_doc]
[2021-04-15T02:00:06,458][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] update_mapping [_doc]
[2021-04-15T02:00:06,461][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] update_mapping [_doc]
[2021-04-15T02:00:07,429][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] update_mapping [_doc]
[2021-04-15T02:00:09,436][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] update_mapping [_doc]
[2021-04-15T02:00:10,189][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] update_mapping [_doc]
[2021-04-15T02:00:12,951][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] update_mapping [_doc]
[2021-04-15T02:01:14,226][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] update_mapping [_doc]
[2021-04-15T02:02:00,000][INFO ][o.e.x.m.MlDailyMaintenanceService] [elekpelk02] triggering scheduled [ML] maintenance tasks
[2021-04-15T02:02:00,001][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [elekpelk02] Deleting expired data
[2021-04-15T02:02:00,003][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [elekpelk02] Completed deletion of expired ML data
[2021-04-15T02:02:00,003][INFO ][o.e.x.m.MlDailyMaintenanceService] [elekpelk02] Successfully completed [ML] maintenance tasks
[2021-04-15T02:02:28,385][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] update_mapping [_doc]
[2021-04-15T03:00:00,000][INFO ][o.e.x.m.e.l.LocalExporter] [elekpelk02] cleaning up [2] old indices
[2021-04-15T03:00:00,001][INFO ][o.e.c.m.MetadataDeleteIndexService] [elekpelk02] [.monitoring-es-7-2021.04.08/TfebMzzeRleRDRxJudVObA] deleting index
[2021-04-15T03:00:00,001][INFO ][o.e.c.m.MetadataDeleteIndexService] [elekpelk02] [.monitoring-kibana-7-2021.04.08/ZeEDjkCjR5iz0SweGcHNeg] deleting index
[2021-04-15T03:30:00,001][INFO ][o.e.x.s.SnapshotRetentionTask] [elekpelk02] starting SLM retention snapshot cleanup task
[2021-04-15T03:30:00,002][INFO ][o.e.x.s.SnapshotRetentionTask] [elekpelk02] there are no repositories to fetch, SLM retention snapshot cleanup task complete
[2021-04-15T12:17:01,385][DEBUG][o.e.a.s.TransportSearchAction] [elekpelk02] All shards failed for phase: [query]
[2021-04-15T12:17:01,386][WARN ][r.suppressed             ] [elekpelk02] path: /.apm-agent-configuration/_search, params: {index=.apm-agent-configuration}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:551) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:309) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:582) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:393) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.lambda$performPhaseOnShard$0(AbstractSearchAsyncAction.java:223) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction$2.doRun(AbstractSearchAsyncAction.java:288) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:695) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.8.0.jar:7.8.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
    at java.lang.Thread.run(Thread.java:832) [?:?]

Are you using APM?

If not, I'd just drop the APM indices.
You can drop the async search index as well.
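Dropping those indices could look like the sketch below, reusing the host and credentials from the health check above. Note that deleting an index is irreversible, so only do this if you don't need the data; Kibana/APM will recreate these system indices when needed.

```shell
# Delete the red system indices (irreversible).
curl -k -u username:XXXXXXXXX -X DELETE 'https://127.0.0.1:9200/.apm-custom-link'
curl -k -u username:XXXXXXXXX -X DELETE 'https://127.0.0.1:9200/.apm-agent-configuration'
curl -k -u username:XXXXXXXXX -X DELETE 'https://127.0.0.1:9200/.async-search'

# Verify the cluster status afterwards.
curl -k -u username:XXXXXXXXX 'https://127.0.0.1:9200/_cluster/health?pretty'
```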

I can't tell why you ended up in such a situation, but:

  • you have only 2 nodes where we recommend having 3
  • you are not using a recent version of the stack. You should upgrade.

@dadoonet : Thanks for your reply.
We are using ELK 7.8 with only two nodes currently. We plan to add a third node shortly, as per the recommended failover architecture.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.