[Elasticsearch cluster status red] : path: /.apm-agent-configuration/_search, params: {index=.apm-agent-configuration} org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed

Our two-node Elasticsearch cluster status shows red. The cluster is configured with HTTP and TLS encryption. Please find the error from the Elasticsearch logs below. Could you please let us know how the cluster status can be changed back to green?

# curl -k https://127.0.0.1:9200/_cluster/health?pretty -u username:XXXXXXXXX
{
  "cluster_name" : "em-elk-cluster",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 116,
  "active_shards" : 232,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 6,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 97.47899159663865
}
Indices that are not green:

health status index                    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
red    open   .apm-custom-link         RGR9AneMRx6nnQGHF0HqKg   1   1
red    open   .async-search            sA-j0eQBQf2po6HhP6AtMg   1   1
red    open   .apm-agent-configuration 3R2woj_jRVCf58Q8b9--Yg   1   1
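To find out why the shards of a red index are unassigned, the cluster allocation explain API can be queried. A minimal sketch, reusing the host and credentials from the health check above (adjust to your environment):

```shell
# Ask Elasticsearch why a primary shard of one of the red indices is unassigned.
# Host, credentials, and index name are taken from the examples above; adjust as needed.
curl -k -u username:XXXXXXXXX \
  -H 'Content-Type: application/json' \
  -X GET 'https://127.0.0.1:9200/_cluster/allocation/explain?pretty' \
  -d '{
    "index": ".apm-agent-configuration",
    "shard": 0,
    "primary": true
  }'
```

The response includes an `unassigned_info` reason (e.g. `NODE_LEFT` or `ALLOCATION_FAILED`) and per-node explanations of why allocation is not happening.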

Error from the Elasticsearch logs:

[2021-04-15T00:17:01,301][DEBUG][o.e.a.s.TransportSearchAction] [elekpelk02] All shards failed for phase: [query]
[2021-04-15T00:17:01,303][WARN ][r.suppressed             ] [elekpelk02] path: /.apm-agent-configuration/_search, params: {index=.apm-agent-configuration}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:551) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:309) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:582) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:393) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.lambda$performPhaseOnShard$0(AbstractSearchAsyncAction.java:223) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction$2.doRun(AbstractSearchAsyncAction.java:288) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:695) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.8.0.jar:7.8.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
    at java.lang.Thread.run(Thread.java:832) [?:?]
[2021-04-15T02:00:01,514][INFO ][o.e.c.m.MetadataCreateIndexService] [elekpelk02] [.monitoring-kibana-7-2021.04.15] creating index, cause [auto(bulk api)], templates [.monitoring-kibana], shards [1]/[0], mappings [_doc]
[2021-04-15T02:00:01,515][INFO ][o.e.c.r.a.AllocationService] [elekpelk02] updating number_of_replicas to [1] for indices [.monitoring-kibana-7-2021.04.15]
[2021-04-15T02:00:01,568][INFO ][o.e.c.m.MetadataDeleteIndexService] [elekpelk02] [logstash-2021.01.14/cedz7nZJRvKX-RxIaHO8sA] deleting index
[2021-04-15T02:00:02,288][INFO ][o.e.c.m.MetadataCreateIndexService] [elekpelk02] [.monitoring-es-7-2021.04.15] creating index, cause [auto(bulk api)], templates [.monitoring-es], shards [1]/[0], mappings [_doc]
[2021-04-15T02:00:02,289][INFO ][o.e.c.r.a.AllocationService] [elekpelk02] updating number_of_replicas to [1] for indices [.monitoring-es-7-2021.04.15]
[2021-04-15T02:00:02,650][INFO ][o.e.c.m.MetadataCreateIndexService] [elekpelk02] [logstash-2021.04.15] creating index, cause [auto(bulk api)], templates [], shards [1]/[1], mappings []
[2021-04-15T02:00:02,796][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] create_mapping [_doc]
[2021-04-15T02:00:02,907][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] update_mapping [_doc]
[2021-04-15T02:00:03,917][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] update_mapping [_doc]
[2021-04-15T02:00:06,422][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] update_mapping [_doc]
[2021-04-15T02:00:06,458][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] update_mapping [_doc]
[2021-04-15T02:00:06,461][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] update_mapping [_doc]
[2021-04-15T02:00:07,429][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] update_mapping [_doc]
[2021-04-15T02:00:09,436][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] update_mapping [_doc]
[2021-04-15T02:00:10,189][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] update_mapping [_doc]
[2021-04-15T02:00:12,951][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] update_mapping [_doc]
[2021-04-15T02:01:14,226][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] update_mapping [_doc]
[2021-04-15T02:02:00,000][INFO ][o.e.x.m.MlDailyMaintenanceService] [elekpelk02] triggering scheduled [ML] maintenance tasks
[2021-04-15T02:02:00,001][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [elekpelk02] Deleting expired data
[2021-04-15T02:02:00,003][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [elekpelk02] Completed deletion of expired ML data
[2021-04-15T02:02:00,003][INFO ][o.e.x.m.MlDailyMaintenanceService] [elekpelk02] Successfully completed [ML] maintenance tasks
[2021-04-15T02:02:28,385][INFO ][o.e.c.m.MetadataMappingService] [elekpelk02] [logstash-2021.04.15/Yv_uuMA7QQuVzDAqGJLeSw] update_mapping [_doc]
[2021-04-15T03:00:00,000][INFO ][o.e.x.m.e.l.LocalExporter] [elekpelk02] cleaning up [2] old indices
[2021-04-15T03:00:00,001][INFO ][o.e.c.m.MetadataDeleteIndexService] [elekpelk02] [.monitoring-es-7-2021.04.08/TfebMzzeRleRDRxJudVObA] deleting index
[2021-04-15T03:00:00,001][INFO ][o.e.c.m.MetadataDeleteIndexService] [elekpelk02] [.monitoring-kibana-7-2021.04.08/ZeEDjkCjR5iz0SweGcHNeg] deleting index
[2021-04-15T03:30:00,001][INFO ][o.e.x.s.SnapshotRetentionTask] [elekpelk02] starting SLM retention snapshot cleanup task
[2021-04-15T03:30:00,002][INFO ][o.e.x.s.SnapshotRetentionTask] [elekpelk02] there are no repositories to fetch, SLM retention snapshot cleanup task complete
[2021-04-15T12:17:01,385][DEBUG][o.e.a.s.TransportSearchAction] [elekpelk02] All shards failed for phase: [query]
[2021-04-15T12:17:01,386][WARN ][r.suppressed             ] [elekpelk02] path: /.apm-agent-configuration/_search, params: {index=.apm-agent-configuration}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:551) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:309) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:582) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:393) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.lambda$performPhaseOnShard$0(AbstractSearchAsyncAction.java:223) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction$2.doRun(AbstractSearchAsyncAction.java:288) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:695) [elasticsearch-7.8.0.jar:7.8.0]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.8.0.jar:7.8.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
    at java.lang.Thread.run(Thread.java:832) [?:?]

Are you using APM?

If not, I'd just drop the APM indices.
You can drop the async search index as well.
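Dropping those indices could look like the sketch below, reusing the host and credentials from the health check above. Note that deleting an index is irreversible, so only do this if you don't need the data; Kibana/APM will recreate these system indices when needed.

```shell
# Delete the red system indices (irreversible).
curl -k -u username:XXXXXXXXX -X DELETE 'https://127.0.0.1:9200/.apm-custom-link'
curl -k -u username:XXXXXXXXX -X DELETE 'https://127.0.0.1:9200/.apm-agent-configuration'
curl -k -u username:XXXXXXXXX -X DELETE 'https://127.0.0.1:9200/.async-search'

# Verify the cluster status afterwards.
curl -k -u username:XXXXXXXXX 'https://127.0.0.1:9200/_cluster/health?pretty'
```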

I can't tell why you ended up in such a situation, but:

  • you have only 2 nodes where we recommend having 3
  • you are not using a recent version of the stack. You should upgrade.

@dadoonet : Thanks for your reply.
We are using ELK 7.8 with only two nodes currently. We plan to add a third node shortly, as per the recommended failover architecture.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.