Red Cluster Health - Unsure How to Fix

bcantrell · February 24, 2023, 6:32pm

Hi all,

I have been trying to figure out a problem where my Kibana is not able to keep connections alive with the Elasticsearch instance, and I think it is because of red cluster/index health. When Kibana is running, I get the below messages spammed constantly in the Elasticsearch cluster logs on all nodes. This started the same day that one of the Elasticsearch agents crashed and has been continuing since.

[2023-02-24T08:34:56,661][WARN ][o.e.c.r.a.AllocationService] [NodeName] [.kibana-event-log-8.1.3-000010][0] marking unavailable shards as stale: [V_z_SX6bTbyL3xN_dUJXdA]
[2023-02-24T08:35:02,032][WARN ][o.e.c.r.a.AllocationService] [NodeName] [.ds-auditbeat-8.1.3-2023.01.28-000014][0] marking unavailable shards as stale: [y5ut-1ZZTVejQgoSiL5XZQ]
[2023-02-24T08:35:02,212][WARN ][r.suppressed ] [NodeName] path: /winlogbeat-*,logs-endpoint.events.*,logs-windows.*/_eql/search, params: {allow_no_indices=true, index=winlogbeat-*,logs-endpoint.events.*,logs-windows.*}
org.elasticsearch.action.search.SearchPhaseExecutionException: start
at org.elasticsearch.action.search.CanMatchPreFilterSearchPhase.onPhaseFailure(CanMatchPreFilterSearchPhase.java:465) [elasticsearch-8.1.3.jar:8.1.3]
at org.elasticsearch.action.search.CanMatchPreFilterSearchPhase$1.onFailure(CanMatchPreFilterSearchPhase.java:454) [elasticsearch-8.1.3.jar:8.1.3]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:28) [elasticsearch-8.1.3.jar:8.1.3]
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33) [elasticsearch-8.1.3.jar:8.1.3]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:776) [elasticsearch-8.1.3.jar:8.1.3]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) [elasticsearch-8.1.3.jar:8.1.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: org.elasticsearch.action.search.SearchPhaseExecutionException: Search rejected due to missing shards [[.ds-winlogbeat-8.1.3-2023.01.28-000013][2]]. Consider using allow_partial_search_results setting to bypass this error.
at org.elasticsearch.action.search.CanMatchPreFilterSearchPhase.checkNoMissingShards(CanMatchPreFilterSearchPhase.java:216) ~[elasticsearch-8.1.3.jar:8.1.3]
at org.elasticsearch.action.search.CanMatchPreFilterSearchPhase.run(CanMatchPreFilterSearchPhase.java:140) ~[elasticsearch-8.1.3.jar:8.1.3]
at org.elasticsearch.action.search.CanMatchPreFilterSearchPhase$1.doRun(CanMatchPreFilterSearchPhase.java:459) ~[elasticsearch-8.1.3.jar:8.1.3]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.1.3.jar:8.1.3]
... 6 more

The next error is an excerpt from a much larger block; I can provide the whole thing if necessary:

[2023-02-24T08:32:30,904][ERROR][o.e.x.e.p.RestEqlSearchAction] [NodeName] failed to send failure response
java.lang.IllegalStateException: Channel is already closed
...
Suppressed: java.lang.IllegalArgumentException: reader id must be specified

I did manage to log in to Kibana once, right after starting it up and before it lost the connection again, and saw that the .ds-winlogbeat-8.1.3-2023.01.28-000013 index is at red health. I have found plenty of information on how to fix red health issues, but here is my problem: I cannot stay connected to Kibana for more than a few seconds (if that), and the only API keys I have generated are the limited-permission ones for the Beats agents. My feeling is that the red health is the root cause, but I am stumped at this point and desperate for any suggestions.

DavidTurner · February 25, 2023, 9:35am

You won't be able to work out the problem without getting responses from Elasticsearch APIs. If Kibana is having trouble talking to Elasticsearch then I suggest bypassing it and talking to Elasticsearch directly, using curl or similar. If security is enabled then you'll need to include credentials:

curl -u 'USERNAME:PASSWORD' https://host:port/...

If the cluster is in such bad health that it cannot even check your credentials (e.g. the .security index is missing) then you will need to set up a user in the file realm instead.
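For anyone who would rather script this than type curl commands, here is a minimal sketch of the same idea in Python, using only the standard library. The helper names `build_request` and `fetch_json` are made up for illustration, and the host, port, and credentials in the commented example are placeholders. `_cluster/health` and `_cat/indices` are standard Elasticsearch endpoints, but check the paths and parameters against the documentation for your version.

```python
import base64
import json
import urllib.request


def build_request(base_url, path, username, password):
    """Build a GET request carrying HTTP Basic credentials, like `curl -u`."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, headers={"Authorization": "Basic " + token})


def fetch_json(request, timeout=10):
    """Send the request and decode the JSON body (needs a reachable cluster)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example usage (placeholder host, port, and credentials; not run here):
# base = "https://host:9200"
# health = fetch_json(build_request(base, "_cluster/health", "USERNAME", "PASSWORD"))
# print(health["status"])
# red = fetch_json(build_request(base, "_cat/indices?health=red&format=json",
#                                "USERNAME", "PASSWORD"))
# print([row["index"] for row in red])
```

If the cluster uses a certificate that your machine does not already trust, `urlopen` will also need an `ssl` context that trusts the cluster's CA, in the same way curl would need to be told about it.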

bcantrell · February 27, 2023, 1:49pm

Thank you, David! I can curl with my user just fine. I swear I tried that but probably fat-fingered something and moved on in my frustration. I will give it a shot from here.

bcantrell · February 27, 2023, 10:08pm

For future Googlers, this came down to a corrupted transaction log.

Using this link and the query below, I found the following message in the "details" block of the response:

GET _cluster/allocation/explain?filter_path=index,node_allocation_decisions.node_name,node_allocation_decisions.deciders.*

translog from source [name/path] is corrupted
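The allocation explain output can be long, so here is a rough sketch of picking the relevant message out of it in Python. The function name `find_explanations` and the `SAMPLE` response are made up for illustration. The sample is not real cluster output; it only mimics the `node_allocation_decisions` / `deciders` / `explanation` layout that the query above filters for, with a hypothetical node name.

```python
import json


def find_explanations(explain_response, keyword):
    """Return (node_name, decider, explanation) for explanations mentioning keyword."""
    matches = []
    for node in explain_response.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            text = decider.get("explanation", "")
            if keyword.lower() in text.lower():
                matches.append((node.get("node_name"), decider.get("decider"), text))
    return matches


# Hypothetical response body, trimmed to the fields kept by the filter_path above.
SAMPLE = json.loads("""
{
  "index": ".ds-winlogbeat-8.1.3-2023.01.28-000013",
  "node_allocation_decisions": [
    {
      "node_name": "node-1",
      "deciders": [
        {
          "decider": "max_retry",
          "decision": "NO",
          "explanation": "details[... translog from source [name/path] is corrupted]"
        }
      ]
    }
  ]
}
""")

for node_name, decider, explanation in find_explanations(SAMPLE, "corrupted"):
    print(node_name, decider, explanation)
```

Against a live cluster, the parsed JSON returned by the allocation explain call would take the place of `SAMPLE`.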

Then, using this link, I was able to target and repair the corrupted index.

Cluster health is back to green, Kibana is working.

system · March 27, 2023, 10:09pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
