Red Cluster Health - Unsure How to Fix

Hi all,

I have been trying to troubleshoot a problem where Kibana cannot keep its connection to my Elasticsearch instance alive, and I think red cluster/index health is the cause. While Kibana is running, the messages below are logged constantly in the Elasticsearch cluster logs on all nodes. This started the same day one of the Elasticsearch agents crashed and has continued ever since.

[2023-02-24T08:34:56,661][WARN ][o.e.c.r.a.AllocationService] [NodeName] [.kibana-event-log-8.1.3-000010][0] marking unavailable shards as stale: [V_z_SX6bTbyL3xN_dUJXdA]
[2023-02-24T08:35:02,032][WARN ][o.e.c.r.a.AllocationService] [NodeName] [.ds-auditbeat-8.1.3-2023.01.28-000014][0] marking unavailable shards as stale: [y5ut-1ZZTVejQgoSiL5XZQ]
[2023-02-24T08:35:02,212][WARN ][r.suppressed ] [NodeName] path: /winlogbeat-,,logs-windows./_eql/search, params: {allow_no_indices=true, index=winlogbeat-,,logs-windows.} start
at [elasticsearch-8.1.3.jar:8.1.3]
at$1.onFailure( [elasticsearch-8.1.3.jar:8.1.3]
at [elasticsearch-8.1.3.jar:8.1.3]
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun( [elasticsearch-8.1.3.jar:8.1.3]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun( [elasticsearch-8.1.3.jar:8.1.3]
at [elasticsearch-8.1.3.jar:8.1.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker( [?:?]
at java.util.concurrent.ThreadPoolExecutor$ [?:?]
at [?:?]
Caused by: Search rejected due to missing shards [[.ds-winlogbeat-8.1.3-2023.01.28-000013][2]]. Consider using allow_partial_search_results setting to bypass this error.
at ~[elasticsearch-8.1.3.jar:8.1.3]
at ~[elasticsearch-8.1.3.jar:8.1.3]
at$1.doRun( ~[elasticsearch-8.1.3.jar:8.1.3]
at ~[elasticsearch-8.1.3.jar:8.1.3]
... 6 more

This one is a massive block; I can provide the whole error if necessary:

[2023-02-24T08:32:30,904][ERROR][o.e.x.e.p.RestEqlSearchAction] [NodeName] failed to send failure response
java.lang.IllegalStateException: Channel is already closed
Suppressed: java.lang.IllegalArgumentException: reader id must be specified

By chance I was able to log in to Kibana once after starting it up, before it lost the connection again, and saw that the .ds-winlogbeat-8.1.3-2023.01.28-000013 index has red health. I have found plenty of information on how to fix red health issues, but here is my problem: I cannot stay connected to Kibana for more than a few seconds (if that), and the only API keys I have are the limited-permission ones generated for the Beats agents. My feeling is that the red health is the root cause, but I am stumped at this point and desperate for any suggestions.

You won't be able to work out the problem without getting responses from Elasticsearch APIs. If Kibana is having trouble talking to Elasticsearch then I suggest bypassing it and talking to Elasticsearch directly, using curl or similar. If security is enabled then you'll need to include credentials:

curl -u 'USERNAME:PASSWORD' https://host:port/...
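The same direct request can also be scripted. Below is a minimal Python sketch (standard library only) that does what curl -u does: it sends GET requests with HTTP basic auth straight to Elasticsearch, bypassing Kibana. The base URL, credentials and CA-file path are placeholders; _cluster/health and _cat/indices?health=red&format=json are standard Elasticsearch endpoints for checking overall health and listing only the red indices.

```python
import base64
import json
import ssl
import urllib.request
from typing import Any, Optional


def build_request(base_url: str, path: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request carrying HTTP basic auth, like `curl -u USER:PASS`."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


def es_get(base_url: str, path: str, user: str, password: str,
           context: Optional[ssl.SSLContext] = None) -> Any:
    """Send the request to Elasticsearch and decode the JSON response body.

    If the cluster uses a self-signed certificate, pass a context that trusts
    its CA, e.g. ssl.create_default_context(cafile="/path/to/http_ca.crt")
    (placeholder path).
    """
    request = build_request(base_url, path, user, password)
    with urllib.request.urlopen(request, context=context, timeout=30) as response:
        return json.load(response)


# Example usage (placeholders; needs a reachable cluster):
# health = es_get("https://localhost:9200", "_cluster/health", "USERNAME", "PASSWORD")
# print(health["status"])
# for row in es_get("https://localhost:9200", "_cat/indices?health=red&format=json",
#                   "USERNAME", "PASSWORD"):
#     print(row["index"], row["health"])
```

Sticking to the standard library keeps the sketch dependency-free; for anything beyond a quick check, the official Python client (elasticsearch-py) is the more usual choice.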

If the cluster is in such bad health that it cannot even check your credentials (e.g. the .security index is missing) then you will need to set up a user in the file realm instead.
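File-realm users are created with the elasticsearch-users tool in Elasticsearch's bin directory (elasticsearch-users useradd NAME -p PASSWORD -r ROLES). The Python sketch below only wraps that command; the install path, the user name and the choice of the built-in superuser role are assumptions for illustration. File-realm users are stored locally on each node, so the command has to be run on each node that should accept them, and the file realm must be enabled (as far as I know it is part of the default realm chain when no realms are configured explicitly).

```python
import subprocess
from pathlib import Path
from typing import List


def useradd_command(es_home: str, username: str, password: str,
                    roles: str = "superuser") -> List[str]:
    """Return the argv for `elasticsearch-users useradd` (creates a file-realm user)."""
    tool = Path(es_home) / "bin" / "elasticsearch-users"
    return [str(tool), "useradd", username, "-p", password, "-r", roles]


def add_file_realm_user(es_home: str, username: str, password: str,
                        roles: str = "superuser") -> None:
    """Run the tool on this node; raises CalledProcessError if it exits non-zero."""
    subprocess.run(useradd_command(es_home, username, password, roles), check=True)


# Example usage (hypothetical install path and user; run on each node):
# add_file_realm_user("/usr/share/elasticsearch", "rescue_admin", "CHANGE_ME")
```

Passing -p puts the password on the command line (and possibly in shell history); running the tool interactively without -p should prompt for it instead.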

Thank you, David! I can curl with my user just fine. I swear I tried that, but I probably fat-fingered something and moved on in my frustration. I will give it a shot from here.

For future Googlers, this came down to a corrupted transaction log.

Using this link and the query below, I found the following message in the "details" block of the response:

GET _cluster/allocation/explain?filter_path=index,node_allocation_decisions.node_name,node_allocation_decisions.deciders.*

translog from source [name/path] is corrupted
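When the allocation-explain output is long, a small helper can pull out the human-readable messages and flag anything that mentions corruption. This Python sketch walks the decoded JSON and collects string fields named details, explanation or allocate_explanation (field names that appear in allocation-explain responses). It does not rely on the exact response shape, so it also works if the filter_path is dropped; the function names are hypothetical.

```python
from typing import Any, Iterator, List, Tuple

# Keys in the allocation-explain response that hold free-text messages.
MESSAGE_KEYS = ("details", "explanation", "allocate_explanation")


def iter_messages(node: Any, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (json_path, text) for every string field whose key is in MESSAGE_KEYS."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key in MESSAGE_KEYS and isinstance(value, str):
                yield child, value
            else:
                yield from iter_messages(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from iter_messages(item, f"{path}[{i}]")


def corruption_hints(explain_response: dict) -> List[Tuple[str, str]]:
    """Keep only messages that mention corruption, such as a corrupted translog."""
    return [(p, text) for p, text in iter_messages(explain_response)
            if "corrupt" in text.lower()]


# Example usage with a decoded response (e.g. from json.load on the API output):
# for where, text in corruption_hints(response):
#     print(where, "->", text)
```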

Then, using this link, I was able to target and repair the corrupted index.
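The linked guide is not reproduced here, and this may not be exactly what was done in this thread. One documented recovery path for a shard with a corrupted translog is: stop the node holding the shard copy, run elasticsearch-shard remove-corrupted-data against that shard, start the node again, and then force-allocate the stale primary with the cluster reroute API, explicitly accepting data loss. The Python sketch below only builds the tool's arguments and the reroute request body. The install path and node name are placeholders, and both steps can discard documents, so back up the shard data first if you can.

```python
from pathlib import Path
from typing import Any, Dict, List


def remove_corrupted_data_command(es_home: str, index: str, shard_id: int) -> List[str]:
    """argv for elasticsearch-shard; run it only while the node is STOPPED."""
    tool = Path(es_home) / "bin" / "elasticsearch-shard"
    return [str(tool), "remove-corrupted-data", "--index", index, "--shard-id", str(shard_id)]


def allocate_stale_primary_body(index: str, shard: int, node: str) -> Dict[str, Any]:
    """JSON body for POST _cluster/reroute, sent after the node is back up.

    accept_data_loss has to be true for Elasticsearch to accept this command,
    because the recovered copy may be missing documents.
    """
    return {
        "commands": [
            {
                "allocate_stale_primary": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    "accept_data_loss": True,
                }
            }
        ]
    }


# Example usage for the shard named in the log above (NODE_NAME is a placeholder):
# remove_corrupted_data_command("/usr/share/elasticsearch",
#                               ".ds-winlogbeat-8.1.3-2023.01.28-000013", 2)
# allocate_stale_primary_body(".ds-winlogbeat-8.1.3-2023.01.28-000013", 2, "NODE_NAME")
```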

Cluster health is back to green, Kibana is working.
