Hello all,
All servers are running ELK version 8.1.3 (I know, an upgrade is in the works) on Windows Server 2016.
The errors below started flooding the Kibana log file after hours on 2023-02-17, and I cannot figure out how to resolve them.
{"ecs":{"version":"8.0.0"},"@timestamp":"2023-02-17T17:25:51.925-05:00","message":"Task alerting:siem.eqlRule "91d93d22-ca15-11ec-81f3-83adeec329ee" failed: TimeoutError: Request timed out","log":{"level":"ERROR","logger":"plugins.taskManager"},"process":{"pid":2948},"trace":{"id":"ee52f812cd8766040b6df56d01fa8ca2"},"transaction":{"id":"c9f28ff986503730"}}
{"ecs":{"version":"8.0.0"},"@timestamp":"2023-02-17T17:25:55.929-05:00","message":"Unable to retrieve version information from Elasticsearch nodes. There are no living connections","log":{"level":"ERROR","logger":"elasticsearch-service"},"process":{"pid":2948},"trace":{"id":"b143915c0467af47bec8b191785ed79f"},"transaction":{"id":"e8a0da6b70464a4e"}}
{"ecs":{"version":"8.0.0"},"@timestamp":"2023-02-17T17:25:55.947-05:00","message":"error writing bulk events: "There are no living connections"
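To see which loggers account for most of the noise, the log lines can be tallied by level and logger. This is a minimal Python sketch, assuming one JSON object per line as in the excerpts above. The function names and the kibana.log path are hypothetical, not anything Kibana ships. Lines that do not parse as JSON (the pasted excerpts lost their quote escaping, and the last one is cut off) fall back to a regex.

```python
import json
import re
from collections import Counter

# Fallback patterns for lines that are not valid JSON, for example lines
# whose inner quotes lost their escaping or that were truncated.
LEVEL_RE = re.compile(r'"level":"([^"]+)"')
LOGGER_RE = re.compile(r'"logger":"([^"]+)"')


def classify(line):
    """Return (level, logger) for one Kibana log line, or None if neither is found."""
    try:
        log = json.loads(line).get("log", {})
        return log.get("level"), log.get("logger")
    except (json.JSONDecodeError, AttributeError):
        level = LEVEL_RE.search(line)
        logger = LOGGER_RE.search(line)
        if level and logger:
            return level.group(1), logger.group(1)
        return None


def summarize(lines):
    """Count lines per (level, logger); unrecognised lines are counted under None."""
    return Counter(classify(line) for line in lines)


def summarize_file(path):
    """Tally a whole log file. The path is whatever your install uses."""
    with open(path, encoding="utf-8") as fh:
        return summarize(fh)
```

Calling `summarize_file("kibana.log").most_common()` would list the busiest (level, logger) pairs first.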
I think the original cause was the master Elasticsearch node crashing, faulting, or restarting. Another master was elected and took over shortly afterwards, but Kibana has had issues ever since. To my knowledge, no changes have been made in months, and no software updates or reboots happened on the day of the issue.
Elasticsearch appears to be functioning fine on all nodes. I confirmed this by browsing to https://[IP Address]:9200 on each node and received the expected response from all of them:
{
  "name" : "[Host Name]",
  "cluster_name" : "[Cluster Name]",
  "cluster_uuid" : "[Cluster UUID]",
  "version" : {
    "number" : "8.1.3",
    "build_flavor" : "default",
    "build_type" : "zip",
    "build_hash" : "39afaa3c0fe7db4869a161985e240bd7182d7a07",
    "build_date" : "2022-04-19T08:13:25.444693396Z",
    "build_snapshot" : false,
    "lucene_version" : "9.0.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}
When Kibana is running, the Elasticsearch logs are flooded with the following two messages, neither of which was seen before this issue began. To me, this suggests that Kibana and Elasticsearch are communicating at some level:
[WARN ][r.suppressed ] [Host Name] path: /winlogbeat-*,logs-endpoint.events.*,logs-windows.*/_eql/search, params: {allow_no_indices=true, index=winlogbeat-*,logs-endpoint.events.*,logs-windows.*}
...
Caused by: org.elasticsearch.action.search.SearchPhaseExecutionException: Search rejected due to missing shards [[.ds-winlogbeat-8.1.3-2023.01.28-000013][2]]. Consider using allow_partial_search_results setting to bypass this error.
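The message names the exact index and shard that could not be searched, so it can be turned into a request for the cluster allocation explain API (GET /_cluster/allocation/explain), which reports why a shard is unassigned. This is a rough Python sketch of that step. The helper names are hypothetical, and `primary=True` is an assumption, since the log line does not say whether the missing copy is a primary or a replica.

```python
import json
import re

# Matches one "[index-name][shard-number]" pair inside the message.
SHARD_RE = re.compile(r"\[([^\[\]]+)\]\[(\d+)\]")


def missing_shards(message):
    """Return (index, shard_number) pairs named after 'missing shards' in a log message."""
    _, found, rest = message.partition("missing shards")
    if not found:
        return []
    return [(index, int(shard)) for index, shard in SHARD_RE.findall(rest)]


def explain_request(index, shard, primary=True):
    """Build a request body for GET /_cluster/allocation/explain.

    primary=True is a guess; retry with False if the primary turns out to be assigned.
    """
    return {"index": index, "shard": shard, "primary": primary}


def explain_bodies(message):
    """Return one JSON request body (as a string) per missing shard in the message."""
    return [json.dumps(explain_request(i, s)) for i, s in missing_shards(message)]
```

Each string returned by `explain_bodies` could be pasted as the request body in Kibana Dev Tools or sent with any HTTP client.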
The second is a large error block that begins:
[ERROR][o.e.x.e.p.RestEqlSearchAction] [Host Name] failed to send failure response
java.lang.IllegalStateException: Channel is already closed
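The root endpoint on port 9200 only shows that a node answers HTTP; it says nothing about shard allocation, and the missing-shards warning hints that the cluster may not be green. A fuller check would query `_cluster/health` on each node. This is a minimal Python sketch using only the standard library. The host names, the Authorization header value, and the CA file path are placeholders you would supply, and the function names are hypothetical.

```python
import json
import ssl
import urllib.request


def health_url(host, port=9200):
    """Build the _cluster/health URL for one node."""
    return f"https://{host}:{port}/_cluster/health"


def describe_health(health):
    """Condense a parsed _cluster/health response into one line."""
    return (
        f"status={health.get('status')} "
        f"nodes={health.get('number_of_nodes')} "
        f"unassigned_shards={health.get('unassigned_shards')}"
    )


def fetch_health(host, auth_header, ca_file):
    """Query one node; auth_header is a complete Authorization header value."""
    ctx = ssl.create_default_context(cafile=ca_file)
    req = urllib.request.Request(
        health_url(host), headers={"Authorization": auth_header}
    )
    with urllib.request.urlopen(req, timeout=10, context=ctx) as resp:
        return json.load(resp)
```

Running `describe_health(fetch_health(...))` against each entry in elasticsearch.hosts would show whether every node reports the same status and node count, and whether any shards are unassigned.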
My kibana.yml:
server.port: 5601
server.host: "[Server IP, not 0.0.0.0]"
server.publicBaseUrl: "https://[Server Name]"
server.name: "[Server Name]"
server.ssl.certificate: [Path].pem
server.ssl.key: [Path].key
server.ssl.enabled: true
elasticsearch.hosts: ['https://[Host1]:9200','https://[Host2]:9200','https://[Host3]:9200']
elasticsearch.serviceAccountToken: [Token]
elasticsearch.ssl.certificateAuthorities: ['[Path].crt']
xpack.fleet.outputs: [{id: fleet-default-output, name: default, is_default: true, is_default_monitoring: true, type: elasticsearch, hosts: ['https://[Host1]:9200','https://[Host2]:9200','https://[Host3]:9200'], ca_trusted_fingerprint: [fingerprint]}]
My elasticsearch.yml:
xpack.security.enabled: true
xpack.security.enrollment.enabled: true
xpack.security.http.ssl:
  enabled: true
  keystore.path: certs/http.p12
xpack.security.transport.ssl:
  enabled: true
  verification_mode: certificate
  keystore.path: certs/transport.p12
  truststore.path: certs/transport.p12
cluster.initial_master_nodes: ["Server Name"]
http.host: [_local_, _site_]
transport.host: [_local_, _site_]
Any thoughts on what I can do to recover?