The problem is with the operation of Logstash.
On the old version, 8.9, there were no problems with the pipelines, but as soon as one node was updated to 8.10 I started getting errors:
[logstash.outputs.elasticsearch][aws-pipe] Failed to perform request {:message=>"Connection pool shut down", :exception=>Manticore::ClientStoppedException, :cause=>#<Java::JavaLang::IllegalStateException: Connection pool shut down>}
Apr 18 11:39:53 logstash.my logstash[587165]: [2024-04-18T11:39:53,789][WARN ][logstash.outputs.elasticsearch][aws-pipe] Attempted to resurrect connection to dead ES instance, but got an error {:url=>"https://logstash_int:xxxxxx@ingest.my:9200/", :exception=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :message=>"Elasticsearch Unreachable: [https://ingest.my:9200/][Manticore::ClientStoppedException] Connection pool shut down"}
When I enable a new pipeline, similar errors start appearing; when I disable it, they stop. It also doesn't matter which pipeline it is - it's the same set that worked fine before. Now I'm quite worried and afraid to update Logstash to a newer version anywhere else, in case similar problems appear on that node as well.
The only thing I noticed on the ingest nodes, which receive all the data from Logstash, is a very high load on the network interface.
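For what it's worth, the per-node load can also be checked through the cluster itself; this is a rough sketch of the queries I'm using (the password and CA path are placeholders, the host and user are the ones from the error above):

# CPU, load average and heap per node, as reported by Elasticsearch
curl -u logstash_int:PASSWORD --cacert /path/to/ca.crt "https://ingest.my:9200/_cat/nodes?v&h=name,node.role,cpu,load_1m,heap.percent"
# Transport-level traffic counters, to see how much is flowing through each node
curl -u logstash_int:PASSWORD --cacert /path/to/ca.crt "https://ingest.my:9200/_nodes/stats/transport?pretty"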
No, I moved part of the configuration over to the Logstash node that is still on the old version.
Now I'll try to analyze the logs as a whole; maybe this is a cumulative problem.
[2024-04-18T09:58:52,770][WARN ][o.e.c.r.a.a.DesiredBalanceReconciler] [master02.my] [10.5%] of assigned shards (198/1873) are not on their desired nodes, which exceeds the warn threshold of [10%]
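For context, that warning is about shards not sitting on their desired nodes (rebalancing) rather than about connectivity; the allocation picture can be checked with something like this (auth details are placeholders):

# Overall cluster state, including relocating and unassigned shard counts
curl -u logstash_int:PASSWORD --cacert /path/to/ca.crt "https://ingest.my:9200/_cluster/health?pretty"
# Per-shard view showing which node each shard copy is currently on
curl -u logstash_int:PASSWORD --cacert /path/to/ca.crt "https://ingest.my:9200/_cat/shards?v"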
A master is shown as elected in the web interface, but in the Logstash logs I still see messages like this:
[ERROR][logstash.licensechecker.licensereader] Unable to retrieve license information from license server {:message=>"No Available connections"}
and
[ERROR][logstash.licensechecker.licensereader] Unable to retrieve Elasticsearch cluster info. {:message=>"No Available connections", :exception=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError}
as well as the messages from before.
Because of this, I am afraid that after the update everything will stop working for me, since I don't see any similar errors on the second, duplicate node that is still running the old version...
I'm trying to figure out why the master is being lost, but so far without success.
Not sure what you mean by this. Your Logstash errors are pretty much on point: they mean that Logstash cannot connect to your Elasticsearch cluster. For Logstash to work, you first need to solve your Elasticsearch issue.
Your Elasticsearch errors indicate that your cluster has not elected a master yet:
[SERVICE_UNAVAILABLE/2/no master]
So, from what you shared it seems that you are having issues with your Elasticsearch cluster.
What is the result when you run a curl against it? For example: curl https://your-cluster:9200
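If the cluster has security enabled, something along these lines should work; the password and CA path below are placeholders, while the host and user are taken from your error messages:

# Basic reachability check against the node Logstash is pointing at
curl -u logstash_int:PASSWORD --cacert /path/to/ca.crt "https://ingest.my:9200"
# Cluster health; a node without an elected master typically answers with master_not_discovered_exception
curl -u logstash_int:PASSWORD --cacert /path/to/ca.crt "https://ingest.my:9200/_cluster/health?pretty"
# Shows which node is currently the elected master (empty if there is none)
curl -u logstash_int:PASSWORD --cacert /path/to/ca.crt "https://ingest.my:9200/_cat/master?v"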
My data goes to the ingest nodes and then on to the database. And I noticed that the ingest nodes are losing contact with the Elasticsearch cluster, but I can't understand why this is happening...
I did a little digging in the available logs and came to the conclusion that the problem is with the ingest nodes: they are the ones losing the connection to the master. The most interesting thing is that it doesn't happen to all of them at once, but one by one - first one loses the connection, and then after some time another ingest node can no longer reach the master. Meanwhile, the other Elasticsearch nodes do not lose the master and keep working as normal.
Does anyone have any ideas why this might be happening? I changed the TCP session timeout on all nodes to 3500 seconds, but that did not bring any results; data from Logstash still periodically cannot get into Elasticsearch.
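To narrow it down, I'm planning to query each ingest node directly and see which one reports losing the master; a rough sketch (credentials are placeholders, the same commands repeated for every ingest node, and the log path is the default for package installs):

# Run against each ingest node in turn; a node that has lost the master
# typically returns SERVICE_UNAVAILABLE / master_not_discovered_exception here
curl -u logstash_int:PASSWORD --cacert /path/to/ca.crt "https://ingest.my:9200/_cluster/health?pretty"
# On the ingest node itself, look for master-election messages around the time of the failure
grep -i "master not discovered" /var/log/elasticsearch/*.log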
I'll be glad to hear any ideas - mine have already run out.