I have 4 networks (Client-Network-A, Client-Network-B, Client-Network-C, and Server-Network-ELK)
In Server-Network-ELK I have an ES cluster with 6 master nodes, 1 ingest node, plus Kibana and Logstash.
I was running everything on 6.5.4 and it worked without issues.
I have upgraded Server-Network-ELK and Client-Network-A to 7.2.0 with a full cluster restart. The cluster works without issues; status output is below.
I use the Elasticsearch output in Beats.
Then I upgraded the Client-Network-A Beats (MB, FB, APM) to 7.2.0.
None of them is able to deliver data to the ES cluster now.
All of them (MB, FB, and APM) return the same error:
Failed to publish events: Post https://xxx:443/_bulk: EOF
The full Metricbeat log is below.
What I found and what I tried:
- On the ES ingest node side I have an Nginx reverse proxy, which handles security and HTTPS. I have tried to switch to plain HTTP on port 9200, and the error changed to:
ERROR elasticsearch/client.go:335 Failed to perform any bulk index operations: Post http://xxx-ip-address:9200/_bulk: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
- I also tried to increase the output timeout parameter to 360 (config sketch after this list), without luck.
- In the nginx logs I see that nginx receives some GET and HEAD requests (obviously the Beat checks ES status and ILM status), but I don't see any POST requests in the logs.
- Sometimes, right after start, a Beat successfully sends 1 or 2 documents to ES, and I see them in ES and Kibana, but no more documents come through (this is extremely strange).
- I have tried to switch to the Logstash output on port 5044, without SSL (sketch after this list). It shows exactly the same issue: I see some traffic on port 5044 on the server side using tcpdump, but no documents come through.
- I have tried to revert the Beats back to 7.1.0, 6.8.0, 6.7.0, 6.5.4, and 6.5.1. The error message sometimes changes, but in general the problem remains the same.
- I have tried to send POST _bulk requests from the client machine using curl (example below); they go through without issues.
- I have checked all firewall settings on both sides. They allow all ports from the client's public IP; I contacted the hosting support and they confirmed this.
- I also have Client-Network-B and Client-Network-C, which are not directly controlled by me. They run older Beats, 6.5.4 and 6.6.0, and their data reaches Elasticsearch without issues (this is very strange).
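For reference, the output configurations I tried look roughly like this; the hosts and credentials are redacted placeholders and the snippets are reconstructed from memory, used one at a time since Beats only allows a single output to be enabled:

output.elasticsearch:
  hosts: ["xxx:443"]
  protocol: https
  username: "xxx"
  password: "xxx"
  timeout: 360              # increased from the default 90s, made no difference

output.logstash:
  hosts: ["xxx:5044"]       # plain TCP to Logstash, no SSL, for testing only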
Please advise what else I should check; I would really appreciate any help.
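This is roughly the curl test that goes through fine from the client side; the index name, document, and credentials below are just placeholders:

curl -u xxx:xxx -X POST "https://xxx:443/_bulk" \
  -H 'Content-Type: application/x-ndjson' \
  --data-binary $'{"index":{"_index":"curl-bulk-test"}}\n{"message":"test from client"}\n'

It returns a normal bulk response with no errors, while the Beats fail with EOF on the same /_bulk endpoint.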
Metricbeat log:
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.250Z INFO pipeline/output.go:95 Connecting to backoff(elasticsearch(https://xxx:443))
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.274Z INFO elasticsearch/client.go:735 Attempting to connect to Elasticsearch version 7.2.0
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.323Z INFO [index-management] idxmgmt/std.go:252 Auto ILM enable success.
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.332Z INFO [index-management.ilm] ilm/std.go:134 do not generate ilm policy: exists=true, overwrite=false
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.332Z INFO [index-management] idxmgmt/std.go:265 ILM policy successfully loaded.
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.332Z INFO [index-management] idxmgmt/std.go:394 Set setup.template.name to '{metricbeat-7.2.0 {now/d}-000001}' as ILM is enabled.
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.332Z INFO [index-management] idxmgmt/std.go:399 Set setup.template.pattern to 'metricbeat-7.2.0-*' as ILM is enabled.
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.332Z INFO [index-management] idxmgmt/std.go:433 Set settings.index.lifecycle.rollover_alias in template to {metricbeat-7.2.0 {now/d}-000001} as ILM is enabled.
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.332Z INFO [index-management] idxmgmt/std.go:437 Set settings.index.lifecycle.name in template to {metricbeat-7.2.0 {"policy":{"phases":{"hot":{"actions":{"rollover":{"max_age":"30d","max_size":"50gb"}}}}}}} as ILM is enabled.
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.358Z INFO template/load.go:88 Template metricbeat-7.2.0 already exists and will not be overwritten.
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.358Z INFO [index-management] idxmgmt/std.go:289 Loaded index template.
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.366Z INFO [index-management] idxmgmt/std.go:300 Write alias successfully generated.
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.366Z INFO pipeline/output.go:105 Connection to backoff(elasticsearch(https://xxx:443)) established
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:00.622Z ERROR elasticsearch/client.go:335 Failed to perform any bulk index operations: Post https://xxx:443/_bulk: EOF
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.250Z ERROR pipeline/output.go:121 Failed to publish events: Post https://xxx:443/_bulk: EOF
Metricbeat config:
name: test-server
metricbeat.modules:
  - module: system
    period: 1m
    metricsets:
      - cpu
      - memory
      - filesystem
    cpu.metrics: [percentages, normalized_percentages]
output.elasticsearch:
  hosts: ["xxx:443"]
  protocol: https
  username: "xxx"
  password: "xxx"
Elasticsearch info (GET /):
{
  "name" : "xxx",
  "cluster_name" : "xxx",
  "cluster_uuid" : "xxx",
  "version" : {
    "number" : "7.2.0",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "508c38a",
    "build_date" : "2019-06-20T15:54:18.811730Z",
    "build_snapshot" : false,
    "lucene_version" : "8.0.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
Cluster status:
{
  "cluster_name" : "xxx",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 7,
  "number_of_data_nodes" : 6,
  "active_primary_shards" : 216,
  "active_shards" : 432,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}