Beats error: POST _bulk EOF after upgrade from 6.5.4 to 7.2.0

I have 4 networks (Client-Network-A, Client-Network-B, Client-Network-C, and Server-Network-ELK).
In Server-Network-ELK I have an ES cluster with 6 master nodes and 1 ingest node, plus Kibana and Logstash.
I ran everything on 6.5.4 and it worked without issues.
I upgraded Server-Network-ELK and Client-Network-A to 7.2.0 with a full cluster restart. It works without issues; status output is below.
I use the Elasticsearch output in Beats.

Then I upgraded the Client-Network-A Beats (Metricbeat, Filebeat, APM) to 7.2.0.

None of them can deliver beat data to the ES cluster now.
All of them (Metricbeat, Filebeat, and APM) return the same error:
Failed to publish events: Post https://xxx:443/_bulk: EOF
The full Metricbeat log is below.

What I found and what I tried:

  1. On the ingest ES node side, I have an Nginx reverse proxy that handles security and HTTPS. I tried switching to plain HTTP on port 9200, and the error changed to:
    ERROR elasticsearch/client.go:335 Failed to perform any bulk index operations: Post http://xxx-ip-address:9200/_bulk: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
    I also tried increasing the output timeout parameter to 360, without luck.
  2. In the Nginx logs, I see that it receives some GET and HEAD requests (obviously the beat checking ES status and ILM status). I don't see any POST requests in the logs.
  3. Sometimes, right after start, a beat successfully sends 1 or 2 documents to ES (I see them in ES and Kibana), but no more documents come through (this is extremely strange).
  4. I tried switching to the Logstash output, port 5044, without SSL. It shows exactly the same issue (I see some traffic on port 5044 on the server side using tcpdump, but no documents come through).
  5. I tried reverting Beats to versions 7.1.0, 6.8.0, 6.7.0, 6.5.4, and 6.5.1.
    The error message sometimes changes, but in general the problem remains the same.
  6. I tried sending POST _bulk requests from the client machine using curl, and they go through without issues.
  7. I checked all firewall settings on both sides. They allow all ports from the client public IP. I contacted hosting support and they confirmed this.
  8. I also have Client-Network-B and Client-Network-C, not directly controlled by me. They use older Beats (6.5.4 and 6.6.0), and I see that their data reaches Elasticsearch without issues (this is very strange).
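Since the GET and HEAD requests show up in the Nginx logs but the POST _bulk requests never do (step 2), it may be worth checking the proxy's request-body and timeout directives: a body that exceeds `client_max_body_size`, or an upstream connection closed mid-request, can surface on the Beats side as a bare EOF. A minimal sketch of the directives to review, assuming a standard Nginx reverse proxy in front of the ingest node (server names, addresses, and values are illustrative, not taken from my actual config):

```nginx
# Hypothetical reverse-proxy block in front of the ingest node.
server {
    listen 443 ssl;
    server_name xxx;

    location / {
        proxy_pass http://127.0.0.1:9200;

        # Beats keeps connections alive; forcing HTTP/1.1 to the upstream
        # avoids the proxy closing connections between requests.
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # _bulk bodies can be large; a 413 rejection may look like EOF
        # to the client. Default is only 1m.
        client_max_body_size 100m;

        proxy_read_timeout 120s;
        proxy_send_timeout 120s;
    }
}
```

The `error_log` at `info` level would also show whether Nginx is rejecting or aborting the POST bodies rather than never receiving them.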
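Besides raising `timeout` (step 1), shrinking each _bulk request is another way to narrow this down: if small batches get through but default-sized ones don't, something on the path is dropping large request bodies rather than blocking the connection outright. A hedged sketch of the relevant Beats output settings (values are illustrative for testing, not recommendations):

```yaml
output.elasticsearch:
  hosts: ["xxx:443"]
  protocol: https
  timeout: 360          # seconds; already tried per step 1
  bulk_max_size: 10     # default is 50; a tiny batch keeps each POST body small
  compression_level: 0  # disable gzip to rule out proxy handling of compressed bodies
```

If a `bulk_max_size` of 10 succeeds where the default fails, that points toward a body-size or MTU/fragmentation problem on the network path rather than an Elasticsearch-side issue.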

Please advise on what else I should check; I would really appreciate it.

Metricbeat log:

metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.250Z     INFO    pipeline/output.go:95        Connecting to backoff(elasticsearch(https://xxx:443))
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.274Z     INFO    elasticsearch/client.go:735  Attempting to connect to Elasticsearch version 7.2.0
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.323Z     INFO    [index-management]   idxmgmt/std.go:252      Auto ILM enable success.
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.332Z     INFO    [index-management.ilm]       ilm/std.go:134  do not generate ilm policy: exists=true, overwrite=false
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.332Z     INFO    [index-management]   idxmgmt/std.go:265      ILM policy successfully loaded.
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.332Z     INFO    [index-management]   idxmgmt/std.go:394      Set to '{metricbeat-7.2.0 {now/d}-000001}' as ILM is enabled.
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.332Z     INFO    [index-management]   idxmgmt/std.go:399      Set setup.template.pattern to 'metricbeat-7.2.0-*' as ILM is enabled.
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.332Z     INFO    [index-management]   idxmgmt/std.go:433      Set settings.index.lifecycle.rollover_alias in template to {metricbeat-7.2.0 {now/d}-000001} as ILM is enabled.
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.332Z     INFO    [index-management]   idxmgmt/std.go:437      Set in template to {metricbeat-7.2.0 {"policy":{"phases":{"hot":{"actions":{"rollover":{"max_age":"30d","max_size":"50gb"}}}}}}} as ILM is enabled.
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.358Z     INFO    template/load.go:88  Template metricbeat-7.2.0 already exists and will not be overwritten.
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.358Z     INFO    [index-management]   idxmgmt/std.go:289      Loaded index template.
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.366Z     INFO    [index-management]   idxmgmt/std.go:300      Write alias successfully generated.
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.366Z     INFO    pipeline/output.go:105       Connection to backoff(elasticsearch(https://xxx:443)) established
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:00.622Z     ERROR   elasticsearch/client.go:335  Failed to perform any bulk index operations: Post https://xxx:443/_bulk: EOF
metricbeat-system_1_41dc70b3d357 | 2019-06-30T09:56:02.250Z     ERROR   pipeline/output.go:121       Failed to publish events: Post https://xxx:443/_bulk: EOF

Metricbeat config:

name: test-server

metricbeat.modules:
  - module: system
    period: 1m
    metricsets:
      - cpu
      - memory
      - filesystem
    cpu.metrics: [percentages, normalized_percentages]

output.elasticsearch:
  hosts: ["xxx:443"]
  protocol: https
  username: "xxx"
  password: "xxx"

Kibana status:

  "name" : "xxx",
  "cluster_name" : "xxx",
  "cluster_uuid" : "xxx",
  "version" : {
    "number" : "7.2.0",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "508c38a",
    "build_date" : "2019-06-20T15:54:18.811730Z",
    "build_snapshot" : false,
    "lucene_version" : "8.0.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  "tagline" : "You Know, for Search"

Cluster status:

  "cluster_name" : "xxx",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 7,
  "number_of_data_nodes" : 6,
  "active_primary_shards" : 216,
  "active_shards" : 432,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
