My current pipeline is:
Filebeat -> Logstash -> ES (3 nodes).
Filebeat and Logstash are deployed in a Kubernetes cluster; both are version 7.6.2.
ES is deployed as a container on a virtual machine; the image is [amazon/opendistro-for-elasticsearch:1.6.0].
I found that logs show up in ES with a delay of about 7 to 8 minutes.
Logstash config:
  inputs:
    main: |-
      input {
        beats {
          port => 5044
        }
      }
  filters:
    # main: |-
    #   filter {
    #   }
  outputs:
    main: |-
      output {
        elasticsearch {
          hosts => ["${ELASTICSEARCH_HOST}:${ELASTICSEARCH_PORT}"]
          manage_template => false
          index => "XXX_%{+YYYY.MM.dd}"
          user => "XXXX"
          password => "XXX"
          ilm_enabled => false
        }
      }
More information about Logstash:
$ cat logstash.yml 
config.reload.automatic: true
http.host: 0.0.0.0
http.port: 9600
path.config: /usr/share/logstash/pipeline
path.data: /usr/share/logstash/data
queue.checkpoint.writes: 1
queue.drain: true
queue.max_bytes: 1gb
queue.type: persisted
$ curl -XGET 'localhost:9600/_node/stats/os?pretty'
{
  "host" : "log-logstash-0",
  "version" : "7.6.2",
  "http_address" : "0.0.0.0:9600",
  "id" : "9a699ca0-63ff-4d4c-bfda-6c1672603009",
  "name" : "log-logstash-0",
  "ephemeral_id" : "cacf7e7a-0a5e-426b-93d1-b50d72037c07",
  "status" : "green",
  "snapshot" : false,
  "pipeline" : {
    "workers" : 2,
    "batch_size" : 125,
    "batch_delay" : 50
  },
  "os" : {
    "cgroup" : {
      "cpu" : {
        "cfs_quota_micros" : 200000,
        "stat" : {
          "time_throttled_nanos" : 10805873439,
          "number_of_times_throttled" : 198,
          "number_of_elapsed_periods" : 699951
        },
        "control_group" : "/",
        "cfs_period_micros" : 100000
      },
      "cpuacct" : {
        "usage_nanos" : 3358394995977,
        "control_group" : "/"
      }
    }
  }
}
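To put the cgroup counters above in perspective, here is a minimal Python sketch of the arithmetic. The helper name `summarize_cgroup_cpu` is hypothetical and not part of Logstash, and the average-usage figure assumes that `number_of_elapsed_periods` and `usage_nanos` cover roughly the same time window, which may not hold exactly.

```python
# Figures copied from the _node/stats/os output above.
cpu = {
    "cfs_quota_micros": 200000,
    "cfs_period_micros": 100000,
    "time_throttled_nanos": 10805873439,
    "number_of_times_throttled": 198,
    "number_of_elapsed_periods": 699951,
}
cpuacct_usage_nanos = 3358394995977


def summarize_cgroup_cpu(cpu, usage_nanos):
    """Derive a few ratios from the cgroup CPU counters (hypothetical helper)."""
    period_s = cpu["cfs_period_micros"] / 1e6
    # Wall-clock time covered by the counted CFS periods (an approximation).
    elapsed_s = cpu["number_of_elapsed_periods"] * period_s
    return {
        # quota / period = number of cores the container may use
        "cpu_limit_cores": cpu["cfs_quota_micros"] / cpu["cfs_period_micros"],
        # fraction of CFS periods in which the container was throttled
        "throttled_period_ratio": (
            cpu["number_of_times_throttled"] / cpu["number_of_elapsed_periods"]
        ),
        # total time spent throttled, in seconds
        "throttled_seconds": cpu["time_throttled_nanos"] / 1e9,
        # average cores consumed over the counted periods
        "avg_cores_used": (usage_nanos / 1e9) / elapsed_s,
    }


summary = summarize_cgroup_cpu(cpu, cpuacct_usage_nanos)
for name, value in summary.items():
    print(f"{name}: {value:.6f}")
```

With these numbers the limit works out to 2 cores, roughly 0.03% of periods were throttled, and average usage is around 0.05 cores, so CPU throttling on the Logstash pod does not look like an obvious cause of the delay.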
According to Prometheus metrics, Logstash is under low load (CPU: 0.035, memory: 1GB).
I also ran a load test against ES; the throughput is about 6000/s.
How can I figure out where the problem currently resides?


