Time delay between logstash and elasticsearch

honglei · April 13, 2020, 9:31am

My current pipeline is:
filebeat->Logstash->ES(3 nodes).

Filebeat and Logstash are deployed in the kubernetes cluster, both of the them are version-7.6.2.
ES is deployed as a container on a virtual machine, images version is [amazon/opendistro-for-elasticsearch:1.6.0]

I found that the logs on ES have a delay of about 7~8 minutes.

Logstash config:

 inputs:
    main: |-
      input {
        beats {
          port => 5044
        }
      }

  filters:
    # main: |-
    #   filter {
    #   }

  outputs:
    main: |-
      output {
        elasticsearch {
          hosts => ["${ELASTICSEARCH_HOST}:${ELASTICSEARCH_PORT}"]
          manage_template => false
          index => "XXX_%{+YYYY.MM.dd}"
          user => "XXXX"
          password => "XXX"
          ilm_enabled => false
        }
      }

more infomation about logstash:

$ cat logstash.yml 
config.reload.automatic: true
http.host: 0.0.0.0
http.port: 9600
path.config: /usr/share/logstash/pipeline
path.data: /usr/share/logstash/data
queue.checkpoint.writes: 1
queue.drain: true
queue.max_bytes: 1gb
queue.type: persisted

$ curl -XGET 'localhost:9600/_node/stats/os?pretty'
{
  "host" : "log-logstash-0",
  "version" : "7.6.2",
  "http_address" : "0.0.0.0:9600",
  "id" : "9a699ca0-63ff-4d4c-bfda-6c1672603009",
  "name" : "log-logstash-0",
  "ephemeral_id" : "cacf7e7a-0a5e-426b-93d1-b50d72037c07",
  "status" : "green",
  "snapshot" : false,
  "pipeline" : {
    "workers" : 2,
    "batch_size" : 125,
    "batch_delay" : 50
  },
  "os" : {
    "cgroup" : {
      "cpu" : {
        "cfs_quota_micros" : 200000,
        "stat" : {
          "time_throttled_nanos" : 10805873439,
          "number_of_times_throttled" : 198,
          "number_of_elapsed_periods" : 699951
        },
        "control_group" : "/",
        "cfs_period_micros" : 100000
      },
      "cpuacct" : {
        "usage_nanos" : 3358394995977,
        "control_group" : "/"
      }
    }
  }
}

According to prometheus indicators, logstash has a low load(cpu: 0.035, memory: 1GB).
I also made a pressure test on ES, the throughput is about 6000/s.

How can I figure out where the problem currently resides?

rugenl · April 13, 2020, 2:32pm

Are you setting @timestamp to a value from the log or the ingest time? If ingest time, compare @timestamp to the value in the log.

What is the index refresh_interval?

If you have Kibana monitoring enabled, what is the indexing latency? Monitoring for logstash, if enabled, also has event latency.

Are you using ntp everywhere?

Christian_Dahlqvist · April 13, 2020, 3:44pm

It seems you have set this to ensure as high durability as possible. Note that this also will result in increased disk I/O and potentially lower throughput as a result. Does the storage you are using for the Logstash node have capacity to handle this or might it be a bottleneck?

honglei · April 14, 2020, 2:19am

@rugenl Thanks for your reply

I'm setting @timestamp to a value from the ingest time. The ingest time is about 3 seconds slower than the log time

The index refresh_interval is not specified, Use the default value(1s).

GET <index>/_settings?pretty
{
  "XXX" : {
    "settings" : {
      "index" : {
        "creation_date" : "1586822809255",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "t3dhllx5SfG2XzKakS-RSw",
        "version" : {
          "created" : "7060199"
        },
        "provided_name" : "XXX"
      }
    }
  }
}

We did not enable monitoring.

Yes.

honglei · April 14, 2020, 3:09am

@Christian_Dahlqvist Thanks for your reply.
The value of iowait is 6.5% ~ 7.5 on Logstash node. Disk write speed is 80KB/s and read speed is 20KB/S, so I think the disk is not the bottleneck.

BTW: A few days after Logstash restarts, the problem will reproduce. The problem will not reproduce immediately.
I have 4 kubernetes clusters, I restarted logstash on one of the clusters two days ago, but the problem has not been reproduced.

Christian_Dahlqvist · April 14, 2020, 5:55am

That much iowait could be noticeable. Are you monitoring iowait so you can see whether the amount of iowait varies over time? Does this behaviour change if you increase the write checkpoint size?

honglei · April 14, 2020, 8:26am

@Christian_Dahlqvist Thanks for your advise.I changed the value of queue.checkpoint.writes from 1 to 1024. The value of iowait is significantly lower than before.
But the logs on ES still have a delay of about 7~8 minutes.

Christian_Dahlqvist · April 14, 2020, 8:31am

Why is it reporting being throttled? How much resurces does it have available? Is the time synchronized across the hosts/pods?

honglei · April 14, 2020, 8:48am

Host resource: 4Core / 16G MEM
Pod resource: 2Core / 2G MEM

Time is synchronized across the hosts/pods.

$ date; sudo docker exec 0723956d651b date
Tue Apr 14 16:43:04 CST 2020
Tue Apr 14 08:43:04 UTC 2020

Christian_Dahlqvist · April 14, 2020, 8:52am

Are all pods running on this host, including the application generating logs and Elasticsearch?

If so, what does CPU usage on the underlying host look like?

honglei · April 14, 2020, 9:04am

No, filebeat, logstash and application pods are on the same host.
The Elasticsearch cluster is outside the k8s cluster.

The architecture is as follows:
elk

honglei · April 14, 2020, 3:20pm

I have tuned logstash performance with follow configuration:

$ cat logstash.yml 
......
pipeline.batch.delay: 100
pipeline.batch.size: 250
pipeline.workers: 4
queue.checkpoint.writes: 4096
......

Now it looks like there is no delay, the amount of iowait has decreased.
Thanks for your help.
@Christian_Dahlqvist
@rugenl

system · May 12, 2020, 3:20pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Delay between Logstash and ES? ELK 7.16.1 [ECK] Logstash	16	2429	March 14, 2022
Lag is there from filebeat to Elasticsearch Beats filebeat	3	439	November 17, 2021
Huge Time Delay between logstash and elasticsearch Logstash	5	3583	July 6, 2017
Lag in Filebeat to Kibana Beats filebeat	6	848	July 17, 2021
Pipeline Delay 7.9 Elasticsearch	11	1323	October 19, 2020

Time delay between logstash and elasticsearch

Related topics