Data delay in ELK

sundar1 · January 23, 2017, 11:24am

Hi,
Every day we receive close to 2 TB of data, about 70K events per second, from Kafka into ELK.
Below is my ELK and hardware setup:
15 Logstash servers (36 GB RAM, 6 CPU)
25 data nodes (36 GB RAM, 6 CPU), each with 2 TB of storage

Our data is arriving almost 2 days late. Could you please tell me whether there are any issues with my configuration or hardware?
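To put the numbers above in perspective, here is a small back-of-the-envelope calculation. It is only a sketch: it assumes a constant 70K events/sec, treats "2 TB" as decimal terabytes, and spreads load perfectly evenly across nodes, ignoring peaks, replicas and indexing overhead.

```python
# Back-of-the-envelope sizing for the figures quoted in the post.
# Assumptions: constant ingest rate, decimal TB, perfectly even load.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_events(events_per_sec: float) -> float:
    """Total events per day at a constant ingest rate."""
    return events_per_sec * SECONDS_PER_DAY


def avg_event_bytes(bytes_per_day: float, events_per_sec: float) -> float:
    """Average raw event size implied by daily volume and event rate."""
    return bytes_per_day / daily_events(events_per_sec)


def per_node_rate(events_per_sec: float, nodes: int) -> float:
    """Events/sec each node must sustain if load is spread evenly."""
    return events_per_sec / nodes


if __name__ == "__main__":
    rate = 70_000        # events/sec from Kafka (from the post)
    volume = 2 * 10**12  # ~2 TB/day, decimal terabytes assumed
    print(f"events/day:         {daily_events(rate):,.0f}")
    print(f"avg event size:     {avg_event_bytes(volume, rate):.0f} bytes")
    print(f"per Logstash (15):  {per_node_rate(rate, 15):,.0f} events/s")
    print(f"per data node (25): {per_node_rate(rate, 25):,.0f} events/s")
```

With these inputs that works out to roughly 6 billion events per day, about 330 bytes per event, and a little under 4,700 events/s that each of the 15 Logstash instances has to sustain just to keep up.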

Regards
Sundar

Christian_Dahlqvist · January 23, 2017, 11:36am

Have you verified whether Elasticsearch or Logstash is limiting throughput? If so, how did you go about doing so?

sundar1 · January 23, 2017, 1:19pm

Thanks Christian! I have tested 6k events/sec throughput from Kafka to a Logstash server, but I have not tested throughput from Logstash to Elasticsearch. Is there any way to test it?

Christian_Dahlqvist · January 23, 2017, 1:38pm

If each Logstash instance is able to process in excess of 6k events per second (90k per second in total) with all the filters present but without outputting to Elasticsearch, it sounds like either Elasticsearch or the Elasticsearch output plugin could be the bottleneck. Which version of Logstash are you using? How have you configured your Elasticsearch output plugin(s)?
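The reasoning above can be made concrete with a rough capacity check. This is a sketch under simplifying assumptions: every instance runs at the measured 6k events/s, load is perfectly even, and the "2 days behind" backlog is taken to be roughly two days of events at 70K/s. The function names are hypothetical.

```python
# Rough capacity check: does measured per-instance Logstash throughput
# (without the Elasticsearch output) cover the ingest rate, and how long
# would an existing backlog take to drain?
# Assumes perfectly even load and a constant ingest rate.


def pipeline_capacity(per_instance_eps: float, instances: int) -> float:
    """Aggregate events/sec if every instance runs at the measured rate."""
    return per_instance_eps * instances


def drain_time_seconds(backlog_events: float, capacity_eps: float,
                       ingest_eps: float) -> float:
    """Seconds to clear a backlog while new events keep arriving.

    Returns infinity when capacity does not exceed the ingest rate,
    because the backlog can then never shrink.
    """
    headroom = capacity_eps - ingest_eps
    if headroom <= 0:
        return float("inf")
    return backlog_events / headroom


if __name__ == "__main__":
    ingest = 70_000
    capacity = pipeline_capacity(6_000, 15)
    backlog = ingest * 2 * 86_400  # ~2 days of events behind (assumption)
    days = drain_time_seconds(backlog, capacity, ingest) / 86_400
    print(f"capacity {capacity:,.0f} ev/s, headroom {capacity - ingest:,.0f} ev/s")
    print(f"time to drain 2-day backlog: {days:.1f} days")
```

With the figures from this thread that gives 90,000 events/s of capacity but only 20,000 events/s of headroom, so even a pipeline running flat out would need about a week to clear a two-day backlog.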

sundar1 · January 23, 2017, 2:31pm

I'm using Logstash 2.4 and Elasticsearch 5.0.2.

output {
  elasticsearch {
    codec => avro {
      schema_uri => "/apps/schema/rocana3.schema"
    }
    hosts => "http://es-uat-rtp-master-ltm.xxx.com:9200/"
    index => "logstash-applogs-%{+YYYY.MM.dd}-1"
    workers => 6
  }
}

Christian_Dahlqvist · January 23, 2017, 3:20pm

I would expect each Logstash node to be able to connect to all data nodes in order to spread the load, but it looks like each Logstash node is sending all traffic to a single node (which also based on the name seems to be a master node). Is traffic evenly spread across the cluster?
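A toy model of why the hosts list matters: a client that rotates through its host list spreads bulk requests evenly, while a single entry funnels everything to one node. This is a hypothetical illustration of round-robin distribution, not the Logstash Elasticsearch output plugin's actual code; the host names are made up.

```python
# Hypothetical model of host selection: round-robin over the configured
# hosts versus a single load-balancer/master endpoint.
# Not the Logstash plugin's implementation; host names are invented.

from collections import Counter
from itertools import cycle, islice


def distribute(requests: int, hosts: list[str]) -> Counter:
    """Assign bulk requests to hosts in round-robin order and count them."""
    return Counter(islice(cycle(hosts), requests))


if __name__ == "__main__":
    single = distribute(1_000, ["master-ltm"])
    spread = distribute(1_000, [f"data-{i:02d}" for i in range(1, 26)])
    print("single host:", dict(single))
    print("max per node when spread:", max(spread.values()))
```

With one entry all 1,000 requests land on the same node; with 25 data nodes listed each receives 40.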

sundar1 · January 24, 2017, 10:58am

Thanks for the reply. I removed the LTM URL from the hosts setting and listed all the data nodes instead:
hosts => ["host1","host2",............"host25"]
Even so, we are still not getting more data into the Elasticsearch data nodes.

Christian_Dahlqvist · January 24, 2017, 11:08am

What does the resource utilisation look like on the Elasticsearch nodes? Do you see high CPU usage and/or high iowait? Is there anything in the Elasticsearch logs indicating e.g. long GC or merge throttling?
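One way to check the first two points across many nodes at once is to scan the node stats API output. The sketch below assumes the response has the shape of Elasticsearch 5.x `GET _nodes/stats/os,jvm` (`os.cpu.percent`, `jvm.uptime_in_millis`, `jvm.gc.collectors.old.collection_time_in_millis`); verify the field names against your own cluster. The sample values and thresholds are made up, and iowait is not covered, since node stats do not report it (OS tools such as `iostat` do).

```python
# Scan a node-stats response for hot or GC-bound nodes.
# Dict shape assumed to match ES 5.x `GET _nodes/stats/os,jvm`;
# SAMPLE values and the default thresholds are hypothetical.


def flag_nodes(stats: dict, cpu_limit: int = 85,
               gc_share_limit: float = 0.05) -> list[str]:
    """Return names of nodes with high CPU or a large share of time in old-gen GC."""
    flagged = []
    for node in stats["nodes"].values():
        cpu = node["os"]["cpu"]["percent"]
        jvm = node["jvm"]
        old_gc_ms = jvm["gc"]["collectors"]["old"]["collection_time_in_millis"]
        # Cumulative old-gen GC time as a fraction of JVM uptime.
        gc_share = old_gc_ms / max(jvm["uptime_in_millis"], 1)
        if cpu >= cpu_limit or gc_share >= gc_share_limit:
            flagged.append(node["name"])
    return sorted(flagged)


SAMPLE = {
    "nodes": {
        "a1": {"name": "sundar-data-01",
               "os": {"cpu": {"percent": 35}},
               "jvm": {"uptime_in_millis": 3_600_000,
                       "gc": {"collectors": {"old": {"collection_time_in_millis": 1_200}}}}},
        "b2": {"name": "sundar-data-02",
               "os": {"cpu": {"percent": 92}},
               "jvm": {"uptime_in_millis": 3_600_000,
                       "gc": {"collectors": {"old": {"collection_time_in_millis": 900}}}}},
    }
}

if __name__ == "__main__":
    print(flag_nodes(SAMPLE))
```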

sundar1 · January 24, 2017, 11:48am

CPU and IO seem fine, and there are no GC issues.

Please find my ELK configuration below. I hope it helps you understand my setup; please let me know if anything is wrong.

Linux infrastructure for Logstash, ES and Kibana
Hardware: 6 CPU / 32 GB RAM
Operating system: Oracle Enterprise Linux 6 FID16a 2X-Large

Master Node ES :-

cluster.name: sei-elk-uat-rtp
node.name: sundar-master-01
node.master: true
node.data: false
path.data: /apps/masterES/data
path.logs: /apps/masterES/logs
bootstrap.memory_lock: true
network.host: 01.02.03.04
http.port: 9200
discovery.zen.ping.unicast.hosts: ["master1 ip","master2 ip","master3 ip"]
discovery.zen.minimum_master_nodes: 2
http.cors.enabled: true
http.cors.allow-origin: "*"

Data Node ES :-

cluster.name: sei-elk-uat-rtp
node.name: sundar-data-01
node.master: false
node.data: true
path.data: /apps/dataES1/data
path.logs: /apps/dataES1/logs
discovery.zen.ping.unicast.hosts: ["master1 ip","master2 ip","master3 ip"]
network.host: 05.06.07.08
http.port: 9200
bootstrap.memory_lock: true

Client ES:-

cluster.name: sei-elk-uat-rtp
node.name: sundar-client-01
node.master: false
node.data: false
path.data: /apps/clientES/data
path.logs: /apps/clientES/logs
network.host: 10.138.000.00
http.port: 9200
discovery.zen.ping.unicast.hosts: ["master1 ip","master2 ip","master3 ip"]
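As a sanity check on the quorum setting above: the Elasticsearch 5.x Zen discovery guidance is to set `discovery.zen.minimum_master_nodes` to a majority of master-eligible nodes, (master_eligible / 2) + 1. A minimal sketch of that formula, where only the three dedicated masters count as master-eligible (data and client nodes have `node.master: false`):

```python
# Majority quorum for Zen discovery, per the ES 5.x recommendation:
# minimum_master_nodes = (master_eligible / 2) + 1 (integer division).


def recommended_minimum_master_nodes(master_eligible: int) -> int:
    """Majority of master-eligible nodes."""
    return master_eligible // 2 + 1


if __name__ == "__main__":
    # Three dedicated masters; data and client nodes are not master-eligible.
    print(recommended_minimum_master_nodes(3))
```

For three master-eligible nodes this gives 2, which matches the configured value, so the quorum setting is unlikely to be related to the indexing delay.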

sundar1 · January 25, 2017, 3:02pm

Today we scaled up to close to 50 data nodes (2 TB storage each), 3 masters and 3 clients, but we are still not getting a higher indexing rate. The screenshot has the monitoring details.
Could you please tell me if any settings are wrong?

Christian_Dahlqvist · January 25, 2017, 3:23pm

If you have doubled the size of the cluster, traffic is distributed across all nodes, and the shards being indexed into are spread across the nodes, yet you are still not seeing any performance improvement, it is quite possible that Logstash is limiting throughput after all.

The Kafka input has a range of configuration parameters that you can tune for performance, e.g. consumer_threads. It may be worthwhile tuning these, but I have not done this personally, so I cannot really give any advice on it.
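One general Kafka constraint is worth keeping in mind when tuning consumer_threads: within a consumer group each partition is read by at most one consumer, so threads beyond the partition count sit idle. The sketch below assumes all 15 Logstash instances share the same Kafka consumer group; the 60-partition topic is purely hypothetical, so check the real partition count before choosing a value.

```python
# Sizing sketch for the Kafka input's consumer_threads.
# Assumes all Logstash instances share one consumer group.
# The partition count in the example is hypothetical.

import math


def threads_per_instance(partitions: int, instances: int) -> int:
    """Smallest per-instance thread count that gives every partition a consumer."""
    return math.ceil(partitions / instances)


def idle_consumers(partitions: int, instances: int, threads: int) -> int:
    """Consumers in the group that would receive no partition."""
    return max(instances * threads - partitions, 0)


if __name__ == "__main__":
    partitions = 60  # hypothetical topic partition count
    t = threads_per_instance(partitions, 15)
    print(f"consumer_threads => {t}, idle consumers: {idle_consumers(partitions, 15, t)}")
```

If the topic has fewer partitions than instances × threads, adding threads (or Logstash servers) cannot raise consumption; only adding partitions can.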

system · February 22, 2017, 3:23pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
