Hi guys,
I'm having some performance issues with ingesting data into Elasticsearch through Logstash.
My ingestion setup is the following:
- The ingested data are log lines read from files by Filebeat
- The log lines are sent to Logstash for parsing and enrichment (in particular, a kv filter is used to extract key-value pairs from the log message body)
- Logstash sends the enriched events to Elasticsearch
My project requires near real-time availability of the enriched log messages in Elasticsearch.
By near real-time I mean delays of no more than 10 seconds. Instead, I usually see delays of 30 seconds to a few minutes, and sometimes even on the order of 10 minutes, which is not acceptable.
The architecture is the following:
- multiple Filebeat instances installed on the machines where the logs are written
- 1 dedicated host running the Elasticsearch node (the cluster consists of a single data node) and 1 Logstash instance
I am not allowed to add more Elasticsearch or Logstash nodes, so for the moment I am stuck with 1 ES node and 1 Logstash instance.
I have set the following properties for the pipeline:
pipeline.workers: 4
pipeline.batch.size: 100
pipeline.batch.delay: 1
queue.type: memory
To monitor the performance of the entire ingestion process, I enabled X-Pack Monitoring for Elasticsearch, Logstash and Filebeat, so I can see metrics like:
- Indexing Rate and Indexing Latency for Elasticsearch
- Event Emitted Rate and Event Latency for Logstash
- Event Emitted Rate for Filebeat.
To get some information about the typical elaboration time inside the Logstash filter section, I added "start" and "end" timestamps at the beginning and end of the filter block, like in the following:
input {
  beats {
    port => 5044
  }
}

filter {
  # Record the wall-clock time at which the event enters the filter section
  ruby {
    init => "require 'time'"
    code => 'event.set("logstash_start_timestamp", Time.now())'
  }

  ### START USUAL ELABORATION
  kv {
    ...
  }
  ### END USUAL ELABORATION

  # Record the time at which the event leaves the filter section and
  # compute the elapsed time (in seconds) spent in the filters above
  ruby {
    init => "require 'time'"
    code => 'event.set("logstash_end_timestamp", Time.now())
             tend = event.get("logstash_end_timestamp").to_f
             tstart = event.get("logstash_start_timestamp").to_f
             diff = tend - tstart
             event.set("logstash_elaboration_time", diff)'
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my_index"
    pipeline => "add_ingest_timestamp"
  }
}
I also wanted to measure the time between the end of the Logstash filter elaboration and the moment the event actually reaches Elasticsearch for indexing.
So I used an ingest pipeline that computes the difference between the value of logstash_end_timestamp and the _ingest.timestamp metadata field, and adds this difference to the event as a numeric field.
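For reference, a minimal version of such a pipeline could look like the following. This is only a sketch, not my exact definition: the delay field name (ingest_delay_ms) is arbitrary, and it assumes logstash_end_timestamp arrives as an ISO8601 string (which is how Logstash serializes timestamp values):

PUT _ingest/pipeline/add_ingest_timestamp
{
  "description": "Store the ingest timestamp and compute the delay since the Logstash filter finished",
  "processors": [
    {
      "set": {
        "field": "ingest_timestamp",
        "value": "{{_ingest.timestamp}}"
      }
    },
    {
      "script": {
        "source": """
          // assumes both fields are ISO8601 strings; field names are illustrative
          ZonedDateTime ingest = ZonedDateTime.parse(ctx.ingest_timestamp);
          ZonedDateTime end = ZonedDateTime.parse(ctx.logstash_end_timestamp);
          ctx.ingest_delay_ms = ChronoUnit.MILLIS.between(end, ingest);
        """
      }
    }
  ]
}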
Over a time window of ~15 minutes during which the input stream from Filebeat was fairly high, I got the following average stats from the monitoring tool:
- Logstash:
  - CPU Usage: 26%
  - 5 min average load: 2
  - Heap used: 17%
  - Events Ingested: 800
  - peak Event Emitted Rate: 25 events/s
  - peak Event Latency: 3200 ms
- Elasticsearch:
  - CPU Usage: 10%
  - Load average: 2.2
  - Heap used: 25%
  - minimum Indexing Rate: 30 events/s
  - peak Indexing Latency: 0.25 ms
I did not include Filebeat monitoring metrics since I think it is the fastest component of the three.
For the Logstash elaboration time and the ingest delay (obtained as explained above), over the same time window I get approximately the following averages per event (the aggregation sketch after the list shows how such averages can be pulled from the index):
- logstash elaboration time: 30000 ms
- ingest delay: 1250 ms
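These per-event fields can be averaged directly in Elasticsearch with a simple avg aggregation. Again just a sketch: it uses the field names from the snippets above and assumes the default @timestamp field that Logstash adds to every event:

GET my_index/_search
{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": { "gte": "now-15m" }
    }
  },
  "aggs": {
    "avg_elaboration": { "avg": { "field": "logstash_elaboration_time" } },
    "avg_ingest_delay": { "avg": { "field": "ingest_delay_ms" } }
  }
}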
So it seems that most of the delay is due to what happens inside the Logstash pipeline filter.
What could be the cause of this slow-down?
I tried to tune the properties pipeline.workers, pipeline.batch.size and pipeline.batch.delay, but nothing substantially changed.
I also increased Elasticsearch and Logstash committed heap memory, but again I did not get any improvement.
Does anyone have suggestions on how to speed up the ingestion?