Logstash config settings


#1

I have Logstash 6.0 installed on a 4-CPU node with 32 GB of memory, and I would like to improve the performance of my ingest into Elasticsearch. At present I am ingesting an 8 GB file that contains 42 columns of text data. With the default Logstash settings I was able to ingest 6 million records in one hour. How do I improve this ingest speed with changes to my configuration settings? I also noticed the ingest process slowing down after 4 hours; it may be ingesting only 3 million records per hour now. How do I improve ingest efficiency, and how do I account for this slowdown that is happening over time? Thanks.
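For reference, the pipeline settings most commonly tuned for throughput live in `logstash.yml`. A minimal sketch, with illustrative values only (the right numbers depend on measuring your own pipeline, and defaults shown are for Logstash 6.x):

```yaml
# logstash.yml -- illustrative values, not recommendations
pipeline.workers: 4        # defaults to the number of CPU cores
pipeline.batch.size: 250   # events per worker batch (default 125)
pipeline.batch.delay: 50   # ms to wait before flushing an undersized batch
```

Larger batches and more workers only help if Logstash, not the downstream system, is the bottleneck, which is the question raised in the replies below.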


(Christian Dahlqvist) #2

How have you identified that it is Logstash and not Elasticsearch that is the bottleneck? Are you letting Elasticsearch set document IDs or are you supplying them yourself when indexing?


#3

I am letting Elasticsearch set document IDs. Based on the Logstash node's hardware specs, I feel it is not being used to its full potential. For this run, memory utilization showed as 2 GB, whereas we have 32 GB available. How do I decide whether to increase the memory setting, and which environment settings have to be aligned for that change?
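For context, the Logstash heap is configured in `config/jvm.options` rather than in the pipeline config; the default in 6.x is 1 GB. A sketch, assuming you decide a larger heap is warranted (the 4 GB figure is illustrative):

```
# config/jvm.options -- keep initial and max heap equal
-Xms4g
-Xmx4g
```

Note that a larger heap mainly helps with large batch sizes or memory-heavy filters; it does not by itself raise throughput if CPU or the downstream cluster is the limit.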


(Christian Dahlqvist) #4

Logstash performance is typically limited either by CPU or by network performance. It is, however, only able to process data as fast as the systems receiving it can accept it.

Before starting to look at tuning Logstash, have you verified that your downstream systems are able to handle higher throughput and are not the bottleneck here?


#5

No. How do I check on the downstream system?


(Christian Dahlqvist) #6

Where are you sending your data?


#7

I am sending the data to an 8-node Elasticsearch cluster, where each node has a 2-CPU, 32 GB memory specification. This was the only ingestion job running against the cluster.


(Christian Dahlqvist) #8

What type of hardware and storage are the Elasticsearch nodes using? Look at CPU usage and disk I/O on your Elasticsearch nodes and verify that none of the Elasticsearch nodes are saturated or limiting performance.
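One way to check this is the nodes stats and cat APIs; a sketch of requests you could run against any node (via curl or Kibana Dev Tools; endpoint paths are standard, the fields to watch are suggestions):

```
GET _nodes/stats/os,process,fs?pretty
GET _cat/thread_pool/bulk?v&h=node_name,active,queue,rejected
```

Sustained high CPU in the `os` section, heavy disk I/O in the `fs` section, or a growing `rejected` count on the bulk thread pool would all point to Elasticsearch, not Logstash, as the bottleneck.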

Documents per second is a poor metric for judging performance, as the size and complexity of documents can significantly affect the amount of work that needs to be done. What is the average size of your documents?

Another factor that can impact indexing performance is how many indices and shards you are actively indexing into. Can you provide some information about this?
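The cat APIs show this at a glance; a sketch of the kind of requests that would answer the question (the `h` column selection is just a suggestion):

```
GET _cat/indices?v&h=index,pri,rep,docs.count,store.size
GET _cat/shards?v
```

The output shows how many indices receive writes and how many primary shards each has, which determines how the bulk load is spread across the 8 nodes.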


(system) #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.