What does your Logstash configuration look like? What does your data look like? How much indexing throughput is your Elasticsearch cluster able to handle?
You are lamenting Logstash's performance without showing us how you have it configured. 190,000 messages per minute is only about 3,167 messages per second. Unless the messages are extremely complex, and/or have heavy-duty enrichment going on, this should be achievable with that single server. There may be some constraints on I/O, but without knowing how you've configured anything, there's nothing we can do to assist you.
If this is your complete configuration, and there are no other configuration files or inputs anywhere else in your pipeline, then you don't need either of the `if [type] == "edr" {` conditionals. They would check every event, but every event would already be of type `edr` because of the `type => "edr"` setting in your file input block.
Those conditionals will each add a small amount of latency for your messages.
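For illustration, a minimal sketch of why the wrapper is redundant (the path and column names are assumptions, not taken from your config):

```
input {
  file {
    path => "/data/edr/*.csv"   # hypothetical path
    type => "edr"               # every event from this input already has type "edr"
  }
}

filter {
  # Redundant when this is the only input:
  # if [type] == "edr" { ... }
  csv {
    columns => ["ts", "host", "record_type", "value"]   # hypothetical columns
  }
}
```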
That regular expression reads each full line looking for `\bCTE\b`, which is much more expensive in processing time than looking for the value `CTE` in an individual field. You're already breaking the CSV down into individual fields in the csv filter, so why scan the entire message when the value will only ever appear in one field? This could be slowing things down dramatically.
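A hedged sketch of the cheaper check, reusing the hypothetical `record_type` column from the sketch above:

```
filter {
  # Expensive: regex scan across the entire line
  # if [message] =~ /\bCTE\b/ { ... }

  # Cheaper: exact comparison against one field the csv filter already produced
  if [record_type] == "CTE" {
    mutate { add_tag => ["cte"] }
  }
}
```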
If everything is truly CSV, then you could replace this with the dissect filter and get a non-trivial performance and throughput boost.
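Assuming the same hypothetical columns as above, the csv filter could become a dissect mapping along these lines:

```
filter {
  # csv { columns => ["ts", "host", "record_type", "value"] }
  dissect {
    mapping => {
      "message" => "%{ts},%{host},%{record_type},%{value}"
    }
  }
}
```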
These are a few quick observations, in no particular order of relative or expected performance gain.
Last thing: I have to put `max_open_files => 30000` in my input, because without it I get error messages about the max open files limit being reached (even though `ulimit` is set to unlimited...).
You may want to ingest fewer files at once. I understand that you have many files you want to read in, but this message indicates you may be taxing Logstash by trying to open too many files at once. Try limiting the scope of your glob/wildcard and see if that helps.
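For example, instead of one broad wildcard, you could scope each file input to a narrower pattern (paths are hypothetical):

```
input {
  file {
    # Broad: path => "/data/**/*.csv" can open a huge number of files at once.
    # Narrower: only match the current month's files.
    path => "/data/edr/2017-09-*.csv"
    type => "edr"
  }
}
```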
| Filter          | Events per second |
|-----------------|-------------------|
| Dissect         | 16,396            |
| Mutate (rename) | 29,248            |
EDIT: With DISSECT I get repeated error messages like this:

    Dissector mapping, key not found in event
I know why: I have one conf file per type of data (3 currently):
logstash-ed.conf
logstash-ca.conf
logstash-ms.conf
By removing the `if [type] == "xx"` conditionals, I believe Logstash tries to apply the dissect filter to events from every conf file. I have not yet changed the others.
I did warn about that in the beginning. With such a powerful server, you might want to consider a separate pipeline for each configuration file. In 5.x, that means a separate instance of Logstash for each (not necessarily multiple installs, just one install running 3 different configurations). In 6.0, you'll be able to define multiple pipelines within one instance.
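For what it's worth, in 6.0 that could look something like this in `config/pipelines.yml` (the directory path is an assumption; the file names are from your post):

```
- pipeline.id: edr
  path.config: "/etc/logstash/conf.d/logstash-ed.conf"
- pipeline.id: ca
  path.config: "/etc/logstash/conf.d/logstash-ca.conf"
- pipeline.id: ms
  path.config: "/etc/logstash/conf.d/logstash-ms.conf"
```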
Or you could go back to the conditionals the way you had it before.
@Beuhlet_Reseau
One thing to note about using Dissect instead of CSV: Dissect does not handle a comma inside a quoted section the way the CSV filter does.
e.g. a message line like this:

    Adam Andrews, Beth Bell, "Cliff, Clive", Dave Dent

gets dissected into:

    name_1: Adam Andrews
    name_2: Beth Bell
    name_3: "Cliff
    name_4: Clive"
    others: Dave Dent
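A minimal dissect mapping that reproduces the split above (the field names match that example):

```
filter {
  dissect {
    mapping => {
      # dissect splits on the literal ", " delimiters from left to right;
      # the quotes get no special treatment, hence the broken name_3/name_4
      "message" => "%{name_1}, %{name_2}, %{name_3}, %{name_4}, %{others}"
    }
  }
}
```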
PROTIP 1: Always include an `others` or `rest` field at the end, then check that this field is always empty. If it's not, your data has changed in some way; output the event to a file, send an email, or put it in Redis. See the sketch after PROTIP 2.
PROTIP 2: Use a named skip field if you know you don't need that data, e.g. `%{?host}`.
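A sketch combining both tips (the field names, tag, and output path are all hypothetical):

```
filter {
  dissect {
    mapping => {
      # %{?host} is a named skip field (matched but not stored);
      # %{rest} should stay empty as long as the format is unchanged
      "message" => "%{ts} %{?host} %{name_1}, %{name_2}, %{rest}"
    }
  }
  if [rest] != "" {
    mutate { add_tag => ["_format_changed"] }   # hypothetical tag
  }
}

output {
  if "_format_changed" in [tags] {
    file { path => "/var/log/logstash/format-changes.log" }   # hypothetical path
  }
}
```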