Logstash performance issues

Hello,

I am getting really frustrated with a Logstash problem.

I have a total of 2,850,000 messages every 15 minutes (3 different document types), and on one of the types I have a 2-hour delay.

2,850,000 messages every 15 minutes = 190,000 messages per minute

I have a dedicated Logstash server (24 CPUs, 24 GB RAM), a 3-node Elasticsearch cluster (each node with 12 CPUs and 16 GB RAM), and a dedicated Kibana server.

Logstash configuration:

pipeline.workers: 24
pipeline.output.workers: 24
pipeline.batch.size: 250
pipeline.batch.delay: 5

I am starting to think about looking for a better ETL tool, because this performance does not seem normal (I also have 3 shards and 1 replica per type).

You will probably get better performance by running multiple instances of Logstash.

This also depends on what your config is, what version of the stack you are on, your OS and even your JVM.

What does your Logstash configuration look like? What does your data look like? How much indexing throughput is your Elasticsearch cluster able to handle?

You are lamenting Logstash's performance without showing us how you have it configured. 190,000 messages per minute is only 3,167 messages per second. Unless they're extremely complex, and/or have heavy-duty enrichment going on, this should be achievable with that single server. There may be some constraints on I/O, but without knowing how you've configured anything, there's nothing we can do to assist you.


My Logstash conf looks like this:

input {
  file {
    path => "/data/logstash/data/edr/P_ASK_*.txt"
    type => "edr"
    max_open_files => 30000
  }
}
filter {
  if [type] == "edr" {
    csv {
      columns => [ "Date","CTE","CT","SubsId","MastN","MDN","MCC","ID","ModFlag","ParentalControlFlag","RuleBase","FairUseFlag","Flows","Label","Notification","TotalOctets","Grantectets","AP","QoS-Rule","Usamit","Januomer" ]
    }
    if [message] =~ "\bCTE\b" {
      drop { }
    }
    mutate {
      remove_field => [ "message", "host", "path" ]
    }
    date {
      match => [ "Date" , "UNIX" ]
      remove_field => ["Date"]
    }
  }
}
output {
  if [type] == "edr" {
    elasticsearch {
      hosts => ["opm1zels01.com:9200","opm1zels02.com:9200","opm1zels03.com:9200"]
      index => "ed-%{+YYYY.MM.dd}"
    }
  }
}

If this is your complete configuration, and there are no other configuration files or inputs anywhere else in your pipeline, then you don't need either of the if [type] == "edr" { lines. Those conditionals would be checking every line, but every line would already be of type edr because of the type => "edr" line in your file input block.

Those conditionals will each add a small amount of latency for your messages.

This regular expression reads each full line to find "\bCTE\b", which is much more expensive in terms of processing time than looking for the value CTE in an individual field. You're already breaking the CSV down into individual fields in the csv filter. Why check the entire message if the value will only be in a given field? This could be slowing things down dramatically.

If everything is truly CSV, then you could replace this with the dissect filter and get a non-trivial performance and throughput boost.
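
As a rough illustration, a dissect mapping for this data might start out like the following (a sketch only, showing just the first few of your columns; the remaining columns would continue in the same comma-separated pattern):

dissect {
  mapping => {
    # each %{...} captures everything up to the next comma, with no regular expressions involved
    "message" => "%{Date},%{CTE},%{CT},%{SubsId},%{rest}"
  }
}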

These are a few quick observations, in no particular order of relative or expected performance gain.


Great, thank you.

In fact, I have a header line in each file. I don't quite see how to do what you're talking about.
So, maybe something like this:

if [CTE] = "CTENAME" { drop {} }
csv {
      columns => [ "Date","CTE","CT","SubsId","MastN","MDN","MCC","ID","ModFlag","ParentalControlFlag","RuleBase","FairUseFlag","Flows","Label","Notification","TotalOctets","Grantectets","AP","QoS-Rule","Usamit","Januomer" ]
    }

?

I don't know how to do that :frowning: and I'm not sure whether it's installed (ELK 5.5).

Because it's a conditional, it should be:

if [CTE] == "CTENAME" { drop {} }

5.5 should have the dissect filter installed by default. The linked blog post in my earlier answer shows how to start configuring the dissect filter.

OK, thank you @theuntergeek.
My config now looks like this:

input {
  file {
    path => "/data/logstash/data/edr/P_ASK_*.txt"
    max_open_files => 30000
  }
}
filter {
    csv {
      columns => [ "Date","CTE","CT","SubsId","MastN","MDN","MCC","ID","ModFlag","Parental","RuleBase","Faig","Flows","Label","Notification","TotalOctets","Grantectets","AP","QoS-Rule","Usamit","Januomer" ]
    }

    if [CTE] == "CTENAME" { drop {} }

    mutate {
      remove_field => [ "message", "host", "path" ]
    }
    date {
      match => [ "Date" , "UNIX" ]
      remove_field => ["Date"]
    }
}
output {
    elasticsearch {
      hosts => ["opm1zels01.com:9200","opm1zels02.com:9200","opm1zels03.com:9200"]
      index => "ed-%{+YYYY.MM.dd}"
    }
}

One last thing: I have to put max_open_files => 30000 in the input, because without it I get error messages about the max open files limit being reached (even though ulimit is set to unlimited...).

So, is it good like this, @theuntergeek:

dissect {
      mapping => {
        "message" => "%{Date},%{CTE},%{CT},%{SubsId},%{MastDN},%{MDN},%{MCC},%{ID},%{ModFlag},%{ParentalControl},%{RBe},%{FFlag},%{Status},%{RedLabel},%{NotificatD},%{UsedT},%{TotalOctets},%{AP},%{List},%{Usmit},%{Janomer}"
      }
    }

What is the difference between %{priority} and %{?priority}?

And what is the difference between %{CTE} and %{+CTE}? (I think the + is to append multiple fields into one, no?)

Is it a problem if I define a date field (for example: CTE1,dd/MM/yyyy HH:mm:ss,CT12) like this: %{CTE},%{Date},%{CT}?

With regard to the dissect filter, since your delimiter is always a comma, you shouldn't have to worry about using the ? or + modifiers in your field names.
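
Since you asked, here is a rough sketch of what the two prefixes do (the field names and the space-delimited format here are made up purely for illustration):

dissect {
  mapping => {
    # %{?skipme}   : the token is matched but no field is stored for it
    # %{+FullName} : the match is appended to the FullName field captured earlier
    "message" => "%{FullName} %{+FullName} %{?skipme} %{City}"
  }
}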

You may want to ingest fewer files at once. I understand that you have many files you want to read in, but this message indicates you may be taxing Logstash by trying to open too many files at once. Try limiting the scope of your glob/wildcard and see if that helps.
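
One way to do that (a sketch, using an assumed narrower filename pattern and the close_older option of the file input) would be:

input {
  file {
    # a narrower glob so fewer files are watched at once (the date suffix here is an assumption about your file naming)
    path => "/data/logstash/data/edr/P_ASK_201709*.txt"
    # close files that have not changed for an hour so the number of open handles stays bounded
    close_older => 3600
  }
}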

I see, the dissect filter has its own remove_field option, so it's better than using a separate mutate.

So I can change

mutate {
      remove_field => [ "message", "host", "path" ]
    }

to

dissect {
      mapping => {
        "message" => "%{xxxxx}...."
      }
    remove_field => [ "message", "host", "path" ]
}

Right? A 44% difference in throughput is amazing.

Events per second:
Dissect           16396
Mutate (rename)   29248

EDIT: With dissect I get some error messages like:

 Dissector mapping, key not found in event
 Dissector mapping, key not found in event
 Dissector mapping, key not found in event
 Dissector mapping, key not found in event
 Dissector mapping, key not found in event

I know why: I have one conf file per type of data (3 currently):

  1. logstash-ed.conf
  2. logstash-ca.conf
  3. logstash-ms.conf

By removing the if [type] == "xx" conditionals, I believe Logstash tries to apply the dissect filter to the events from every conf file. I have not yet changed the others.

I did warn about that in the beginning. You have such a powerful server that you might want to consider a separate pipeline for each configuration file. In 5.x, that means a separate instance of Logstash for each (not necessarily multiple installs, just one install with 3 different configurations). In 6.0, you'll be able to define multiple pipelines within one instance.
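
For reference, in 6.0 that multiple-pipeline setup will be defined in pipelines.yml, roughly like this (the config paths here are assumptions based on your file names):

- pipeline.id: ed
  path.config: "/etc/logstash/conf.d/logstash-ed.conf"
- pipeline.id: ca
  path.config: "/etc/logstash/conf.d/logstash-ca.conf"
- pipeline.id: ms
  path.config: "/etc/logstash/conf.d/logstash-ms.conf"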

Or you could go back to the conditionals the way you had it before.

OK @theuntergeek! But is it bad to have 3 conf files instead of a single one containing all 3 with conditional ifs?
What do you think about that ?

EDIT: we agree that there is no need for a conditional on type in the output section, since I send everything to Elasticsearch in any case?

If they are not guarded by conditionals, then you will export each line to Elasticsearch 3 times.

OK @theuntergeek, thank you very much for your help! You are a good man and you are my sensei now.

Logstash merges the files into one when it reads them in. You would still need conditionals or separate instances of Logstash.

@Beuhlet_Reseau
One thing to note about using Dissect instead of CSV:
Dissect does not check for a comma inside a quoted section the way the CSV filter does.
e.g. a message line like this

Adam Andrews, Beth Bell, "Cliff, Clive", Dave Dent 

and a dissect like this:

%{name_1}, %{name_2}, %{name_3}, %{name_4}, %{others}

will give (not what is expected)

name_1: Adam Andrews
name_2: Beth Bell
name_3: "Cliff
name_4: Clive"
others: Dave Dent

PROTIP 1: Always include an others or rest field at the end. Then check that this field is always empty; if it's not, then your data has changed in some way. Output those events to a file, send an email, or put them in Redis (see the sketch below).

PROTIP 2: Use a named skip field if you know you don't need that data, e.g. %{?host}.
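
Here is a rough sketch combining both tips, with made-up column names and an arbitrary file path for the unexpected events (the conditional belongs in the output section):

filter {
  dissect {
    mapping => {
      # %{?host} is matched but not kept; %{rest} catches anything beyond the expected columns
      "message" => "%{Date},%{CTE},%{?host},%{rest}"
    }
  }
}
output {
  if [rest] != "" {
    # the line had more fields than expected - keep it somewhere you can inspect it
    file { path => "/tmp/dissect_unexpected.log" }
  } else {
    elasticsearch { hosts => ["opm1zels01.com:9200"] }
  }
}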

Hello,

Excuse me, but I don't understand :blush:. What's the aim? Sometimes my fields are empty; is that a problem?

OK, so is it better to use %{?onething}, or to use this:

dissect {
      mapping => {
        "message" => "...|%{onething}"
      }
      remove_field => [ "onething" ]
}