Logstash tcp input CPU perfomance

Hello,

I have configured logstash using one tcp input as follows:

input {
  #logs01
  tcp {
    type => "logs"
    codec => "line"
    port => 9916
    add_field => { "event_dataset" => "logs_01" }
  }
}

With this configuration, we are expecting nearly 100 EPS from this source and everything is working as expected.

The thing is that we are expecting more logs from the same source on different port, so after adding another tcp input in the configuration file, logstash starts to increase the CPU utilization to more than 100% and we start to face a gap of time between the event indexed time and the event received timestamp in logstash.

input {
  #logs01
  tcp {
    type => "logs"
    codec => "line"
    port => 9916
    add_field => { "event_dataset" => "logs_01" }
  }
  #logs02
  tcp {
    type => "logs"
    codec => "line"
    port => 9917
    add_field => { "event_dataset" => "logs_02" }
  }
}

What could be the issue here? It seems something related to adding more than one tcp input in logstash. Could you help me with this issue?

Thank you.

You would need to share the full pipeline to give more insight into your issue.

What do your filters look like? What does your messages looks like? Do the messages from logs_01 and logs_02 have the same format or they are different?

I have the following configuration for logs_01 (the same config will be for logs_02, just changing the grok pattern and the input using port 9997)

input {
  #logs01
  tcp {
    type => "logs"
    codec => "line"
    port => 9916
    add_field => { "event_dataset" => "logs_01" }
  }
}
filter {
  #logs01 filtering
  grok {
	 patterns_dir => ["/etc/logstash/patterns"]
	 match => { "message" => "GROK PATTERN" }
  }
  if "_grokparsefailure" in [tags] {
	drop { }
  }
}
output {
  stdout { codec => rubydebug }
  http {
	  id => "logs_01"
	  headers => {
	   "x-api-key" => "APIKEY"
	  }
	  http_method => "post"
	  url => "API ENDPOINT"
	  proxy => "PROXY"
	  automatic_retries => 10
	  socket_timeout => 60
	  validate_after_inactivity => 3
	  request_timeout => 10
  }
}

I have one pipeline for each conf file, with the following parameters:

#logs_01 have approx. 100 EPS

  • pipeline.id: logs_01
    path.config: /etc/logstash/conf.d/logs_01.conf
    queue.type: memory
    pipeline.workers: 10
    pipeline.batch.size: 800
    pipeline.batch.delay: 1

#logs_02 have approx. 8 EPS

  • pipeline.id: logs_02
    path.config: /etc/logstash/conf.d/logs_02.conf
    queue.type: memory
    pipeline.workers: 2
    pipeline.batch.size: 1000
    pipeline.batch.delay: 1

Running in a logstash server with following requirements:

  • CPU: 2
  • RAM: 4GB

Hope you can help me with this issue.

Oh I see, the first configuration you shared is not the one you are using, if you are running pipelines.yml and have the two pipelines in different files, you do not have an input block with the two tcp inputs as you shared, so what I was thinking that could be the issue it is not true anymore, for example messages passing through grok filters that won't match.

With pipelines.yml your two pipelines are completely separated from each other.

But your pipeline configurations seems a little weird for the specs of your server, you have a server with 2 CPU and 4 GB, but you are using 10 workers for a pipeline with a batch size of 800, the recommendation for this setting is to change them from the default value if see that your server is not using all the resources, which is not your case now.

I would remove the pipeline.workers and pipeline.batch.size and also pipeline.batch.delay from your pipelines configurations and see how it performs, you have a low rate of events and the default values are more then enough for it, specially the pipeline.batch.delay which is in miliseconds and rarely needs to be changed.

Thank you for your prompt reply.

I have just tried to remove pipeline.workers,pipeline.batch.size and pipeline.batch.delay as suggested, but the CPU utilization is still over 100%. Furthermore, with default parameters for workers we noticed that indexing rate decrease a lot for logs_01:

image

If I delete the pipeline for logs_02, the CPU utilization goes under 30%, but indexing rate doesn't go back to normal as shown in the screenshot.

If you add the logs_02 pipeline and leave both pipelines running with the default configurations, the CPU will also increase to 100% as before?

Since without the logs_02 pipeline your CPU utilization is pretty low, you could then start changing the parameters for the logs_01 pipeline, I would start doubling the workers and batch size from its default value until you find the optimal value.

But you need first to find what is the issue with the logs_02 pipeline, can you share some example messages for this pipeline and also the grok patterns that you are using?

That's right. I don't have CPU issues when running just log_01 pipeline and after changing workers, indexing rate comes back to normal and CPU is still okay.

But after adding logs_02 pipeline with default workers and parameters, CPU increase over 100% and indexing rate in other pipeline decrease as mentioned.

This is the log format for logs_02: (fields between double quotes, separated by comma)

"Tue Jun 21 14:03:23 2022","user@mail.com","Data","Location","80","50","443","45","X.X.X.X","Y.Y.Y.Y","Z.Z.Z.Z","A.A.A.A","B.B.B.B","0","Data","Drop","No","Yes","No","HTTPS","app","TCP","SSL","Country","115","POLICY","197","683","0","115","1","None","None","None","user","hostuser"

This is the grok pattern (similar to the one used for log_01 as they are events from the same techonology but with some differences in the log format):

match => { "message" => "\"%{TIMESTAMP_01:eventcreated}\",\"%{DATA:useremail}\",\"%{DATA:organizationdep}\",\"%{DATA:organizationloc}\",\"%{NUMBER:cdestinationport}\",\"%{NUMBER:courceport}\",\"%{NUMBER:sdestinationport}\",\"%{NUMBER:ssourceport}\",\"%{IP:csourceip}\",\"%{IP:cdestinationip}\",\"%{IP:ssourceip}\",\"%{IP:sdestinationip}\",\"%{IP:sourceip}\",\"%{NUMBER:source_port}\",\"%{DATA:eventtype}\",\"%{DATA:eventaction}\",\"%{GREEDYDATA}\",\"%{GREEDYDATA}\",\"%{GREEDYDATA}\",\"%{DATA:networktype}\",\"%{DATA:networkapplication}\",\"%{DATA:networkprotocol}\",\"%{DATA:eventcategory}\",\"%{DATA:countryname}\",\"%{NUMBER:eventduration}\",\"%{DATA:rulename}\",\"%{NUMBER:clientbytes}\",\"%{NUMBER:destinationbytes}\",\"%{GREEDYDATA}\",\"%{GREEDYDATA}\",\"%{GREEDYDATA}\",\"%{DATA:ruleset}\",\"%{DATA:indicatordescription}\",\"%{DATA:indicatorname}\",\"%{DATA:username}\",\"%{DATA:hostname}\"" }

using pattern: TIMESTAMP_01 %{DAY} %{MONTH} %{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND} %{YEAR}

Your grok pattern is very expensive with a lot of GREEDYDATA and DATA patterns, it is also not anchored with a ^, which helps improve the grok performance.

Normally you should anchor your groks and avoid using many GREEDYDATA , which is expensive.

But in your case you do not even need to use grok if your messages from log_02 are like this, you have a CSV message, so it is better to use the csv filter to parse it.

The main difference is that it won't validate the value of the field like grok, but if your messages have all the same format and the values do not change, for example, the column where you have an IP address will always have an IP address, then there is no need to use grok.

The following filter will parse your message, it will but to unnamed fields where you are using GREEDYDATA in a field named not_used and it will remove this field if the csv filter is successful.

filter {
    csv {
        source => "message"
        separator => ","
        skip_empty_columns => true
        columns => [
            "eventcreated", "useremail", "organizationdep", "organizationloc", "cdestinationport", "courceport", "sdestinationport", "ssourceport", "csourceip", "cdestinationip", "ssourceip", "sdestinationip", "sourceip", "source_port", "eventtype", "eventaction", "not_used", "not_used", "not_used", 
            "networktype", "networkapplication", "networkprotocol", "eventcategory", "countryname", "eventduration", "rulename", "clientbytes", 
            "destinationbytes", "not_used", "not_used", "not_used", "ruleset", "indicatordescription", "indicatorname", "username", "hostname"
        ]
        remove_field => ["not_used"]
    }
}

Besides that, if you still want to use grok, you should anchor your pattern using a ^ and also try to use other thing in the place of GREEDYDATA or DATA, maybe a NOTSPACE helps, and I think it is less expensive than GREEDYDATA.

1 Like

Definitely, that was the issue. After using csv filter as suggested for logs_02 my CPU went back to normal and indexing rate is working fine as well.

Thank you so much for the help!!

I'd like to adapt as well logs_01 to use csv filter, but some fields have double quotes in the values, for example:

"Tue Jun 21 14:03:23 2022","user@mail.com","Data","Location","80","50","443","45","domainexample&events=[["pageview"%2c{}]]","Y.Y.Y.Y","Z.Z.Z.Z","A.A.A.A","B.B.B.B","0","Data","Drop","No","Yes","No","HTTPS","app","TCP","SSL","Country","115","POLICY","197","683","0","115","1","None","None","None","user","hostuser"

So, I'm facing this error:
:exception=>#<CSV::MalformedCSVError: Missing or stray quote in line 1

Is there a way to use csv filters with this type of fields value?

1 Like

It is always the same field, in the same position in the csv or other fields in other positions could also have double quotes?

If it always the same position in the csv message, then maybe you could do a workaround using dissect to split your message and multiple parts.

1 Like