Logstash 8.6 low performance

Hi there
I have a server with Linux Ubuntu 20.04 and ELK 8.6.
I noticed that the ingestion process became slow, and I have not changed any parameters.
This is the conf file for the index.

input {
        file {
                path => "/opt/trabaja/csv/trx_hours*.csv"
                start_position => "beginning"
                sincedb_path=> "NULL"
                mode => "read"
                file_completed_action => "delete"
                file_sort_by => "path"
                exit_after_read => true
             }
        }
filter {
               csv { separator  => ";"
              columns => ["nodo","fecha","usuario","serviceid","sesid","base","sp","trn","ssn","tiempo","tiempo_sp","observacion"]}

mutate  { remove_field => [ "message", "@version","host","[log][file][path]" ] }
mutate { convert => [ "fecha", "string" ]  }
mutate { convert => [ "usuario", "string" ]  }
mutate { convert => [ "serviceid", "string" ]  }
mutate { convert => [ "sesid", "string" ]  }
mutate { convert => [ "base", "string" ]  }
mutate { convert => [ "sp", "string" ]  }
mutate { convert => [ "trn", "float" ]  }
mutate { convert => [ "ssn", "float" ]  }
mutate { convert => [ "tiempo", "float" ]  }
mutate { convert => [ "tiempo_sp", "float" ]  }
mutate { convert => [ "observacion", "string" ]  }
mutate { add_field => { "fecha_dia" => "%{fecha}" } }
date {
match => [ "fecha_dia", "yyyy-MM-dd HH:mm:ss.SSS" ,"ISO8601"]
timezone => "America/Argentina/Buenos_Aires"
target => "@timestamp"
}
mutate  { remove_field => [ "fecha_dia" ]}
}

output{
    elasticsearch {  hosts => ["localhost:9200"]
                     index => "trx_hours_new"
                     user => "elastic"
                     password => "Accusys123*"
                     retry_on_conflict => 0 }
                     stdout { }
    }

Logstash jvm.options heap is set to 8 GB.
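For reference, the heap lines in jvm.options look roughly like this (a sketch of my settings):

# Logstash jvm.options: initial and maximum heap set to the same value
-Xms8g
-Xmx8g
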
Any help is appreciated.
Thanks in advance.

Since you are using Linux, set sincedb_path => "/dev/null". The value sincedb_path => "NUL" is for Windows. Your .conf looks simple, so processing should be fast.
Can you provide more details:

  • How slow is it, i.e. how many messages/events are processed per minute?
  • Does the /opt/trabaja/csv/ directory contain a lot of files?
  • What are the file sizes?
  • You are deleting files after reading, am I right?
  • Why do you need stdout { }? It's for debugging and consumes resources.
  • Any particular reason for retry_on_conflict => 0?
  • Have you checked with the API or in Kibana Stack Monitoring which plugin consumes the most resources?
  • How slow is it, i.e. how many messages/events are processed per minute?
    60k per minute.
  • Does the /opt/trabaja/csv/ directory contain a lot of files?
    Yes, with different sizes; the smallest is 8 MB and the biggest is 1.7 GB.
  • You are deleting files after reading, am I right?
    Yes, the files are deleted.
  • Why do you need stdout { }? It's for debugging and consumes resources.
    I will take it out.
  • Any particular reason for retry_on_conflict => 0?
    None, I will delete that line.
  • Have you checked with the API or in Kibana Stack Monitoring which plugin consumes the most resources?
    Could you give more hints about Kibana? Which plugin do I have to use for monitoring?

Thanks for your time and recommendations.


Following your recommendations, this is the .conf file:

input {
        file {
                path => "/opt/trabaja/csv/trx_hours*.csv"
                start_position => "beginning"
                sincedb_path=> "NULL"
                mode => "read"
                file_completed_action => "delete"
                file_sort_by => "path"
                exit_after_read => true
             }
        }
filter {
               csv { separator  => ";"
              columns => ["nodo","fecha","usuario","serviceid","sesid","base","sp","trn","ssn","tiempo","tiempo_sp","observacion"]}

mutate  { remove_field => [ "message", "@version","host","[log][file][path]" ] }
mutate { convert => [ "fecha", "string" ]  }
mutate { convert => [ "usuario", "string" ]  }
mutate { convert => [ "serviceid", "string" ]  }
mutate { convert => [ "sesid", "string" ]  }
mutate { convert => [ "base", "string" ]  }
mutate { convert => [ "sp", "string" ]  }
mutate { convert => [ "trn", "float" ]  }
mutate { convert => [ "ssn", "float" ]  }
mutate { convert => [ "tiempo", "float" ]  }
mutate { convert => [ "tiempo_sp", "float" ]  }
mutate { convert => [ "observacion", "string" ]  }
mutate { add_field => { "fecha_dia" => "%{fecha}" } }
date {
match => [ "fecha_dia", "yyyy-MM-dd HH:mm:ss.SSS" ,"ISO8601"]
timezone => "America/Argentina/Buenos_Aires"
target => "@timestamp"
}
mutate  { remove_field => [ "fecha_dia" ]}
}

output{
    elasticsearch {  hosts => ["localhost:9200"]
                     index => "trx_hours_new"
                     user => "elastic"
                     password => "Accusys123*"
                     action =>"index"       }
    }

Now it is indexing 388,000 lines per minute.
I really appreciate you sharing your great experience.
Thanks

Out of more than a hundred replies, this is the first time someone has clearly responded to my questions.
Dear Elastic team members, can you please provide a gift to him? A t-shirt, mug, pen, anything?

Back to the topic. The main cause of slow processing is the debug output. Your .conf is not complex. I have removed the lines that convert fields to string (string is already the default type, so they are not needed AFAIK), removed a few other lines, and removed fecha_dia, since you only need it for the date conversion. Also try the dissect plugin; maaaaaybe you will get a little bit more performance. If you don't need the event field, you can remove it.
In the file plugin, you should use:

  • Windows: sincedb_path => "NUL"
  • Linux: sincedb_path => "/dev/null", in your case.
    You can also set a real path and file in sincedb_path to track which files have been processed, as sketched below. It's up to you.
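
For example, a persistent sincedb could look like this (the path is only an illustration, use whatever location suits you):

# hypothetical persistent sincedb file, kept across Logstash restarts
sincedb_path => "/var/lib/logstash/sincedb_trx_hours"

With /dev/null nothing is persisted, which is what the config below uses:
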
input {
  file {
                path => "/opt/trabaja/csv/trx_hours*.csv"
                start_position => "beginning"
                sincedb_path => "/dev/null"
                mode => "read"
                file_completed_action => "delete"
                file_sort_by => "path"
                exit_after_read => true
  }
}
		
filter {
  csv { separator  => ";"
       columns => ["nodo","fecha","usuario","serviceid","sesid","base","sp","trn","ssn","tiempo","tiempo_sp","observacion"]
  }
  
  # dissect {
  #   mapping => { "message" => "%{nodo};%{fecha};%{usuario};%{serviceid};%{sesid};%{base};%{sp};%{trn};%{ssn};%{tiempo};%{tiempo_sp};%{observacion}" }
  # }
   
mutate  { remove_field => [ "message", "host","log", "event" ] }

mutate { convert => [ "trn", "float" ]  }
mutate { convert => [ "ssn", "float" ]  }
mutate { convert => [ "tiempo", "float" ]  }
mutate { convert => [ "tiempo_sp", "float" ]  }

  date {
	match => [ "fecha", "yyyy-MM-dd HH:mm:ss.SSS" ,"ISO8601"]
	timezone => "America/Argentina/Buenos_Aires"
	# target => "@timestamp" # not needed, @timestamp is the default destination field.
  }
}

If you would like to use only the CSV plugin, you can do the conversion inside csv:

csv { 
  separator  => ";"
  convert => {
          "trn" => "float"
          "ssn" => "float"
          "tiempo" => "float"
          "tiempo_sp" => "float" 
  }
  columns => [... 
}

LS can provide processing info; you can find more details in the Logstash monitoring documentation.
For LS < 8.x you can use the monitoring integrated into LS; add this to logstash.yml:

xpack.monitoring.enabled: true
xpack.monitoring.collection.interval: 5s
xpack.monitoring.collection.pipeline.details.enabled: true
xpack.monitoring.elasticsearch.hosts: "localhost:9200"

For LS 8.x you should use Metricbeat with the logstash module to see the info in Kibana (a sketch of the module config is below). With Stack Monitoring enabled, even without Metricbeat you should see some metrics in Kibana. Also, while LS is running you can use curl http://localhost:9600/_node/stats/pipelines?pretty to see pipeline performance in JSON format.
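
As a sketch, the Metricbeat logstash module config (modules.d/logstash-xpack.yml) could look like this, assuming Metricbeat runs on the same host and the LS HTTP API listens on the default port 9600:

# modules.d/logstash-xpack.yml (sketch; adjust hosts and credentials to your setup)
- module: logstash
  xpack.enabled: true
  period: 10s
  hosts: ["localhost:9600"]

Enable the module (metricbeat modules enable logstash-xpack, or rename the .disabled file) and the data should appear under Stack Monitoring in Kibana.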

@Rios
Finally, this is the .conf file:

input {
  file {
                path => "/opt/trabaja/csv/trx_hours*.csv"
                start_position => "beginning"
                sincedb_path => "NULL"
                mode => "read"
                file_completed_action => "delete"
                file_sort_by => "last_modified"
                exit_after_read => true
  }
}
filter {
  csv {
    separator => ";"
    convert => {
      "trn" => "float"
      "ssn" => "float"
      "tiempo" => "float"
      "tiempo_sp" => "float"
    }
    columns => ["nodo","fecha","usuario","serviceid","sesid","base","sp","trn","ssn","tiempo","tiempo_sp","observacion"]
  }

mutate  { remove_field => [ "message", "@version","host","[log][file][path]" ] }

date {
   match => [ "fecha_dia", "yyyy-MM-dd HH:mm:ss.SSS" ,"ISO8601"]
   timezone => "America/Argentina/Buenos_Aires"
   target => "@timestamp"
}
mutate  { remove_field => [ "fecha_dia" ]}
}
output{
    elasticsearch {  hosts => ["localhost:9200"]
                     index => "trx_hours_new"
                     user => "elastic"
                     password => "Accusys123*"
                     action =>"index"       }
    }

Now it is indexing 800K lines per minute.
I could not take away this section:

date {
   match => [ "fecha_dia", "yyyy-MM-dd HH:mm:ss.SSS" ,"ISO8601"]
   timezone => "America/Argentina/Buenos_Aires"
   target => "@timestamp"
}
mutate  { remove_field => [ "fecha_dia" ]}
}

Because otherwise @timestamp takes the actual date at the time it is running.
Nevertheless, it was a very good help.
Thanks a lot!!
P.S. I am waiting for my gift.

I cannot see where fecha was copied to fecha_dia in this .conf.

Also, you can set it without copying:

date {
   match => [ "fecha", "yyyy-MM-dd HH:mm:ss.SSS" ,"ISO8601"]
   timezone => "America/Argentina/Buenos_Aires"
   target => "@timestamp"
}

@Rios
Yes, I missed that line in the code, but it was there.

Thank you very much for all the suggestions.

@Rios
You were right, there was no need to create the field fecha_dia to get the right datetime.

date {
   match => [ "fecha", "yyyy-MM-dd HH:mm:ss.SSS" ,"ISO8601"]
   timezone => "America/Argentina/Buenos_Aires"
   target => "@timestamp"
}

I have already changed the whole .conf.
Best regards.
