Logstash slow in a bash script launched by a cron job

Hi, I have a bash script like this:

#!/bin/bash

cat /home/xxx/public_html/datas/zz/file.csv | /usr/share/logstash/bin/logstash -f /home/xxx/public_html/datas/myconf.conf

My conf file is basic, like this:

input {
  stdin {}
}
filter {
  csv {
    columns => [
      "id","link","title","description","hashdesc","publicationDate","geo_lat","geo_lng","city","postalCode","department","region","sector","jobtitle","company","contractType","salary","affiliateCpc"
    ]
    separator => ","
    skip_empty_columns => false
  }
  mutate {
    add_field => {
      "[pin][location][lat]" => "%{geo_lat}"
      "[pin][location][lon]" => "%{geo_lng}"
    }
  }
  mutate {
    convert => {
      "[pin][location][lat]" => "float"
      "[pin][location][lon]" => "float"
      "[cpc]" => "float"
    }
  }
  mutate {
    remove_field => [ "message", "host", "@version", "path", "geo_lng", "geo_lat", "event", "log" ]
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "jobs3"
    document_id => "%{guid}"
    timeout => 30
    workers => 1
    doc_as_upsert => true
    action => "update"
  }
  stdout { codec => rubydebug }
}

This works perfectly when I run it from the shell, but when I call the script from a cron job it takes a lot of time.

Any idea how to make the cron job run as fast as it does in the shell?

Thanks

How much is a lot of time in this case? And what is your cron schedule?

Logstash can take a while to start; this is expected. If this is an issue, you should try running Logstash as a service and change your pipeline to read from files.
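A minimal sketch of what "read from files" could look like, assuming a package install of Logstash. The helper name `write_file_input_sketch`, the glob under `/home/xxx/public_html/datas/zz/`, and the `sincedb_path` value are illustrative assumptions, not something given in this thread. The function only writes an `input` section that would replace `stdin {}`; the `filter` and `output` sections would stay as they are.

```shell
#!/bin/bash
# Hypothetical helper: write a file-based input section to the given path.
write_file_input_sketch() {
    local dest="$1"
    cat > "$dest" <<'EOF'
input {
  file {
    # Assumed location: the directory the CSV is dropped into.
    path => "/home/xxx/public_html/datas/zz/*.csv"
    # Read each file from the start the first time it is seen.
    start_position => "beginning"
    # Hypothetical path where Logstash remembers read positions.
    sincedb_path => "/var/lib/logstash/sincedb_jobs"
  }
}
EOF
}
```

On a deb or rpm install, the complete pipeline file would typically live under `/etc/logstash/conf.d/` and the service would be started with `sudo systemctl start logstash`, so the startup cost is paid once rather than on every cron run.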

From the shell it takes 5 minutes, and from cron it takes 1 hour and 30 minutes.

The cron job is scheduled every 4 hours.

My cron job is called like this:

/bin/bash /home/xxx/public_html/datas/elasticsearch/jobs.sh

What version are you using, and what are the server specs?

A startup time of 5 minutes is too long, but it can happen in some versions depending on the specs. There is an old issue about it related to JRuby and the entropy of the system, but I have not experienced this behavior in newer versions.

But 1 hour and 30 minutes does not make much sense.

How are you measuring this time? Can you share logstash logs when it starts?

No, Logstash startup takes only 5 seconds; the 5 minutes is for the whole script run with 600,000 new entries.
I measure the cron time simply: I note when the job starts and check when the entries are in Elasticsearch. So 1h30 is approximate, but it is significantly more than the 5 minutes the run takes in the console.
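For a less approximate measurement, one option is to wrap the script in a small timing helper and log the wall-clock duration under both the shell and cron. This is only a sketch: the function name `timed_run` and the log path in the usage comment are hypothetical and do not appear in this thread.

```shell
#!/bin/bash
# Hypothetical helper: run a command and append its exit status and
# wall-clock duration (in seconds) to a log file.
timed_run() {
    local log_file="$1"; shift
    local start end status
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "$(date '+%F %T') exit=${status} elapsed=$((end - start))s cmd=$*" >> "$log_file"
    return "$status"
}

# Example (the log path is a placeholder):
# timed_run /tmp/jobs_timing.log /bin/bash /home/xxx/public_html/datas/elasticsearch/jobs.sh
```

Comparing the `elapsed=` values from an interactive run and a cron run shows whether the difference is in the script itself or in how long the data takes to become visible in Elasticsearch.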

Maybe Logstash is deprioritized by cron; I have no idea.

I found the problem. The bottleneck was this output when running under cron:
stdout { codec => rubydebug }

I simply removed the stdout output, and the cron job now runs as fast as it does in the shell.
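If the rubydebug output is still wanted for interactive debugging, a possible alternative (not tested in this thread) is to keep it in the config and redirect the script's output away from cron, so cron does not have to capture 600,000 pretty-printed events. The sketch below assumes a hypothetical helper `run_quiet`; the log path in the usage comment is a placeholder.

```shell
#!/bin/bash
# Hypothetical helper: feed a file to a command on stdin and send
# everything the command prints (stdout and stderr) to a log file.
run_quiet() {
    local log_file="$1" input_file="$2"; shift 2
    "$@" < "$input_file" >> "$log_file" 2>&1
}

# Example (log path is a placeholder; /dev/null discards the output):
# run_quiet /dev/null /home/xxx/public_html/datas/zz/file.csv \
#     /usr/share/logstash/bin/logstash -f /home/xxx/public_html/datas/myconf.conf
```

Removing the `stdout` output, as done above, remains the simpler fix, since it also avoids the cost of formatting every event.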

Thanks for your help
