How does Logstash read an input file?

henrilabarre · June 6, 2017, 7:02am

Does Logstash read a file (a CSV, for example) line by line, each line after the previous one? If not, how can I make sure Logstash parses the file in line-number order?

I ask because in some of my files the updates depend on the order (line 1 updates the counter, line 2 updates the counter again and another field too), and I notice that Logstash doesn't work as expected with my file and configuration file.

Thanks for your feedback

magnusbaeck · June 7, 2017, 6:59am

Does Logstash read a file (a CSV, for example) line by line, each line after the previous one? If not, how can I make sure Logstash parses the file in line-number order?

Files are read in sequential order, but because there are typically multiple pipeline workers running, events aren't necessarily processed in the order they're listed in the file.
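
To illustrate why several workers can reorder events, here is a minimal Ruby sketch. It is not Logstash code: `process_lines` is a hypothetical helper, the random sleep stands in for uneven filter time, and real Logstash workers take events in batches rather than one at a time. It only shows that sequential reading does not guarantee ordered output once more than one thread consumes the queue.

```ruby
require 'thread'

# Hypothetical stand-in for a pipeline: lines are queued in file order, then
# `workers` threads each take an event, "filter" it, and append it to the output.
def process_lines(lines, workers)
  queue = Queue.new
  lines.each { |line| queue << line }   # the input reads sequentially
  queue.close                           # no more events will arrive

  output = []
  lock = Mutex.new
  threads = Array.new(workers) do
    Thread.new do
      # Queue#pop returns nil once a closed queue is empty.
      while (line = queue.pop)
        sleep(rand * 0.001)             # uneven per-event processing time
        lock.synchronize { output << line }
      end
    end
  end
  threads.each(&:join)
  output
end

lines  = (1..20).map { |n| "line #{n}" }
single = process_lines(lines, 1)
multi  = process_lines(lines, 4)

puts "1 worker keeps file order:  #{single == lines}"
puts "4 workers keep file order: #{multi == lines}"
```

With one worker the output always matches the input order. With four workers every line is still processed, but the order usually differs from run to run.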

What does your configuration file look like?

henrilabarre · June 7, 2017, 7:47am

Thanks Magnus,

Here is my configuration

input {
    file {
        path => "/home/1234/public_html/datas/sirene_update/*.csv"
        start_position => beginning
        sincedb_path => "/home/1234/public_html/datas/sirene_update/.sincedb*"
    }
}
filter {
    csv {
        columns => [
            "SIREN","NIC","L1_NORMALISEE","L2_NORMALISEE","L3_NORMALISEE","L4_NORMALISEE","L5_NORMALISEE","L6_NORMALISEE","L7_NORMALISEE"
        ]
        separator => ","
        skip_empty_columns => true
    }
    mutate {
        add_field => {
            "SIRENTVA" => "%{SIREN}"
        }
    }
    mutate {
        convert => { "SIRENTVA" => "integer" }
    }
    ruby {
        code => "
            event.set('TVA_id', (( 12 + 3 * ( event.get('SIRENTVA') % 97 ) ) % 97 ))
        "
    }
    mutate {
        add_field => {
            "provider" => "sirene"
            "SIRET" => "%{SIREN}%{NIC}"
        }
        remove_field => [ "SIRENTVA" ]
        remove_field => [ "TVA_id" ]
    }
    mutate {
        remove_field => [ "message", "host", "@version", "path" ]
    }
}
output {
    if [SIEGE] != "1" {
        elasticsearch {
            hosts => "http://localhost:9200"
            index => "sirene"
            document_id => "%{SIREN}%{NIC}"
            timeout => 30
            workers => 1
            doc_as_upsert => true
            action => "update"
        }
    } else {
        elasticsearch {
            hosts => "http://localhost:9200"
            index => "sirene"
            document_id => "%{SIREN}"
            timeout => 30
            workers => 1
            doc_as_upsert => true
            action => "update"
        }
    }
    stdout { codec => rubydebug }
}

One file is added each day to the directory /home/1234/public_html/datas/sirene_update/.

The order of the data inside the CSV is important, and I need Logstash to process it one line after the other!
Thanks again for your inputs

magnusbaeck · June 7, 2017, 9:43am

Then restrict the number of pipeline workers to one, either via the command line option or the appropriate line in logstash.yml.

henrilabarre · June 7, 2017, 9:48am

Is it pipeline.workers or pipeline.output.workers that must be set to 1?
Also, I can't find the command-line option to set the pipeline workers to 1. That may be a better option for me, so that the other Logstash instances still work in parallel.

Thanks

magnusbaeck · June 7, 2017, 9:50am

Most likely both, but the latter already defaults to 1.
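
As a rough sketch of what those lines could look like in logstash.yml (assuming a release that still supports both settings; check the settings reference for your version before relying on the second one):

```yaml
# logstash.yml
# Run a single pipeline worker so filters and outputs see events in read order.
pipeline.workers: 1

# Per the reply above this already defaults to 1, so setting it is optional.
# pipeline.output.workers: 1
```

Note that logstash.yml applies to every pipeline started with that settings file, which is why a per-instance command-line option may be preferable when other instances should keep running in parallel.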

henrilabarre · June 7, 2017, 9:52am

Thanks, very clear!

Do you know where I can find the command line to set this for each logstash instance instead of in the main configuration file?

henrilabarre · June 7, 2017, 9:57am

Would it be something like this?

/usr/share/logstash/bin/logstash -f /home/1234/public_html/datas/sirene_update.conf --path.data home/1234/public_html/datas/logstash/sirene/ --pipeline.workers 1

Do I need to use -w?

magnusbaeck · June 7, 2017, 10:17am

I think -w is an alias for --pipeline.workers. Try it out. I think Logstash logs the number of pipeline workers.

henrilabarre · June 7, 2017, 10:18am

Indeed, it looks like an alias! I will test that!

system · July 5, 2017, 10:19am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.