How does Logstash read an input file?


(Henri) #1

Does Logstash read a file (a CSV, for example) line by line, each line after the previous one? If not, how can I make sure Logstash processes the file in line-number order?

I ask because some of my files contain updates that depend on the order (line 1 updates the counter, line 2 updates the counter again along with other fields), and I notice that Logstash doesn't work as expected with my file and configuration file.

Thanks for your feedback


(Magnus Bäck) #2

Does Logstash read a file (a CSV, for example) line by line, each line after the previous one? If not, how can I make sure Logstash processes the file in line-number order?

Files are read in sequential order, but because there are typically multiple pipeline workers running, events aren't necessarily processed in the order they're listed in the file.

What does your configuration file look like?


(Henri) #3

Thanks Magnus,

Here is my configuration

input {
    file {
        path => "/home/1234/public_html/datas/sirene_update/*.csv"
        start_position => beginning
        sincedb_path => "/home/1234/public_html/datas/sirene_update/.sincedb*"
    }
}
filter {
    csv {
        columns => [
            "SIREN","NIC","L1_NORMALISEE","L2_NORMALISEE","L3_NORMALISEE","L4_NORMALISEE","L5_NORMALISEE","L6_NORMALISEE","L7_NORMALISEE"
        ]
        separator => ","
        skip_empty_columns => true
    }
    mutate {
        add_field => {
            "SIRENTVA" => "%{SIREN}"
        }
    }
    mutate {
        convert => { "SIRENTVA" => "integer" }
    }
    ruby {
        code => "
            event.set('TVA_id', (( 12 + 3 * ( event.get('SIRENTVA') % 97 ) ) % 97 ))
        "
    }
    mutate {
        add_field => {
            "provider" => "sirene"
            "SIRET" => "%{SIREN}%{NIC}"
        }
        remove_field => [ "SIRENTVA" ]
        remove_field => [ "TVA_id" ]
    }
    mutate {
        remove_field => [ "message", "host", "@version", "path" ]
    }
}
output {
    if [SIEGE] != "1" {
        elasticsearch {
            hosts => "http://localhost:9200"
            index => "sirene"
            document_id => "%{SIREN}%{NIC}"
            timeout => 30
            workers => 1
            doc_as_upsert => true
            action => "update"
        }
    } else {
        elasticsearch {
            hosts => "http://localhost:9200"
            index => "sirene"
            document_id => "%{SIREN}"
            timeout => 30
            workers => 1
            doc_as_upsert => true
            action => "update"
        }
    }
    stdout { codec => rubydebug }
}

One file is added each day to the directory /home/1234/public_html/datas/sirene_update/

The order of the data inside the CSV is important, and I need Logstash to process it one line after the other!
Thanks again for your input.


(Magnus Bäck) #4

Then restrict the number of pipeline workers to one, either via the command line option or the appropriate line in logstash.yml.


(Henri) #5

Is it pipeline.workers or pipeline.output.workers that must be set to 1?
Also, I can't find the command-line option to set the pipeline workers to 1. With it, my other Logstash instances could still work in parallel, so the command-line setting may be the better option for me.

Thanks


(Magnus Bäck) #6

Most likely both, but the latter already defaults to 1.


(Henri) #7

Thanks, very clear!

Do you know where I can find the command-line option to set this for each Logstash instance instead of in the main configuration file?


(Henri) #8

Would it be something like this?

/usr/share/logstash/bin/logstash -f /home/1234/public_html/datas/sirene_update.conf --path.data /home/1234/public_html/datas/logstash/sirene/ --pipeline.workers 1

Do I need to use -w instead?


(Magnus Bäck) #9

I think -w is an alias for --pipeline.workers. Try it out. I think Logstash logs the number of pipeline workers.
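
For reference, a minimal sketch of the short-form invocation, reusing the paths from the previous post. The absolute --path.data directory is an assumption (the posted command had no leading slash), and the variable names are only illustrative. The logstash.yml equivalent would be the line `pipeline.workers: 1`. The sketch only prints the command so it can be checked before running.

```shell
# Paths taken from the thread; adjust them to your installation.
LOGSTASH_BIN=/usr/share/logstash/bin/logstash
CONF=/home/1234/public_html/datas/sirene_update.conf
# Assumption: the data directory is meant to be an absolute path.
DATA_DIR=/home/1234/public_html/datas/logstash/sirene/

# -w is the short form of --pipeline.workers; 1 means a single worker,
# so filters and outputs handle events one batch at a time.
CMD="$LOGSTASH_BIN -f $CONF --path.data $DATA_DIR -w 1"

# Print the command for inspection; run it yourself once it looks right.
echo "$CMD"
```

Because --path.data is separate for this instance, other Logstash instances on the same machine keep their own worker settings.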


(Henri) #10

Indeed, it looks like an alias! I will test that!

