Reload the same file from the beginning without restarting Logstash


(Mehdi AOUADI) #1

I managed to force Logstash to reload the whole file by pointing sincedb_path to NUL (Windows environment) and setting start_position to beginning. Here is my file input configuration:

input {
    file {
        path => "myfile.csv"
        start_position => "beginning"
        ignore_older => 0
        type => "my_document_type"
        sincedb_path => "NUL"
        stat_interval => 1
    }
}

The file is reloaded every time I restart Logstash and every time it is modified, but I want it to be reloaded every second, as set in stat_interval.
I also need it to be reloaded even when it has not been modified, and without restarting Logstash, because I add a date-based field in the filters and I need the same data every day with an updated date_field:

filter {
    csv {
        columns => ["MyFirstColumn", "MySecondColumn"]
        separator => ";"
        add_field => {
            "date_field" => "%{+ddMMyyyy}"
        }
    }
}

Here is an example of the expected behavior :

File content :

Column A;Column B
Value X;Value Y  

Data sent to the Elasticsearch index:

Column A : Value X, Column B : Value Y, date_field : 05122016

The day after, even if the file has not been modified, I want the following data to be added to the same index in Elasticsearch:

Column A : Value X, Column B : Value Y, date_field : 06122016

(Magnus Bäck) #2

I don't think the file input is the best fit here. Why not use the exec plugin instead?


(Mehdi AOUADI) #3

Thanks for the tip. I tried running a cat command on the file; the problem now is that it loads the content of the whole file as a single block. Here is an example of the file content:

myFirstColumn;mySecondColumn;mythirdColumn
valueA;ValueB;ValueC
Value1;Value2;Value3
ValueX;ValueY;ValueZ

And here is my config :

input {
	exec {
		command => "cat myfile.csv"
		interval => 2
		add_field => {
			  "tag" => "mytag"
		}
	}
}
filter {
	if [tag] == "mytag" {
		csv {
			columns => ["myFirstColumn", "mySecondColumn", "mythirdColumn"]
			separator => ";"
		}
	}
}

It sends the whole content of the file as one event, without splitting it into lines. Is something missing?


(Magnus Bäck) #4

No, that's expected. You can use a split filter to split events on newlines.
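
As a rough, untested sketch of that suggestion: the exec input places the command's entire output in the message field, and a split filter turns that one event into one event per line. No options are set here, on the assumption that the plugin defaults (the message field, newline as terminator) are what is wanted.

```
filter {
	# Split the multi-line "message" produced by the exec input into
	# one event per line. With no options, split operates on the
	# "message" field and uses a newline as the terminator.
	split { }
}
```

The split filter has to come before the csv filter, so that csv sees a single line at a time.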


(Mehdi AOUADI) #6

I added a split filter before the csv one and it is now working well.
To summarize, here is my config file:

input {
	exec {
		command => "cat myfile.csv"
		interval => 2
		add_field => {
			  "tag" => "mytag"
		}
	}
}
filter {
	if [tag] == "mytag" {
		split {
			terminator => "\n"
		}
		csv {
			columns => ["myFirstColumn", "mySecondColumn", "mythirdColumn"]
			separator => ";"
		}
	}
}
output {
	if [tag] == "mytag" {
		elasticsearch {
			hosts => [ "localhost:9200" ]
			# Elasticsearch index names must be lowercase
			index => "myindex"
		}
	}
}
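
The config above does not include the date_field from the first post. One possible way to add it back, as an untested sketch: the add_field entry is modeled on the csv filter in the first post (with a four-letter year), while the header check and the drop filter are assumptions that rely on the first line of the file being the header row shown earlier. Note that %{+ddMMyyyy} is rendered from the event's @timestamp, which Logstash keeps in UTC, so the value changes at midnight UTC rather than at local midnight.

```
filter {
	if [tag] == "mytag" {
		split { }
		# Assumption: the header line starts with the first column name.
		# Drop it so it is not indexed as a data row.
		if [message] =~ /^myFirstColumn;/ {
			drop { }
		}
		csv {
			columns => ["myFirstColumn", "mySecondColumn", "mythirdColumn"]
			separator => ";"
			# Formatted from @timestamp (UTC), e.g. 05122016
			add_field => {
				"date_field" => "%{+ddMMyyyy}"
			}
		}
	}
}
```

With interval => 2 in the exec input, the file is re-read and re-indexed every two seconds. If the goal is one copy of the data per day, an interval of 86400 (seconds) would be closer to that.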

Thank you for your help :blush:

