Importing the same file

Hi,

I just have a conceptual question about how Logstash handles importing the same file again.

Let's say I have finished a previous round of data ingestion and terminated the Logstash process afterwards. I have not changed the data source file or the corresponding config file; they are exactly the same as before. If I restart Logstash with the same config, will the data be replaced by the new copy (even though it is identical), or will Logstash recognize that nothing has changed and not import any new data?

Logstash's file input keeps track of the current position in each file so that it can continue where it left off. See the file input plugin documentation for details.

I am not really sure what the documentation says.

Here is my input setting; could you take a look at it?

My goal is to make it continue where it left off every time I start Logstash with that config.

input {
	file {
		path => "/Users/apple/Desktop/ref_client/events/event.log"
		start_position => "beginning"
		sincedb_path => "/dev/null"
		codec => json
	}
}

Thanks.

Your sincedb_path setting effectively disables the tracking of the current position. Drop it and you'll be fine.

What about "start_position"? Do I need to change it to "end"?

No. That option controls what happens when Logstash discovers a new file that matches the configured filename pattern.
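To make that concrete, here is a rough annotated sketch of your input without the sincedb_path line. The comments reflect my understanding of the file input documentation, so treat them as a guide rather than a guarantee.

```
input {
	file {
		path => "/Users/apple/Desktop/ref_client/events/event.log"
		# Only consulted for files that have no recorded position yet:
		# "beginning" reads such a file from the top, "end" (the default) tails it.
		# Once a position has been recorded, Logstash resumes from it on restart.
		start_position => "beginning"
		codec => json
	}
}
```

So leaving it at "beginning" should not cause a full re-read on every restart, as long as the recorded position is kept.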

Suppose my initial JSON file has 5 data items. When I first import this file into ES, it is read from the beginning. Later on, the file is updated and 2 more data items are added to it, and I want to import the file again. Do I need to change start_position to "end" for the second import of the updated file?

No, Logstash will pick up the two added items.

I am assuming that the items are added by appending to the original file. If the file is rewritten from scratch with a new inode number, Logstash will lose track.

So the final version would look like this, whether or not it is the first time ingesting, right?

input {
	file {
		path => "/Users/apple/Desktop/ref_client/events/event.log"
		start_position => "beginning"
		codec => json
	}
}

Yes.

I tried this config, but the result seems to contain duplicated data items.

Alright, my current situation is that I commented out the "sincedb_path" line and imported an updated version of the file under the same name. I think the read position got messed up somehow. What I want to do is discard all previous state and start a fresh ingest, with the last read position being recorded from now on. I already deleted the previous data in ES, but I think the last read position for that specific file is recorded somewhere, because I had that line commented out. So if I keep my config and try to restart from the beginning, it will not work, since no new data has come in. Is there any way for me to reset everything?

Delete the sincedb file(s) and you'll reset Logstash's idea of the current position in the file. I believe the file input documentation describes where those files are stored.
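If tracking down the old file is a hassle, another option should be to point sincedb_path at a file that does not exist yet, so there is no recorded position to resume from. A rough sketch, where the sincedb file name is made up for the example and not something from your setup:

```
input {
	file {
		path => "/Users/apple/Desktop/ref_client/events/event.log"
		start_position => "beginning"
		# Hypothetical file name; it must be a file path, not a directory.
		# Nothing is recorded there yet, so event.log is read from the beginning,
		# and later restarts resume from the position saved in this file.
		sincedb_path => "/Users/apple/Desktop/ref_client/events/event.sincedb"
		codec => json
	}
}
```

Unlike "/dev/null", a regular file keeps the position between restarts.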

Thanks.
Btw, is it possible for me to create my own empty txt file to use as the sincedb_path file?

Maybe, I'm not sure how picky the code is. Why would you want to do that?

Idk either...

I just checked the documentation and went into the folder that is supposed to hold the sincedb file, but nothing is there... I did not specify my own text file...

The default will write sincedb files to <path.data>/plugins/inputs/file. NOTE: it must be a file path and not a directory path.

Make sure you display hidden files (i.e. files whose names begin with a period). If you increase the logging verbosity, Logstash will tell you the names of the sincedb files it uses.

Yeah, it is hidden in that folder. Thank you so much!!
