I have a conceptual question about how Logstash handles importing the same file again.
Let's say I have finished ingesting the data and then stopped the Logstash process. I have not changed the source data file or the corresponding config file; they are exactly the same as before. If I restart Logstash with the same config, will the data be imported again and replace the existing documents (even though they are identical), or will Logstash recognize that nothing has changed and not import anything?
Logstash's file input keeps track of the current position in each file so that it can continue where it left off. See the file input plugin documentation for details.
Consider this scenario: my initial JSON file contains 5 data items, so when I first import it into ES it is read from the beginning. Later the file is updated and 2 more data items are added, and I want to import it again. Do I need to change start_position to "end" for the second import of the updated file?
I am assuming that the items are appended to the original file. If the file is rewritten from scratch with a new inode number, Logstash will lose track of it.
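To make the append-versus-rewrite distinction concrete, here is a toy model of a sincedb, written in Python purely for illustration. It is not Logstash's implementation: the `OffsetTracker` class, its dictionary keyed by `(inode, device)`, and the `demo` function are all hypothetical. It also mirrors my understanding of the file input docs that `start_position` only matters on "first contact", i.e. for a file that has no recorded position yet, so appended lines are picked up without changing it to "end".

```python
import os
import tempfile


class OffsetTracker:
    """Hypothetical, simplified sincedb: maps (inode, device) to a byte offset."""

    def __init__(self, start_position="beginning"):
        # Like the file input's start_position, this is only consulted for
        # files whose identity has no recorded offset yet.
        self.start_position = start_position
        self.offsets = {}  # (st_ino, st_dev) -> byte offset already consumed

    def read_new_lines(self, path):
        st = os.stat(path)
        key = (st.st_ino, st.st_dev)  # file identity, not file name
        if key not in self.offsets:
            self.offsets[key] = 0 if self.start_position == "beginning" else st.st_size
        with open(path, "rb") as f:
            f.seek(self.offsets[key])  # continue where we left off
            data = f.read()
        end = data.rfind(b"\n")
        if end == -1:
            return []  # no complete line yet; wait for more data
        complete = data[: end + 1]
        self.offsets[key] += len(complete)
        return complete.decode("utf-8").splitlines()


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "items.json")
        with open(path, "w") as f:
            f.writelines(f'{{"id": {i}}}\n' for i in range(1, 6))
        tracker = OffsetTracker(start_position="beginning")
        first = tracker.read_new_lines(path)  # the 5 initial items

        # Appending keeps the same inode, so only the new lines are read.
        with open(path, "a") as f:
            f.writelines(f'{{"id": {i}}}\n' for i in (6, 7))
        appended = tracker.read_new_lines(path)

        # Rewriting via a new file renamed over the old one gives a new inode,
        # so the tracker treats it as an unseen file and reads everything again.
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            f.writelines(f'{{"id": {i}}}\n' for i in range(1, 8))
        os.replace(tmp, path)
        rewritten = tracker.read_new_lines(path)
        return first, appended, rewritten
```

Calling `demo()` returns the 5 initial lines, then only the 2 appended lines, then all 7 lines after the rewrite, which is the "lose track" case described above.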
Alright, here is my current situation: I commented out the "sincedb_path" line and imported an updated version of the file under the same name. I think the read position got messed up somehow. What I want to do is discard all previous state and start a fresh ingest that still records the last read position. I have already deleted the previous data in ES, but I think the last read position for that file was recorded somewhere, because I had commented out that line. So if I keep my config set to start from the beginning, it will not work, since no new data comes in. Is there any way for me to reset everything?
Delete the sincedb file(s) and you'll reset Logstash's idea of the current position in the file. I believe the file input documentation describes where those files are stored.
I just checked the documentation and went into the folder that is supposed to contain the sincedb file, but nothing is there. I did not specify my own file for sincedb_path.
Make sure you display hidden files (i.e. their names begin with a period). If you increase the logging verbosity Logstash will tell you the name of sincedb files used.
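If it helps, the lookup can also be scripted. This is a minimal Python sketch, not part of Logstash; the function names are hypothetical. It assumes the sincedb files are hidden files whose names start with `.sincedb` and that you pass in the directory where your plugin version keeps them (older file input versions defaulted to your home directory, newer ones to a `plugins/inputs/file` folder under Logstash's `path.data`; check the docs for your version). Stop Logstash before deleting anything, since it writes the sincedb periodically and on shutdown.

```python
from pathlib import Path


def find_sincedb_files(directory="~"):
    """Return files in `directory` whose names start with '.sincedb'.

    The leading period hides them from a plain `ls` and most file browsers.
    """
    d = Path(directory).expanduser()
    if not d.is_dir():
        return []
    return sorted(
        p for p in d.iterdir() if p.is_file() and p.name.startswith(".sincedb")
    )


def delete_sincedb_files(directory="~", dry_run=True):
    """Report (dry_run=True) or delete the sincedb files found in `directory`."""
    found = find_sincedb_files(directory)
    for p in found:
        print(("would delete " if dry_run else "deleting ") + str(p))
        if not dry_run:
            p.unlink()
    return found
```

For example, run `delete_sincedb_files("~")` first to see what would be removed, then repeat with `dry_run=False`. Setting `sincedb_path` explicitly in the file input avoids having to hunt for the file next time.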