How does Logstash read a log file with 10,000 lines?

I have a log file that contains 10,000 lines of logs, and every hour 10 new lines are appended to it.

at 3 AM I have 10,000 lines of logs
at 4 AM, 10,010 lines of logs
at 5 AM, 10,020 lines of logs, and so on.

I would like to know how Logstash will read my log file in the above scenario.

My understanding of Logstash:
It will read all 10,000 lines at 3 AM and push the data into Elasticsearch.
At 4 AM, it will read the whole file again and compare the data:
if a log line is already in the Elastic index, it will skip it;
if a log line is not in the Elastic index (the new 10 lines), it will insert it into my Elastic index.

It would be nice if anyone could explain this scenario to me.
Thanks in advance.

Hi,

Your assumption is wrong here. Logstash keeps track of where it left off reading the log file. As data flows into your log file, Logstash will read it and flush to Elasticsearch (or whatever output you configure) every 5 seconds. If no data arrives between 3 and 4 AM, Logstash will just sit there and wait for data to come in.
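
For example, a minimal pipeline for this setup looks roughly like the following sketch (the path, host, and index name are placeholders, not anything from your setup):

    input {
      file {
        # Tail the file; Logstash remembers its read position in the sincedb,
        # so after the initial read only newly appended lines are picked up.
        path => "/var/log/myapp/app.log"     # placeholder path
        start_position => "beginning"        # only applies to files seen for the first time
      }
    }

    output {
      elasticsearch {
        hosts => ["http://localhost:9200"]   # placeholder host
        index => "my-logs"                   # placeholder index name
      }
    }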

Here is the doc for the file input plugin, which you use in Logstash: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html

Hope this helps.

Paul.


Hi @pjanzen, thanks for your input.

In the same scenario, I have two more doubts.

Doubt 1:
Day 1: I have 10,020 lines of logs.
Day 2, 01:00 AM: I rotate my log file, so it starts again from line 0.

How will Logstash work in this scenario?

Doubt 2:
When I have 10,030 lines of logs in my file, if I stop my Logstash server and run it again, will it read my log file from line 0 again and write it into my Elasticsearch index a second time?
I mean, during the first run I ingested 10,020 documents into ES; after stopping and starting again, will it ingest another 10,020, for 20,040 documents in total?

Please bear with me; this may be a stupid question, but I would like to know.

Thanks in advance.

The file input tracks state in the sincedb. On UNIX it tracks the progress it has made reading each file using the device numbers and inode number. On Windows it uses something similar.
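
To make that concrete: each line of the sincedb records something roughly like the following (illustrative values; the inode, device numbers, and byte offset depend entirely on your system, and the exact columns vary between versions of the file input):

    # inode  dev_major  dev_minor  byte_offset  last_active_timestamp  path
    262626   0          51714      1048576      1563219571.828388      /var/log/myapp/app.log

Note that the offset is the number of bytes read, not the number of lines.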

If you ask the file input to tail foo.log then when data is added to the file it will read just the new data. If foo.log is rotated to foo.log.1 and a new foo.log is created then the file input will see it as a new file and start over at the beginning.

Using the inode number to track the file can break. For example, on some filesystems, if you delete foo.log and immediately create a new one, it will re-use the same inode number. Logstash will then think it has already read bytes from the file and will ignore data added to the new foo.log until it is longer than the previous foo.log.

This is fundamentally a very hard and expensive problem, and the file input uses a cheap shortcut that is usually right. Sometimes it is wrong.
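
So for Doubt 2 the answer is no: because the read position survives in the sincedb, a restarted Logstash resumes where it stopped instead of re-ingesting the 10,020 documents. If you ever do want a full re-read (say, while testing a pipeline), a common trick is to point the sincedb at /dev/null so no position survives a restart. A sketch, again with a placeholder path:

    input {
      file {
        path => "/var/log/myapp/app.log"  # placeholder path
        start_position => "beginning"     # start from the top of the file
        sincedb_path => "/dev/null"       # positions are never persisted, so every restart re-reads the whole file
      }
    }

Be aware that this deliberately re-ingests everything, so you will get exactly the duplicate documents your question describes.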

