[solved] [accuracy] randomly missing events

Hi all,

I'm wondering whether I can use Logstash to ensure accuracy during log collection (i.e. that no events are dropped on input or output).

For instance, let's consider the following configuration:

input {
	file {
		type => "test"
		path => "/root/development/_in/*.log"
		start_position => "beginning"
	}
}

output {
	file {
		path => "./_out/test.log"
	}
}

Now, if I start a Logstash instance and then copy a 50k-line log file into the "_in" directory (or let a process generate it there), I would expect to see all 50k events/lines in the "test.log" output file. Instead, what I get is output of varying length (i.e. on each run I could be missing 2, 3, 10, or 20 events, or sometimes none at all).

I wonder if this has to do with the pipeline's throttling. So far I have tried the following without any luck:

  • Running on both Windows and Linux environments
  • Trying logstash versions 1.5.2, 1.5.5 and 2.0.0
  • Increasing heap size to 1g
  • Instead of writing the output to a file, sending the events over the network with the lumberjack output

Please see below for the structure of an example input file.

Thanks in advance for any reply.

V.

I'm not sure if I have the right answer for your process, but you may need a sincedb_path:

input {
	file {
		type => "test"
		path => "/root/development/_in/*.log"
		start_position => "beginning"
		sincedb_path => "PATHNAME"
	}
}

If you are running Logstash, adding to the file, and then rerunning, you need to delete the sincedb file it creates each time. Hopefully this helps!
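
If it is only stale sincedb state you want to rule out during testing, one option (assuming you do not need Logstash to remember its read position between runs) is to point sincedb_path at the null device, along these lines:

input {
	file {
		type => "test"
		path => "/root/development/_in/*.log"
		start_position => "beginning"
		# /dev/null on Linux; use "NUL" on Windows. Nothing is persisted,
		# so every restart re-reads matching files from the beginning.
		sincedb_path => "/dev/null"
	}
}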

Hi Jackal,

Thanks for your reply.

Unfortunately, the reported issue is not about files or new lines that were not discovered, but I gave it a try anyway, without any luck: for instance, I generated a new file, Logstash identified it, but some lines were still (randomly) missing from the output, without any (obvious) reason even with debug enabled.

Hopefully, this can be reproduced easily:

  1. Launch Logstash with the aforementioned configuration
  2. Execute the following command to generate input for logstash:
    for i in {1..50000}; do echo "This, is, a, csv, and, this, is, line, no, $i"; done > _in/testme.log
  3. Wait until Logstash picks up the file (I think this is configurable via the stat_interval option; see the sketch after this list)
  4. Notice, by counting the output (e.g. wc -l _out/test.log), that the result is not 50000 lines (e.g. it could be 49994 or some other count that changes randomly on each run).
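
In case it helps with step 3, here is a rough sketch of how the file input's polling can be tuned; the values are only illustrative, not what I actually used:

input {
	file {
		type => "test"
		path => "/root/development/_in/*.log"
		start_position => "beginning"
		stat_interval => 1      # how often (in seconds) already-discovered files are checked for new data
		discover_interval => 5  # how often the path glob is re-expanded to discover new files (default 15)
	}
}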

Thanks, V.

Hello,

The "issue" I encountered is actually occurring due to the caching feature of the pipeline.

By configuring the Logstash file output plugin with the flush_interval => 0 option, flushing occurs for every message.

See more at: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-file.html#plugins-outputs-file-flush_interval
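
Applied to the configuration from my first post, the change is in the output section, i.e. something like:

output {
	file {
		path => "./_out/test.log"
		# flush every event to disk instead of buffering (the default flush_interval is 2 seconds)
		flush_interval => 0
	}
}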

Thanks, V.
