[solved] [accuracy] randomly missing events

Hi all,

I'm wondering whether I can use Logstash to ensure accuracy during log collection (i.e. that no events are dropped on input or output).

For instance, let's consider the following configuration:

input {
	file {
		type => "test"
		path => "/root/development/_in/*.log"
		start_position => "beginning"
	}
}

output {
	file {
		path => "./_out/test.log"
	}
}

Now, if I start a Logstash instance and then copy a 50k-line log file into the "_in" directory (or let a process generate it there), I would expect to see all 50k events/lines in the "test.log" output file. Instead, what I get is output of varying length (i.e. on each run I could be missing 2, 3, 10, or 20 events, or sometimes none at all).

I wonder if this has to do with the pipeline's throttling. So far I have tried the following without any luck:

  • Running on both Windows and Linux environments
  • Trying logstash versions 1.5.2, 1.5.5 and 2.0.0
  • Increasing heap size to 1g
  • Instead of writing the output to a file, sending the events over the network with the lumberjack output

Please see below for the structure of an example input file.

Thanks in advance for any reply.

V.

I'm not sure if I have the right answer for your process, but you may need a sincedb_path:

input {
	file {
		type => "test"
		path => "/root/development/_in/*.log"
		start_position => "beginning"
		sincedb_path => "PATHNAME"
	}
}

If you are running Logstash, adding to the file, and then rerunning, you need to delete the sincedb file it creates each time. Hopefully this helps!
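
If it is only stale sincedb state you want to rule out during testing, one option (assuming you do not need Logstash to remember its read position between runs) is to point sincedb_path at the null device, along these lines:

input {
	file {
		type => "test"
		path => "/root/development/_in/*.log"
		start_position => "beginning"
		# /dev/null on Linux; use "NUL" on Windows. Nothing is persisted,
		# so every restart re-reads matching files from the beginning.
		sincedb_path => "/dev/null"
	}
}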

Hi Jackal,

Thanks for your reply.

Unfortunately, the reported issue is not about files or new lines that were not discovered, but I gave it a try anyway, without any luck: for instance, I generated a new file, Logstash identified it, but some lines were still (randomly) missing from the output, without any (obvious) reason even with debug enabled.

Hopefully, this can be reproduced easily:

  1. Launch Logstash with the aforementioned configuration
  2. Execute the following command to generate input for logstash:
    for i in {1..50000}; do echo "This, is, a, csv, and, this, is, line, no, $i"; done > _in/testme.log
  3. Wait until Logstash picks up the file (I think this is configurable via the stat_interval option; see the sketch after this list)
  4. Notice, by counting the output (e.g. wc -l _out/test.log), that the result is not 50000 lines (e.g. it could be 49994 or some other count that changes randomly on each run).
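
In case it helps with step 3, here is a rough sketch of how the file input's polling can be tuned; the values are only illustrative, not what I actually used:

input {
	file {
		type => "test"
		path => "/root/development/_in/*.log"
		start_position => "beginning"
		stat_interval => 1      # how often (in seconds) already-discovered files are checked for new data
		discover_interval => 5  # how often the path glob is re-expanded to discover new files (default 15)
	}
}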

Thanks, V.

Hello,

The "issue" I encountered is actually occurring due to the caching feature of the pipeline.

By configuring the Logstash file output plugin with the flush_interval => 0 option, flushing occurs for every message.

See more at: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-file.html#plugins-outputs-file-flush_interval
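
Applied to the configuration from my first post, the change is in the output section, i.e. something like:

output {
	file {
		path => "./_out/test.log"
		# flush every event to disk instead of buffering (the default flush_interval is 2 seconds)
		flush_interval => 0
	}
}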

Thanks, V.
