I spent a lot of time on this issue until I found the problem:
Like many people who start using Logstash, I created a demo input file with a single line of data.
When I used the stdin input for the same line, everything worked great, but when I tried to read the same line from a file, Logstash didn't read the data.
I think it's because the single line ends at EOF without a trailing "\n"; only after I added a newline at the end was Logstash able to read the data.
Is this a known issue? If my assumption about the "\n" is right, it means many people might lose the last line of their data unless the file ends with a newline.
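For anyone hitting the same symptom, a quick sanity check is to look at the file's last byte and append a newline if it's missing. A minimal sketch (the filename `demo.log` is hypothetical):

```python
from pathlib import Path

def ensure_trailing_newline(path):
    """Append a newline to the file if its last byte isn't one,
    so the file input can emit the final line as an event."""
    data = Path(path).read_bytes()
    if data and not data.endswith(b"\n"):
        with open(path, "ab") as f:
            f.write(b"\n")
        return True   # newline was appended
    return False      # file was empty or already newline-terminated

# Example: ensure_trailing_newline("demo.log")
```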
Yes, the file input requires each line to end with a newline character because that's what sane log files look like. Feel free to file a GitHub issue for this. While the plain codec uses the newline (or whatever delimiter you use) as the signal to emit an event, it should be possible to optionally emit an event after a certain time has passed. Applications usually don't write an incomplete line, wait for five minutes, and then write the rest.
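For reference, the delimiter mentioned above is configurable on the file input. A hedged config sketch (the path is hypothetical, and `"\n"` is already the default):

```
input {
  file {
    path => "/tmp/demo.log"     # hypothetical path
    delimiter => "\n"           # events are emitted per delimiter
    start_position => "beginning"
  }
}
```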
@magnusbaeck - small correction - in its current form, the "buffer until a newline character" behavior lives in the underlying filewatch library, meaning that the file input code does not even receive any data until a newline is seen.
Therefore, it's impossible for the file input or its codec to emit an event with the data received so far.
FWIW - after the next LS release, filewatch will send chunks (16K or 32K) of data to the file input. Put another way, all data sources (filewatch, TCP servers, the stdin channel) will behave the same way, i.e. each sends chunks to its input.
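The buffering described above can be illustrated with a minimal tokenizer: chunks go in, only delimiter-terminated lines come out, and whatever trails the last delimiter stays in the buffer. A sketch in Python, not the actual filewatch code:

```python
class LineTokenizer:
    """Accumulate arbitrary chunks; emit only delimiter-terminated lines.
    Data after the last delimiter stays buffered - which is why a final
    line with no trailing newline never becomes an event."""

    def __init__(self, delimiter="\n"):
        self.delimiter = delimiter
        self.buffer = ""

    def extract(self, chunk):
        # Split on the delimiter; the last piece (possibly empty) is
        # the incomplete remainder and is kept for the next chunk.
        self.buffer += chunk
        *lines, self.buffer = self.buffer.split(self.delimiter)
        return lines

    def flush(self):
        # What an optional "emit after a timeout" feature would do:
        # hand back the incomplete remainder and clear the buffer.
        pending, self.buffer = self.buffer, ""
        return pending
```

Feeding `"foo\nba"` then `"r\nlast"` yields `["foo"]` then `["bar"]`, and `"last"` sits in the buffer until a newline arrives or `flush()` is called.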
@itaydvir As Magnus said, sane log files do have a newline character at the end of each line.
Specifically, though, because the file input is designed for tailing, the fact that an EOF is detected cannot be interpreted as end-of-all-input.
The LS file input as it is now has no option that lets the user indicate that EOF means end of all input, and even if there were one, it could not be the default. That means the user would have to read the docs to understand when to use it - and in reading the docs one would see "By default, each event is assumed to be one line." as the first sentence of the second paragraph. I'll admit that it does not actually state that a line is defined as all the characters before a newline character. This is because the file input is used to read variable-length lines, not fixed-length records, and variable-length lines are terminated by a newline character, see this.
Ok @guyboertje, I agree about the definition and that my log file wasn't sane.
But I would be happy to receive some sort of indication about it (something like: "0 lines read so far, separated by delimiter \n"). Even with --debug I didn't get any hint about my problem; I couldn't tell whether it had read the line or whether there was some other issue.
Maybe there isn't much to do, since you can't predict whether a partial line of data is coming, but you guys might have an idea of how to give the user a hint about this issue.
Anyway, I see that you are going to re-design the whole input mechanism here, which should solve this issue.