Hi guys,
I am using the new file input plugin 4.1.1 and keep getting the WARN message below, repeated numerous times. What is it about?
[WARN ][filewatch.tailmode.handlers.grow] read_to_eof: no delimiter found in current chunk
Cheers,
That error was fixed just today; update to v4.1.2. Apologies.
Thanks for the quick reply.
One question unrelated to the topic (maybe I should create another thread): is there a problem in 4.1.1 that causes the same log events to be sent repeatedly, resulting in duplicate docs in Elasticsearch? I am facing this issue after the upgrade, where I see numerous duplicate docs.
The error caused the same piece of content to be continually reprocessed; I assume that is where the duplicates come from.
Are you using /dev/null for your sincedb path?
I suggest you start from scratch, rereading the files into a new index, and then check for duplicates. If you still see duplicates after this, please open a new topic: "File input v4.1.2 tailing - duplicate docs ingested".
Let me open a new topic
No. Create a new topic only after you confirm that duplicates are still seen with the new version, v4.1.2.
Ah, ok, roger.
P.S.: my sincedb is not /dev/null, and I've started from scratch.
Related to the main topic: I'm getting the message below now, after upgrading to 4.1.2. Should I take any action, or is this a bug?
[2018-05-04T09:52:50,706][INFO ][filewatch.tailmode.handlers.grow] buffer_extract: a delimiter can't be found in current chunk, maybe there are no more delimiters or the delimiter is incorrect or the text before the delimiter, a 'line', is very large, if this message is logged often try increasing the `file_chunk_size` setting. {"delimiter"=>"\n", "read_position"=>827326, "bytes_read_count"=>66, "last_known_file_size"=>827392, "file_path"=>"/foo/bar/my.log"}
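If that message is frequent and the tailed lines really are long, the `file_chunk_size` setting it points at can be raised in the input block. A sketch, with illustrative values (the default is 32768 bytes):

```
input {
  file {
    path => "/foo/bar/my.log"
    # default is 32768 (32 KiB); raise it if a single line can be larger
    file_chunk_size => 65536
  }
}
```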
FYI, I got this error while doing some testing in relation to this topic: Logstash 6.2.1 Big .since-db file causes OutOfMemory - and added a reply there.
This one is relevant and informative, but whether it is important depends on how the tailed files are being filled.
I'll break it down.
filewatch.tailmode.handlers.grow
- this is the code that executes when a file is seen to have grown since the last check, in tail mode.
"delimiter"=>"\n"
- the standard newline delimiter.
"read_position"=>827326
- the offset in the file that the "chunk" was read from.
"bytes_read_count"=>66
- the number of bytes in the "chunk"; normally this is 32768 (32 KiB). As it is smaller than 32 KiB, we are reading the last few bytes of the file at the present time.
"last_known_file_size"=>827392
- the size of the file when we last checked (maybe one scan_interval, e.g. 1 second, ago).
827326 + 66 = 827392 - yep, we read to the end of the file as we saw it at that time.
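The mechanics above can be sketched in a few lines of Python (a simplified stand-in, not the plugin's actual buffer_extract code): a chunk with no delimiter yields no complete lines, and the bytes wait in the buffer until a later chunk supplies the newline.

```python
# Simplified sketch of delimiter-based chunk buffering (not the plugin's code).
def buffer_extract(buffer, chunk, delimiter=b"\n"):
    """Append the chunk; return the complete lines and the leftover partial line."""
    buffer += chunk
    *lines, buffer = buffer.split(delimiter)
    return lines, buffer

# Numbers from the log message above: the final short read reaches EOF exactly.
read_position, bytes_read_count, last_known_file_size = 827326, 66, 827392
assert read_position + bytes_read_count == last_known_file_size

# A chunk with no trailing newline produces no complete lines yet; the bytes
# stay buffered until a later chunk brings the delimiter.
lines, buf = buffer_extract(b"", b"a trailing line with no newline yet")
```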
Interpretation:
Read mode to the rescue. In read mode, we assume that the file is a fixed-length stream, and in that case we can artificially flush the buffer when we reach the end of the stream.
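For completeness, a sketch of a read-mode input (option names as I understand the 4.1.x file input; values illustrative):

```
input {
  file {
    path => "/foo/bar/*.log"
    mode => "read"
    # what to do with a file once it has been read to the end
    file_completed_action => "log"
    file_completed_log_path => "/foo/bar/completed.log"
  }
}
```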
In 4.1.2 we have the limitation that discovered files should not be growing while (or after) we read them; any additional content may not be read. This means that you should do an atomic write or copy: write or copy the file to a folder outside of the path glob, and then rename it so it becomes discoverable. We have plans to fix this.
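The atomic-copy pattern looks like this in outline (a hedged Python sketch; the function name and directories are made up, and both directories must be on the same filesystem for the rename to be atomic):

```python
import os


def atomic_publish(data: bytes, staging_dir: str, watched_dir: str, name: str) -> str:
    """Write the full file in a staging folder outside the watched path glob,
    then rename it into the watched folder in one step.

    os.replace (a rename) is atomic when both directories are on the same
    filesystem, so the tailer never discovers a half-written file.
    """
    tmp_path = os.path.join(staging_dir, name)
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())          # make sure the bytes hit the disk first
    final_path = os.path.join(watched_dir, name)
    os.replace(tmp_path, final_path)  # atomic rename: file appears complete
    return final_path
```

With this pattern the input only ever discovers the finished file.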
Please tell me whether interpretation 1 or 2 applies to your case.
Thanks a lot
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.