Logstash 6.2.4 read_to_eof: no delimiter found in current chunk


#1

Hi guys,

I am using the new file input plugin 4.1.1 and keep seeing the warning below logged numerous times. What does it mean?

[WARN ][filewatch.tailmode.handlers.grow] read_to_eof: no delimiter found in current chunk

Cheers,


(Guy Boertje) #2

That error was fixed just today; please update to v4.1.2. Apologies.


#3

Thanks for the quick reply.

One question unrelated to the topic (maybe I should create another thread): is there a problem in 4.1.1 that causes the same log events to be sent repeatedly, resulting in duplicate docs in Elasticsearch? I am seeing numerous duplicate docs after the upgrade.


(Guy Boertje) #4

The error caused the same piece of content to continually be reprocessed - I assume that is where the duplicates come from.

Are you using /dev/null for your sincedb path?

I suggest you start from scratch: re-read the files into a new index and then check for duplicates. If you still see duplicates after this, please open a new topic titled "File input v4.1.2 tailing - duplicate docs ingested".
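For context on why the sincedb question matters: pointing `sincedb_path` at `/dev/null` discards the recorded read offsets, so every Logstash restart re-reads the files from the beginning, which by itself produces duplicate docs. A minimal sketch (the path glob is hypothetical, based on the log path mentioned later in this thread):

```conf
input {
  file {
    path => "/foo/bar/*.log"
    # /dev/null means no offsets survive a restart, so the files
    # are re-read from the start each time -> duplicate docs.
    sincedb_path => "/dev/null"
  }
}
```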


#5

Let me open a new topic


(Guy Boertje) #6

No. Create a new topic only after you confirm that duplicates still occur with the new version, v4.1.2.


#7

Ah, ok, roger.

P.S.: my sincedb is not /dev/null and I've started from scratch.


#8

Related to the main topic: I'm getting the message below after upgrading to 4.1.2. Should I take any action, or is this a bug?

[2018-05-04T09:52:50,706][INFO ][filewatch.tailmode.handlers.grow] buffer_extract: a delimiter can't be found in current chunk, maybe there are no more delimiters or the delimiter is incorrect or the text before the delimiter, a 'line', is very large, if this message is logged often try increasing the `file_chunk_size` setting. {"delimiter"=>"\n", "read_position"=>827326, "bytes_read_count"=>66, "last_known_file_size"=>827392, "file_path"=>"/foo/bar/my.log"}
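The message itself suggests raising `file_chunk_size` when it is logged often, i.e. when a single line can be longer than one chunk. A hypothetical sketch (path and value are assumptions, not from this thread):

```conf
input {
  file {
    path => "/foo/bar/*.log"
    mode => "tail"
    # The default chunk is 32768 bytes; raise it if single lines
    # can exceed one chunk, which is the condition this INFO
    # message warns about.
    file_chunk_size => 131072
  }
}
```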

FYI, I got this error while doing some testing in relation to this topic: Logstash 6.2.1 Big .since-db file causes OutOfMemory - and added a reply there.


(Guy Boertje) #9

@ld_pvl

This one is relevant and informative, but whether it is important depends on how the tailed files are being filled.

I'll break it down.

  • filewatch.tailmode.handlers.grow - the code that executes in tail mode when a file is seen to have grown since the last check.
  • "delimiter"=>"\n" - the standard newline delimiter.
  • "read_position"=>827326 - the offset in the file that the "chunk" was read from.
  • "bytes_read_count"=>66 - the number of bytes in the "chunk"; normally this is 32768 (32K). Since it is smaller than 32K, we are reading the last few bytes of the file as it stood at that moment.
  • "last_known_file_size"=>827392 - the size of the file when we last checked (maybe one second, the scan_interval, ago).

827326 + 66 = 827392 - yep, we read to the end of the file as we saw it at that time.

Interpretation:

  1. The system will write the rest of the line later. There was no newline character in those 66 bytes because the system writing the file may not yet have written/flushed the rest of the line and its closing newline. If this is the case, the message can be ignored; it is INFO, after all.
  2. The system will never write the rest of the line. In that case this message is important: those 66 bytes are lost, stuck in the buffer (each file has its own buffer), and no more content will arrive to unstick them. Remember that tailing is really an endless stream of content, so there is no point at which we can know the buffer may safely be flushed artificially.

Read mode to the rescue. In read mode we assume the file is a fixed-length stream, so in this case we can artificially flush the buffer when we reach the end of the stream.
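A minimal read-mode sketch (the path glob is an assumption for illustration):

```conf
input {
  file {
    path => "/foo/bar/*.log"
    # Treat each discovered file as a fixed-length stream: the
    # buffer is flushed at EOF, so a trailing line without a final
    # newline is still emitted as an event.
    mode => "read"
  }
}
```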

In 4.1.2 we have a limitation: discovered files should not grow while (or after) we read them, or the additional content may not be read. This means you should do an atomic write or copy - that is, write or copy to a folder outside of the path glob and then rename the file so it becomes discoverable. We have plans to fix this.
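The write-then-rename pattern can be sketched as follows (a minimal illustration, assuming the watched glob matches only `*.log` so the staging `.tmp` file is never discovered; the function name is made up for the example):

```python
import os
import tempfile

def atomic_publish(content: str, dest_path: str) -> None:
    """Stage the file outside the watched glob, then rename it into
    place so it appears to the file input fully formed."""
    dest_dir = os.path.dirname(dest_path) or "."
    # Stage in the same directory (same filesystem) so the final
    # rename is atomic; the .tmp suffix keeps it out of a *.log glob.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())   # ensure bytes hit disk before publishing
        os.rename(tmp_path, dest_path)  # atomic on POSIX within one filesystem
    except Exception:
        os.unlink(tmp_path)
        raise
```

The key point is that the reader never sees a half-written file: it either does not match the glob yet, or it is already complete.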

Please tell me whether interpretation 1 or 2 applies to your case.


#10

Thanks a lot


(Guy Boertje) #11

See this post for continuation.


(system) #12

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.