Logstash is reindexing the last line along with the new line

Hello community.
I am using Logstash to parse a log file and index it into Elasticsearch.
I have seen that if I add a new line to the file, Logstash indexes the new line but also the previously indexed last line.
This looks like a bug, and it occurs in releases 6.5.1, 6.6.1 and 7.0.0.
Can you help please?

What does your configuration look like?

This is my logstash.config file

input {
  file {
    path => "/testfile.txt"
    start_position => "end"
    sincedb_path => "/temp/sincedb_txt"
  }
}
filter {
  grok {
    patterns_dir => ["./patterns"]
    match => { "message" => "%{TIMESTAMP:ts} (.*) %{MESSAGE:msg} %{COLLECTOR:ip}@%{PORT:port}" }
  }
  if "_grokparsefailure" in [tags] {
    drop { }
  }
}

output {
  elasticsearch {
    hosts => ["http://x.y.z.c"]
    index => "myindex"
  }
}

I am testing by just inserting one line into testfile.txt, and I get 2 documents indexed in Elasticsearch.

Are the two documents the same?

No. One is the previous last line and the other is the newly inserted line.

I cannot think of a way to explain that. You will need to enable TRACE level logging for the filewatch components. Take a look at this post for instructions.

Ok. I will work on this on Thursday.
Tomorrow I am not at work.
I will let you know
Thank you
Luca

Can you reproduce this always or was this a one off?

Yes I can reproduce this

Hmmm. I wondered for a minute whether the sincedb was positioned just before the last line, but that would be a one-off thing.

I will try to recreate tomorrow.

I can recreate this but only under conditions where this behaviour is correct.

Config:

input {
  file {
    path => ["/elastic/tmp/testing/confs/test-file-read-repeat.log"]
    sincedb_path => "/elastic/tmp/testing/confs/test-file-tail-end.sdb"
    start_position => "end"
  }
}
filter {

}
output {
  stdout { codec => rubydebug }
}

Steps:

  1. Create a file with a single line, say "apples".
  2. Start Logstash.
  3. No events are created.
  4. Add a new line, say "strawberries".
  5. An event with "strawberries" as the message field is emitted.
  6. Shut down Logstash.
  7. The sincedb file is written with the position set to 20 (length of apples + LF + length of strawberries + LF).
  8. Remove the last line ("strawberries") from the file.
  9. Start Logstash.
  10. An event with "apples" is emitted.
  11. Add a new line, say "strawberries".
  12. An event with "strawberries" as the message field is emitted.
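
For anyone checking the arithmetic in step 7, here is a minimal Python sketch of how that offset comes out to 20. It assumes UTF-8 text and a single LF byte at the end of each line; the helper name offset_after is made up for illustration and is not part of Logstash.

```python
def offset_after(lines, encoding="utf-8"):
    """Return the byte offset just past the LF that ends the last line."""
    # Each line contributes its encoded length plus one byte for the LF.
    return sum(len(line.encode(encoding)) + 1 for line in lines)

# File contents after step 4: two lines, each ending in LF.
position = offset_after(["apples", "strawberries"])
print(position)  # 20

# File contents after step 8: only the first line is left.
size_after_removal = offset_after(["apples"])
print(size_after_removal)  # 7
```

After step 8 the file is only 7 bytes long, which is less than the recorded position of 20. The stored offset no longer fits the file, which would be consistent with the file being read again from the start in step 10.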

Please confirm that this is the recreation procedure you followed.

Tomorrow I can explain my procedure, but it is not like this. Today I am not at work. I only add a line to the file, but I get two documents indexed: the last line and the new line. I don't remove any line from the file.
I don't restart Logstash.
Regards
Luca

Hello.
I understand now what happens.
The watched file is /test.txt.
It already has content, and its last line is document20.
Logstash is stopped.

  1. I start Logstash and nothing gets indexed. (correct)
  2. I open the file /test.txt and add a line, document21.
  3. Both document21 and document20 get indexed. (not correct)
  4. I add a new line, document22, with the command "echo document22 >> /test.txt".
  5. Only document22 gets indexed. (correct)

The difference is in how I add the entry to the file: opening the file and editing it, or using the echo command.
I think the first way causes a problem with the sincedb_txt file.

Sorry for the wrong procedure I used

Luca

Is this with an empty or missing sincedb file?

This is the sincedb file.
98307 0 64768 108 1555574697.254953 /test.txt
98309 0 64768 86 1555574632.993001 /test.txt

108 is the length of /test.txt in bytes.
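
To make the two records above easier to read, here is a small Python sketch that splits each sincedb line into named fields and reports any path that shows up under more than one inode. The field layout (inode, major device number, minor device number, byte position, last-activity timestamp, last known path) is my reading of the file input's sincedb format and should be treated as an assumption; SincedbRecord and parse_sincedb_line are hypothetical names, not Logstash APIs.

```python
from typing import NamedTuple

class SincedbRecord(NamedTuple):
    # Assumed field order; see the note above.
    inode: int
    major_device: int
    minor_device: int
    position: int
    last_active: float
    path: str

def parse_sincedb_line(line: str) -> SincedbRecord:
    # Split at most 5 times so a path containing spaces stays in one piece.
    inode, major, minor, position, stamp, path = line.strip().split(" ", 5)
    return SincedbRecord(int(inode), int(major), int(minor),
                         int(position), float(stamp), path)

# The two records posted in this thread.
sincedb_text = """\
98307 0 64768 108 1555574697.254953 /test.txt
98309 0 64768 86 1555574632.993001 /test.txt
"""

records = [parse_sincedb_line(l) for l in sincedb_text.splitlines() if l.strip()]

# Group inodes by path to spot a file that was replaced rather than appended to.
inodes_by_path = {}
for record in records:
    inodes_by_path.setdefault(record.path, set()).add(record.inode)

for path, inodes in inodes_by_path.items():
    if len(inodes) > 1:
        print(f"{path} has been seen under {len(inodes)} inodes: {sorted(inodes)}")
```

With the records above, /test.txt is listed under inodes 98307 and 98309, so the same path has pointed at two different files over time.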

I start Logstash and nothing gets indexed.

IIUC you are deleting /test.txt and recreating it - there are two inodes (records) in the sincedb file data.

Is this because I edited the file in vi?

Probably. I don't know for sure. vi might not edit the original file in place (I recall this problem from years back).
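
A rough way to see the difference between the two ways of adding a line is to compare inode numbers before and after. The Python sketch below assumes the editor saves by writing a new file and renaming it over the original, which is only suspected in this thread and not confirmed for vi; the file names are throwaway examples in a temporary directory.

```python
import os
import tempfile

def inode_of(path):
    """Return the inode number the path currently points at."""
    return os.stat(path).st_ino

with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "test.txt")
    with open(target, "w") as f:
        f.write("document20\n")
    original_inode = inode_of(target)

    # Append in place, similar to: echo document21 >> test.txt
    with open(target, "a") as f:
        f.write("document21\n")
    inode_after_append = inode_of(target)

    # Assumed editor behaviour: write a fresh copy, then rename it over the original.
    replacement = os.path.join(workdir, "test.txt.new")
    with open(target) as src, open(replacement, "w") as dst:
        dst.write(src.read())
        dst.write("document22\n")
    os.replace(replacement, target)
    inode_after_replace = inode_of(target)

print("append kept the inode:", inode_after_append == original_inode)
print("replace changed the inode:", inode_after_replace != original_inode)
```

If the editor behaves like the second case, the path stays the same but the inode changes, which would match the two records for /test.txt in the sincedb file.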

Anyway, if I add an entry to the file with the echo command, everything works perfectly.
I am sorry, I did not expect that vi could damage the sincedb.

Best regards

Luca Razzi

No worries. The sincedb is not really damaged; it is reporting exactly what happened.

I go into a lot of detail about the sincedb and how file read positions are tracked in this Discuss thread. Read my posts if you want a better understanding of a complex topic that seems simple on the surface.

Good Luck.