Data loss in Logstash!


(sreeram) #1

Hi,

I'm using ELK for centralized logging, and I'm facing data loss while processing 500,000 (5 lakh) logs (Kibana hits).

  • (Logstash (shipper & indexer on the same instance)) Machine 1 -> (Elasticsearch -> Kibana) Machine 2

Scenario for data loss

  • Logstash started reading log files containing 500,000 logs, and I can see the Kibana hit count increasing.
  • While reading, Elasticsearch becomes unavailable due to a network issue between Machine 1 and Machine 2.
  • I have configured the Logstash output to retry for 10 minutes (retry count 120, interval 5 seconds).
  1. Why am I facing data loss in this scenario?
  2. In the sincedb file, what will the offset position be? (The position of logs read successfully, or the position of logs that reached Elasticsearch successfully?)
  3. How do I handle this scenario (Elasticsearch not available) without data loss?
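
For context, the retry behaviour described above (120 retries at 5-second intervals, about 10 minutes) would be expressed through the elasticsearch output's retry settings. A minimal sketch, assuming the Logstash 1.5-era option names max_retries and retry_max_interval (the values below are only the ones described, not recommendations):

output {
  elasticsearch {
    host => "10.2.44.124"
    protocol => "http"
    # hypothetical values matching the description: 120 retries x 5 s = 10 min
    max_retries => 120
    retry_max_interval => 5
  }
}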

(Magnus Bäck) #2

How do you know that data has been lost? Delayed, sure, but permanently lost? Please explain how you reached that conclusion.

  1. There shouldn't be any data loss. When any output stalls, the whole Logstash pipeline stalls and Logstash stops reading from the files.
  2. It's the number of bytes read and passed into the pipeline. The pipeline only buffers 20 (or is it 20+20?) events, so you should never lose more than that.

(sreeram) #3

Scenario 1 (no network issue)

  • Number of logs in log files = Kibana hits = 500,000

Scenario 2 (network issue)

  • Number of logs in log files = 500,000
  • Kibana hits = 450,000 (varies each time)

Logstash config for your reference:

input {
  file {
    path => ["D:/logpath/**/*.txt"]
    codec => plain { charset => "UTF-16" }
    start_position => "beginning"
    sincedb_path => "D:/since.db"
  }
}

filter {
  multiline {
    # Grok pattern names are valid! :slight_smile:
    pattern => "\d\t(?!$)"
    negate => "true"
    what => "previous"
  }

  mutate {
    gsub => ["message", "\n", "!n!"]
    gsub => ["message", '"', "!dq!"]
    gsub => ["message", "'", "!sq!"]
  }

  csv {
    columns => ["modulename", "threadid", "datedon", "logtype", "logdescription"]
    # separator is a literal tab character
    separator => "	"
  }

  date {
    locale => "en"
    timezone => "UTC"
    match => [ "datedon", "dd-MM-yyyy HH:mm:ss Z", "dd-MMM-yyyy HH:mm:ss Z", "dd/MM/yyyy h:mm:ss a Z", "dd/MM/yyyy hh:mm:ss a Z", "MM/dd/yyyy hh:mm:ss a Z", "M/dd/yyyy hh:mm:ss a Z", "MM/dd/yyyy h:mm:ss a Z" ]
  }

  mutate {
    remove_field => ["column6", "datedon"]
  }

  mutate {
    convert => { "threadid" => "integer" }
    gsub => ["message", "!n!", "
"]
    gsub => ["logdescription", "!n!", "
"]
    gsub => ["message", "!dq!", '"']
    gsub => ["logdescription", "!dq!", '"']
    gsub => ["message", "!sq!", "'"]
    gsub => ["logdescription", "!sq!", "'"]
  }
}

output {
  elasticsearch {
    host => "10.2.44.124"
    protocol => "http"
    workers => 3
    flush_size => 50000
    max_retries => 100
  }
}


(Magnus Bäck) #4

Okay. I've seen cases at least with Logstash 1.4.2 where it gets upset when ES is unavailable and you have to restart it to get it going again—have you tried that? Also, what's in the sincedb file? Does Logstash think it has read everything that's in the input files, or is there unread data that it for some reason isn't trying to ship to ES?


(sreeram) #5
  • Will check whether restarting works (but in production I can't restart every time a network issue occurs).
    -- How should I handle that?

  • Will check the sincedb offset and post it here.
    -- Can you please explain: does Logstash move the sincedb offset (pointer) forward immediately after it has read a log line?

My environment details, FYI:
OS => Windows 7 64-bit
Logstash 1.5.2
Elasticsearch 1.6.2


(Magnus Bäck) #6

Will check whether restarting works (but in production I can't restart every time a network issue occurs).
-- How should I handle that?

Let's understand the nature of the problem first.

Can you please explain: does Logstash move the sincedb offset (pointer) forward immediately after it has read a log line?

That's controlled by the sincedb_write_interval configuration parameter.
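
For reference, a minimal sketch of how that parameter would look in the file input from the config above (the 5-second value is only an example):

input {
  file {
    path => ["D:/logpath/**/*.txt"]
    sincedb_path => "D:/since.db"
    # write the sincedb every 5 seconds instead of the 15-second default
    sincedb_write_interval => 5
  }
}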


(sreeram) #7

Data loss resolved!
Previously, in the Logstash output plugin configuration, I had:
flush_size => 50000
retry_item_count => 5000 (default)
When I changed this to:
flush_size => 5000 (default)
retry_item_count => 5000 (default)
the data loss on network failure was resolved :slight_smile:
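
A plausible reading of the fix: with flush_size (50000) far larger than the retry buffer (5000 items), most of a failed bulk request could never be re-queued, so keeping the flush at the 5000 default keeps every failed batch within what the retry buffer can hold. The output section from the config above would then look roughly like:

output {
  elasticsearch {
    host => "10.2.44.124"
    protocol => "http"
    workers => 3
    # flush_size left at its 5000 default so a failed bulk request
    # never exceeds the retry buffer's capacity
    flush_size => 5000
    max_retries => 100
  }
}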


(sreeram) #8

While testing, I observed the following:

  1. java.exe is the process that holds the flush buffer and the offset (number of lines read).
  2. During a network failure, if I kill java.exe and restart the Logstash service, data gets duplicated.
     Why is this happening?
  3. Can I reduce sincedb_write_interval from 15 seconds (default) to 5 seconds?
  4. My production environment has a very slow network (less than 256 kbps). Is it possible to compress the data in a flush before pushing it to Elasticsearch?

It would be very helpful for my understanding of how Logstash works if these get clarified.


(Magnus Bäck) #9

During a network failure, if I kill java.exe and restart the Logstash service, data gets duplicated.
Why is this happening?

Because killing a Windows process doesn't allow it to shut down in an orderly fashion and do things like flush the sincedb. I'm assuming that by "kill" you mean End Process in Task Manager or something equivalent that eventually ends in a TerminateProcess() Win32 call.

Can I reduce sincedb_write_interval from 15 seconds (default) to 5 seconds?

Yes, certainly.

My production environment has a very slow network (less than 256 kbps). Is it possible to compress the data in a flush before pushing it to Elasticsearch?

I don't think that's possible out of the box. You'd probably have to build some kind of proxy or transparent middleman that does this. Or you could rearchitect your setup and, e.g., ship logs in compressed form to the same network location as ES and do the Logstash processing there.


(sreeram) #10

Yes, I can rearchitect, but I'm more interested in the idea of building a proxy or transparent middleman for compression. I have no idea how to do that, though; can you please explain how to implement such a setup?


(Magnus Bäck) #11

I was thinking about something like Ziproxy. I don't have any particular experiences to share.
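
A sketch of the proxy idea, assuming a compressing HTTP proxy (Ziproxy or similar, configured separately) listening on localhost:9201 and forwarding requests to the real Elasticsearch node at 10.2.44.124:9200. Only the Logstash side is shown; the host and port values are hypothetical:

output {
  elasticsearch {
    protocol => "http"
    # hypothetical: talk to a local compressing proxy instead of ES directly
    host => "localhost"
    port => 9201
  }
}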


#12

I have the same problem: Logstash is losing data. Can you give me any advice?

