Logstash Persistent Queues throw an exception when trying to read BigInteger values from the queue

logstash.yml:

http.host: "0.0.0.0"
path.config: /usr/share/logstash/pipeline
xpack.monitoring.enabled: false
queue.type: persisted 

pipeline.conf

input {
  http { }
}
output {
  stdout { codec => rubydebug }
}

Send any event with a large int value:

curl -XPOST -H 'content-type: application/json' \
  -d '{ "some_value": 9223372036854776000 }' \
  http://127.0.0.1:8080

And that's it: full data loss from this point onwards. Logstash keeps accepting and queueing events, but the queue can no longer be read and is irrecoverable.

So any event from any input (http, beats, log4j2, anything) that contains a single overflowing integer or decimal in any field, at any nesting level, will destroy all your logs going forward when PQs are enabled.


Conclusion: Logstash Persistent Queues should not be marked production ready

Please reconsider this, given that any small event with a single field can cause full data loss. It's not a small or isolated issue: the conditions that trigger it are broad and varied, and the magnitude of this is far from trivial.

cc @guyboertje @Zt_Zeng @Myles

We went with Logstash PQs because they are marked as ready for production use in architecture diagrams, blog posts and everywhere else, and we trusted those announcements. Instead we have experienced a lot of seemingly random, plugin-independent data loss, and there is no easy solution to this, at least until Logstash v6.1.

Thanks for raising this. Did you also create a GitHub issue?

For those following this thread, we're replying on the github issue: https://github.com/elastic/logstash/issues/8379

We take data-loss bugs seriously, and will plan out a course of action in the GH issue.


Follow-up: the problem as reported here is caused by a deserialization issue in the version of the Jackson library we are using, which does not correctly deserialize Bignum/BigInteger numbers.

What is happening is this: in the { "some_value": 9223372036854776000 } JSON message, the value 9223372036854776000 is larger than a 64-bit long can hold, so it is serialized as a BigInteger using CBOR encoding in the persisted queue, which is correct. But with the Jackson library version 2.7 that we are currently using, there is a problem with the BigInteger deserialization after the Event is dequeued. The data in the queue is not corrupted, but Logstash is unable to decode it.
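
If you want to see the library behaviour in isolation, here is a minimal Java sketch of the same round trip outside Logstash. It assumes jackson-databind and jackson-dataformat-cbor are on the classpath; the class name and variable names are illustrative only, not Logstash internals.

import java.math.BigInteger;
import java.util.Collections;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.cbor.CBORFactory;

public class CborBigIntegerRoundTrip {
  public static void main(String[] args) throws Exception {
    // Show which jackson-databind version is actually on the classpath.
    System.out.println("jackson-databind " + com.fasterxml.jackson.databind.cfg.PackageVersion.VERSION);

    // An ObjectMapper backed by the CBOR encoding, like the one used for the persisted queue.
    ObjectMapper cbor = new ObjectMapper(new CBORFactory());

    // 9223372036854776000 is larger than Long.MAX_VALUE (9223372036854775807),
    // so Jackson has to represent it as a BigInteger.
    Map<String, Object> event = Collections.singletonMap(
        "some_value", new BigInteger("9223372036854776000"));

    // Serialize (enqueue) ...
    byte[] encoded = cbor.writeValueAsBytes(event);

    // ... and deserialize (dequeue). With the Jackson 2.7 line, this
    // deserialization step is where the reported BigInteger failure occurs;
    // with a fixed Jackson version the value round-trips intact.
    Map<?, ?> decoded = cbor.readValue(encoded, Map.class);
    System.out.println(decoded);
  }
}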

We confirmed in https://github.com/elastic/logstash/issues/8379 that upgrading Jackson solves this problem. We are currently working to ship an updated Jackson version in Logstash 5.6 and later.


To follow up on @colinsurprenant's comment:

  • There is no data corruption
  • When a fix is released, upgrading logstash will allow this data to be read.

As discussed in https://github.com/elastic/logstash/issues/8379, this is now fixed and will be part of the 5.6.3 release.


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.