Logstash Persistent Queues throw an exception when trying to read BigInteger values from the queue


(subhasdan) #1

logstash.yml:

http.host: "0.0.0.0"
path.config: /usr/share/logstash/pipeline
xpack.monitoring.enabled: false
queue.type: persisted 

pipeline.conf

input {
  http { }
}
output {
  stdout { codec => rubydebug }
}

Send any event with a large int value:

curl -XPOST -H 'content-type:application/json' \
  -d '{ "some_value": 9223372036854776000 }' \
  http://127.0.0.1:8080

And that's all it takes. It's full data loss from this point onwards: Logstash keeps queueing everything, but the queue is irrecoverable.

So any event from any input (http, beats, log4j2, anything) that contains a single overflowing integer or decimal in any field, at any nesting level, will destroy all your logs going forward when PQs are enabled.
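For context, 9223372036854776000 is larger than Long.MAX_VALUE (9223372036854775807), so a JSON parser such as Jackson has to promote it to a BigInteger rather than a plain long. A minimal sketch of that promotion, using plain Jackson rather than any Logstash code (the class name OverflowCheck is purely illustrative):

import java.math.BigInteger;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class OverflowCheck {
    public static void main(String[] args) throws Exception {
        // Long.MAX_VALUE is 9223372036854775807; the value from the curl example is larger.
        BigInteger value = new BigInteger("9223372036854776000");
        System.out.println(value.compareTo(BigInteger.valueOf(Long.MAX_VALUE)) > 0);  // true

        // Jackson therefore parses the field as a BigInteger rather than a long.
        JsonNode node = new ObjectMapper().readTree("{ \"some_value\": 9223372036854776000 }");
        System.out.println(node.get("some_value").isBigInteger());  // true
    }
}

Any field whose value overflows a 64-bit integer takes the same BigInteger path, which is why the nesting level and the input plugin do not matter.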

Conclusion: Logstash Persistent Queues should not be marked production ready

Please take this seriously: any small event with a single field can cause full data loss. This is not a small or isolated problem; the conditions that trigger it are broad and varied, and the magnitude is far from trivial.

cc @guyboertje @Zt_Zeng @Myles


(subhasdan) #2

We went with Logstash PQs because they are marked as ready for production use in architecture diagrams, blog posts and everywhere else, and we trusted those announcements. But we have experienced a lot of random, plugin-independent data loss, and there is no easy solution to this, at least until Logstash v6.1.


(Mark Walkom) #3

Thanks for raising this. Did you also create a GitHub issue?


(Andrew Cholakian) #4

For those following this thread, we're replying on the GitHub issue: https://github.com/elastic/logstash/issues/8379

We take data-loss bugs seriously, and will plan out a course of action in the GH issue.


(Colin Surprenant) #5

Follow-up: the problem as reported here is caused by a deserialization issue in the version of the Jackson library we are using, which does not correctly deserialize Bignum/BigInteger numbers.

What is happening is this: with the { "some_value": 9223372036854776000 } JSON message, the value 9223372036854776000 is serialized as a BigInteger using CBOR encoding in the persisted queue, which is correct. But with the Jackson library version 2.7 that we are currently using, BigInteger deserialization fails after the Event is dequeued. The data in the queue is not corrupted; Logstash is simply unable to decode it.
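To make the failure mode concrete, here is a minimal sketch of the kind of CBOR round trip described above, using plain Jackson (jackson-databind plus jackson-dataformat-cbor on the classpath) rather than Logstash's actual Event serialization code; the class name CborRoundTrip is illustrative only. On an affected Jackson 2.7.x release the read side is where the BigInteger handling breaks down, while on a fixed release the round trip returns the original value:

import java.math.BigInteger;
import java.util.Collections;
import java.util.Map;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.cbor.CBORFactory;

public class CborRoundTrip {
    public static void main(String[] args) throws Exception {
        // CBOR-backed ObjectMapper, analogous to the encoding used for the persisted queue.
        ObjectMapper cbor = new ObjectMapper(new CBORFactory());

        Map<String, BigInteger> event =
                Collections.singletonMap("some_value", new BigInteger("9223372036854776000"));

        // Write side: the BigInteger is encoded correctly, so the bytes on disk are intact.
        byte[] bytes = cbor.writeValueAsBytes(event);

        // Read side: this is the step that fails with the affected Jackson 2.7.x versions;
        // with a fixed Jackson release it prints the original value.
        JsonNode decoded = cbor.readTree(bytes);
        System.out.println(decoded.get("some_value").bigIntegerValue());
    }
}

The key point matches the comment above: the write side produces valid CBOR, so the bytes on disk are fine; only the decode step fails.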

We confirmed in https://github.com/elastic/logstash/issues/8379 that upgrading Jackson solves this problem. We are currently working to update logstash 5.6 and up with an updated Jackson version.


(Jordan Sissel) #6

To follow up on @colinsurprenant's comment:

  • There is no data corruption
  • When a fix is released, upgrading logstash will allow this data to be read.

(Colin Surprenant) #7

As discussed in https://github.com/elastic/logstash/issues/8379 this is now fixed and will be part of the 5.6.3 release.


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.