Logstash Persistent Queues throw an exception when trying to read BigInteger values from the queue

logstash.yml:

http.host: "0.0.0.0"
path.config: /usr/share/logstash/pipeline
xpack.monitoring.enabled: false
queue.type: persisted 

pipeline.conf

input {
  http { }
}
output {
  stdout { codec => rubydebug }
}

Send any event with a large int value:

curl -XPOST -H 'content-type: application/json' \
  -d '{ "some_value": 9223372036854776000 }' \
  http://127.0.0.1:8080

And that's it: full data loss from this point onwards. Logstash keeps accepting and queueing events, but the queue can no longer be read and is irrecoverable.

So any event from any input (http, beats, log4j2, anything) that contains a single overflowing integer or decimal in any field, at any nesting level, will destroy all your logs going forward when PQs are enabled.


Conclusion: Logstash Persistent Queues should not be marked production ready

Please reconsider this, given that any small event with a single field can cause full data loss. It's not a small or isolated issue: the conditions that trigger it are broad and varied, and the magnitude of this is far from trivial.

cc @guyboertje @Zt_Zeng @Myles

We went with Logstash PQs because they are marked as ready for production use in architecture diagrams, blog posts and everywhere else, and we trusted those announcements. Instead we have experienced a lot of seemingly random, plugin-independent data loss, and there is no easy solution to this, at least until Logstash v6.1.

Thanks for raising this. Did you also create a GitHub issue?

For those following this thread, we're replying on the github issue: https://github.com/elastic/logstash/issues/8379

We take data-loss bugs seriously, and will plan out a course of action in the GH issue.


Follow-up: the problem as reported here is caused by a deserialization issue in the version of the Jackson library we are using, which does not correctly deserialize Bignum/BigInteger numbers.

What is happening is this: in the { "some_value": 9223372036854776000 } JSON message, the value 9223372036854776000 is larger than a 64-bit long can hold, so it is serialized as a BigInteger using CBOR encoding in the persisted queue, which is correct. But with the Jackson library version 2.7 that we are currently using, there is a problem with the BigInteger deserialization after the Event is dequeued. The data in the queue is not corrupted, but Logstash is unable to decode it.
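
If you want to see the library behaviour in isolation, here is a minimal Java sketch of the same round trip outside Logstash. It assumes jackson-databind and jackson-dataformat-cbor are on the classpath; the class name and variable names are illustrative only, not Logstash internals.

import java.math.BigInteger;
import java.util.Collections;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.cbor.CBORFactory;

public class CborBigIntegerRoundTrip {
  public static void main(String[] args) throws Exception {
    // Show which jackson-databind version is actually on the classpath.
    System.out.println("jackson-databind " + com.fasterxml.jackson.databind.cfg.PackageVersion.VERSION);

    // An ObjectMapper backed by the CBOR encoding, like the one used for the persisted queue.
    ObjectMapper cbor = new ObjectMapper(new CBORFactory());

    // 9223372036854776000 is larger than Long.MAX_VALUE (9223372036854775807),
    // so Jackson has to represent it as a BigInteger.
    Map<String, Object> event = Collections.singletonMap(
        "some_value", new BigInteger("9223372036854776000"));

    // Serialize (enqueue) ...
    byte[] encoded = cbor.writeValueAsBytes(event);

    // ... and deserialize (dequeue). With the Jackson 2.7 line, this
    // deserialization step is where the reported BigInteger failure occurs;
    // with a fixed Jackson version the value round-trips intact.
    Map<?, ?> decoded = cbor.readValue(encoded, Map.class);
    System.out.println(decoded);
  }
}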

We confirmed in https://github.com/elastic/logstash/issues/8379 that upgrading Jackson solves this problem. We are currently working to ship an updated Jackson version in Logstash 5.6 and later.


To follow up on @colinsurprenant's comment:

  • There is no data corruption
  • When a fix is released, upgrading logstash will allow this data to be read.

As discussed in https://github.com/elastic/logstash/issues/8379, this is now fixed and will be part of the 5.6.3 release.


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.