- Logstash version: any version with PQs enabled (currently tested on 5.5 up to 5.6.1)
- Platform: any
- Simple test case: https://github.com/rdsubhas/logstash-queue-corruption
logstash.yml:

```yaml
http.host: "0.0.0.0"
path.config: /usr/share/logstash/pipeline
xpack.monitoring.enabled: false
queue.type: persisted
```
pipeline.conf:

```
input {
  http { }
}

output {
  stdout { codec => rubydebug }
}
```
Send any event with a large int value:

```sh
curl -XPOST -H 'content-type: application/json' \
  -d '{ "some_value": 9223372036854776000 }' \
  http://127.0.0.1:8080
```
And that's all. It's full data loss from this point onwards: Logstash keeps accepting and queueing events, but the queue is corrupted and irrecoverable.

So any event from any input (http, beats, log4j2, anything) that contains a single overflowing int or decimal in any field, at any nesting level, will destroy all your logs going forward when PQs are enabled.
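For context on why this particular value is a problem: 9223372036854776000 is larger than Long.MAX_VALUE (9223372036854775807), so a JSON parser on the JVM has to fall back to an arbitrary-precision type for it. Below is a minimal Java sketch of that boundary, assuming Jackson as the JSON library; it only illustrates the type fallback and is not claimed to be the actual code path inside Logstash's persistent queue serializer:

```java
import java.math.BigInteger;
import java.util.Map;
import com.fasterxml.jackson.databind.ObjectMapper;

public class OverflowDemo {
    public static void main(String[] args) throws Exception {
        // Long.MAX_VALUE is 9223372036854775807; the value from the repro is larger.
        String json = "{ \"some_value\": 9223372036854776000 }";

        ObjectMapper mapper = new ObjectMapper();
        Map<?, ?> event = mapper.readValue(json, Map.class);

        // The number does not fit into a long, so Jackson falls back to BigInteger.
        Object value = event.get("some_value");
        System.out.println(value.getClass()); // class java.math.BigInteger

        // Any serializer that only handles primitive numeric types (long/double)
        // will fail on this value when the event is written to or read back
        // from a queue.
        System.out.println(BigInteger.valueOf(Long.MAX_VALUE)
                .compareTo((BigInteger) value) < 0); // true
    }
}
```

If the persistent queue's serialization round-trip does not handle this fallback type, that would be consistent with the deserialization errors in the reports linked below.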
List of resources:
- Reproduce: https://github.com/rdsubhas/logstash-queue-corruption
- Logstash 5.6.0 deserialization errors when persistent queues enabled
- Logstash Crash with Persistent Queue and Kafka Input
- Logstash 5.4 - Repeated deserialization errors and losing data with Persistent Queues
- PR that might be a fix, scheduled for Logstash v6.1: https://github.com/elastic/logstash/issues/8131
Conclusion: Logstash Persistent Queues should not be marked production-ready

Please reconsider, given that any small event with a single bad field can cause full data loss. This is not a small or isolated edge case: the conditions that trigger it are broad, varied and diverse, and the magnitude of the problem does not seem trivial.