cbrown184
(Christopher Brown)
April 13, 2021, 6:00pm
1
Hi - we use Filebeat to output our logs to Kafka. We got hit by a nasty bug in prod where Filebeat gets stuck in an endless loop.
It was previously documented in this thread: "Filebeat kafka output hash.hash get negative partition cause stuck".
I have proposed a low-impact fix here.
## What does this PR do?
Fixes a bug in the Kafka output's hash-to-partition method that can produce a negative Kafka partition.
A demo of the bug is available on my other branch: https://github.com/cbrown184/beats/tree/negative_partition_edge_case_demo_test
You can reproduce this with any key whose hash value is 2147483648, e.g. the string "16000002EFEBA11E".
## Why is it important?
When the partition key hash equals 2147483648, the computed partition is negative, every publish attempt fails, and Filebeat gets stuck in an endless loop.
## Checklist
- [x] My code follows the style guidelines of this project
- [ ] ~~I have commented my code, particularly in hard-to-understand areas~~
- [ ] ~~I have made corresponding changes to the documentation~~
- [ ] ~~I have made corresponding change to the default configuration files~~
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] I have added an entry in `CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`.
## Author's Checklist
- [ ]
## How to test this PR locally
## Related issues
This bug was discussed previously in this thread.
https://discuss.elastic.co/t/filebeat-kafka-output-hash-hash-get-negative-partition-cause-stuck/265153
## Use cases
hash(16000002EFEBA11E) = 2147483648
Int32(2147483648) = -2147483648
-(-2147483648) = -2147483648
-2147483648 % 3 = -2
Kafka has no partition -2, so the Kafka producer stays stuck forever.
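The arithmetic above can be reproduced in a few lines of Go. This is only a minimal sketch: `buggyPartition` and `safePartition` are hypothetical names, not the functions in the Beats source. Clearing the sign bit is one possible way to avoid the overflow and is not necessarily the fix used in the PR.

```go
package main

import "fmt"

// buggyPartition (hypothetical name) follows the steps listed above:
// cast the hash to int32, negate it if negative, then take the modulo.
func buggyPartition(hash uint32, numPartitions int32) int32 {
	p := int32(hash) // 2147483648 wraps to -2147483648
	if p < 0 {
		p = -p // negating the minimum int32 overflows and stays negative
	}
	return p % numPartitions // Go's % keeps the sign of the dividend
}

// safePartition (hypothetical name) clears the sign bit instead of negating,
// so the value can never be negative. This is an assumed alternative.
func safePartition(hash uint32, numPartitions int32) int32 {
	p := int32(hash & 0x7FFFFFFF)
	return p % numPartitions
}

func main() {
	const hash uint32 = 2147483648 // the problematic hash value from the post
	fmt.Println(buggyPartition(hash, 3)) // -2
	fmt.Println(safePartition(hash, 3))  // 0
}
```

Note that masking maps other "negative" hashes to different partitions than negation would, so switching approaches changes key-to-partition assignment for some keys.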
## Screenshots
## Logs
Filebeat endless loop
2021-02-23T13:18:40.911+0800 DEBUG [kafka] kafka/client.go:169 got event.Meta["partition"] = -2
2021-02-23T13:18:40.911+0800 DEBUG [kafka] kafka/client.go:179 got event.Meta["topic"] = hashtest
2021-02-23T13:18:40.911+0800 DEBUG [kafka] kafka/client.go:273 finished kafka batch
2021-02-23T13:18:40.911+0800 DEBUG [kafka] kafka/client.go:287 Kafka publish failed with: kafka: partitioner returned an invalid partition index
2021-02-23T13:18:40.911+0800 DEBUG [kafka] kafka/client.go:169 got event.Me
Error Logs in production
2021-04-09T13:48:02Z DBG Kafka publish failed with: kafka: partitioner returned an invalid partition index
Demo test failing
=== RUN TestHashMaxIntPlusOneDoesNotReturnNegativePartition
partition_test.go:328:
Error Trace: partition_test.go:328
Error: Should be true
Test: TestHashMaxIntPlusOneDoesNotReturnNegativePartition
--- FAIL: TestHashMaxIntPlusOneDoesNotReturnNegativePartition (0.00s)
FAIL
Expected :8
Actual :-8
elastic:master
← cbrown184:negative_partition_edge_case_patch
opened 05:06PM - 13 Apr 21 UTC
Hey @cbrown184 ! Thanks for letting us know, I have added the correct labels to the PR so that the correct team will get a notification, will let you know how it goes in the PR itself most likely!