CentOS release 6.9
logstash 5.3.3-1
java version "1.8.0_92"
Kafka - 2.11-0.11.0.0
I have a few hosts with logs. Filebeat ships the logs from these hosts to Kafka:
cat /etc/filebeat/filebeat.yml
filebeat:
  prospectors:
    # container logs
    -
      paths:
        - "/mnt/log/log.log"
      json.keys_under_root: true
      document_type: log
      fields:
        region: hostname

output.kafka:
  hosts: ["kafka-host1:9092"]
  topic: "some-log"
  max_message_bytes: 1000000
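Everything not listed under output.kafka is left at its defaults. As far as I understand the Filebeat 5.x documentation, that effectively means something like the following (these two values are my reading of the defaults, not settings I added):

  # assumed defaults, not set explicitly in my filebeat.yml
  required_acks: 1   # wait only for the leader's acknowledgement
  max_retries: 3     # unacknowledged batches are resent up to 3 times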
The Kafka configuration is close to the default.
2-3 servers write their logs into each Kafka instance.
The Kafka servers are not joined into a cluster.
The data stream is not large, 1-2 Mbit/s.
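Since every broker is standalone, the topic exists on each of them independently. The layout of the topic on a given broker can be checked like this (localhost:2181 is a placeholder for that broker's ZooKeeper address):

kafka-topics.sh --zookeeper localhost:2181 --describe --topic some-log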
Logstash reads this data from Kafka and sends it to Elasticsearch.
Logstash reads from 4 Kafka servers:
kafka {
  codec => json
  bootstrap_servers => "kafka-server-1:9092"
  client_id => "kafka1"
  topics => ["some-log"]
  auto_offset_reset => "latest"
}
kafka {
  codec => json
  bootstrap_servers => "kafka-server-2:9092"
  client_id => "kafka2"
  topics => ["some-log"]
  auto_offset_reset => "latest"
}
....
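The elasticsearch output is not shown above; it is a plain one, roughly along these lines (host and index name are placeholders). If, as in this sketch, no document_id is set, Elasticsearch assigns a fresh _id to every event Logstash delivers, so a redelivered Kafka message becomes a second document:

output {
  elasticsearch {
    hosts => ["es-host:9200"]             # placeholder host
    index => "some-log-%{+YYYY.MM.dd}"    # placeholder index name
    # no document_id set, each event gets an auto-generated _id
  }
}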
But periodically records show up in Elasticsearch several times, i.e. they are duplicated.
If I run
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic some-log --from-beginning
each record appears only once in the topic.
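Another check would be whether the Logstash consumer's committed offsets ever jump backwards on a broker, which would mean the same messages are being re-read (for example after a rebalance or a restart). A sketch of that check, assuming the kafka input's default group_id of "logstash" since I do not set one explicitly:

kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group logstash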
I cannot understand why this happens or where to start digging.