Hello,
I have a Python script that fetches data and writes it to a log file named cb.json. The script runs every 5 minutes and retrieves the last 1,000 log entries, so consecutive runs overlap and some log entries are fetched more than once.
Filebeat is configured to collect these logs and send them to Elasticsearch. However, I'm encountering an issue where duplicate logs are appearing in my Elasticsearch index.
I'm seeking advice on how to prevent these duplicates from being indexed in Elasticsearch. Are there any recommended approaches or best practices to ensure that only unique log entries are stored?
I have already tested custom document ids, hashes, fingerprints, and so on, without success (a simplified sketch of the fingerprint variant I tried is shown after the config below).
Thank you in advance for your assistance.
#:/etc/filebeat$ cat filebeat.yml
setup.template.name: "carbon_black"
setup.template.pattern: "carbon_black-*"
setup.template.enabled: false
setup.ilm.enabled: false
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /home/arc/cb.json
  json.keys_under_root: true
  json.overwrite_keys: true
  processors:
    - add_fields:
        target: event
        fields:
          dataset: "carbon_black.observations"
output.elasticsearch:
  hosts: ["https://10.10.0.1:9200", "https://10.10.0.2:9200", "https://10.10.0.3:9200"]
  protocol: "https"
  ssl.verification_mode: "none"
  username: "elastic"
  password: "elastic"
output.elasticsearch.indices:
  - index: "carbon_black_observations-%{+yyyy.MM.dd}"
    when.contains:
      event.dataset: "carbon_black.observations"
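
For reference, the fingerprint variant I experimented with looked roughly like the sketch below. It follows the documented approach of hashing selected event fields into @metadata._id so the Elasticsearch output uses that value as the document id; the field names under fields: are placeholders, not the exact Carbon Black fields I used.

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /home/arc/cb.json
  json.keys_under_root: true
  json.overwrite_keys: true
  processors:
    - add_fields:
        target: event
        fields:
          dataset: "carbon_black.observations"
    # Placeholder fields: in my test I used fields from the decoded Carbon Black JSON.
    - fingerprint:
        fields: ["event_id", "@timestamp"]
        target_field: "@metadata._id"

If I understand correctly, a fixed _id only prevents duplicates within a single index, so I am also wondering whether the daily index pattern above still lets entries re-fetched on a later day through.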