I am exporting from one ELK stack using Logstash and writing the output to a gzipped JSON file. Then I import that file into another ELK stack, also using Logstash. There is no connection between the two environments.
This export/import can occur multiple times throughout the day.
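For context, the export side looks roughly like this (simplified; the host and index name are stand-ins for my actual values):

input {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "metricbeat-7.17.7-2023.04.28-000007"
    # no docinfo here, so only _source is exported (the original _id is not preserved)
  }
}
output {
  file {
    path => "/usr/share/logstash/export/export_metricbeat-7.17.7-2023.04.28-000007.json.gz"
    gzip => true
    codec => json_lines
  }
}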
I have noticed that when the target index already exists in the 2nd environment, the import seems to just append documents onto the end of the index, so I might end up with, say, 200k documents imported instead of the roughly 5 million I expect.
This is my import pipeline for the metricbeat index:
input {
  file {
    path => "/usr/share/logstash/export/export_metricbeat-7.17.7-2023.04.28-000007.json"
    start_position => "beginning"
    codec => "json"
    mode => "read"
    exit_after_read => true
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "metricbeat-7.17.7-2023.04.28-000007"
    ssl => false
  }
}
If I have already done an import on the 28th of April, this index will already exist in the target environment.
Is there a way in Logstash to always import everything, no matter what already exists in Elasticsearch? I thought it might be duplicate document IDs, but I don't see how that's possible; the document IDs should be unique.
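The only ID-related setting I know of on the output side is document_id. As a minimal sketch of what I mean, assuming each exported document carried its original ID in a hypothetical doc_id field (mine currently don't), the output would look like:

output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "metricbeat-7.17.7-2023.04.28-000007"
    # doc_id is a hypothetical field holding the original _id; with an explicit
    # document_id, re-importing the same file would overwrite rather than duplicate
    document_id => "%{doc_id}"
    ssl => false
  }
}

Without an explicit document_id, though, Elasticsearch generates a fresh _id for every indexed document, which is why I don't see how duplicate IDs could be the problem.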