I'm reading data with Filebeat, sending it to Logstash, and it ends up in Elasticsearch. One issue we have is that Filebeat sometimes re-reads files, leading to duplicate entries in Elasticsearch.
One idea was to create a custom `_id` value in Logstash based on the filename + offset. I would assume the filename is rarely longer than 32 characters, and the files we process are generally 500MB or less, so the offset adds up to another 9 digits. In most cases the `_id` value would therefore be 30-40 characters. (Perhaps we could hash this to make it smaller.)
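For what it's worth, here's a sketch of how this could look with Logstash's `fingerprint` filter, hashing the filename + offset and using the result as the document ID. The field names (`[log][file][path]`, `[log][offset]`) are what recent Filebeat versions emit; older versions used `source` and `offset`, so adjust to whatever your events actually carry. The host address is just a placeholder.

```
filter {
  fingerprint {
    # Hash filename + offset together into one deterministic ID.
    # Field names assume a recent Filebeat; older versions used
    # [source] and [offset] instead.
    source              => ["[log][file][path]", "[log][offset]"]
    concatenate_sources => true
    method              => "SHA1"
    # Store in @metadata so the hash isn't indexed as a field.
    target              => "[@metadata][fingerprint]"
  }
}

output {
  elasticsearch {
    hosts       => ["localhost:9200"]   # placeholder
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```

SHA1 gives a fixed 40-hex-character ID regardless of filename length, and re-reads of the same file+offset overwrite the existing document instead of duplicating it. Note that with custom IDs each index request becomes an update-or-insert, which Elasticsearch handles by checking for an existing document first, so there is some indexing overhead compared to auto-generated IDs.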
Does anyone know of performance issues with custom IDs, or has anyone done something similar?