When I write data through Logstash to Elasticsearch, is there a way to first check whether the data already exists in Elasticsearch, or some other way to achieve deduplication?
Parts of your question are impossible to understand, but it's clear that you want to eliminate duplication. Please show your configuration files along with some sample input so that we can understand why the duplication occurs in the first place.
Thank you for the reply!
I was importing a SQL file, and the file contains a lot of duplicate data.
I use Filebeat to ship the data to Logstash, filter it in Logstash, and output it to Elasticsearch.
A "sql file", you mean a file with results from a SQL query in e.g. CSV format? Okay. The key is to define a document id from the data instead of letting ES pick its own document id. If your data already has a primary key, use that field as the document id (i.e. set the elasticsearch output's document_id
option to "%{name_of_primary_key_field}"
). You can also use the fingerprint filter to generate a stable hash based on all or a set of the fields.
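As a rough sketch combining both ideas (the source field names, index name, and host below are assumptions; adjust them to your actual data):

```
filter {
  fingerprint {
    # Hash the fields that together identify a row; replace these
    # example field names with the columns from your SQL export.
    source => ["id", "name", "created_at"]
    concatenate_sources => true
    method => "SHA256"
    # With a key set, the filter computes an HMAC; any fixed string
    # works here since we only need a stable, repeatable hash.
    key => "dedup"
    # Store the hash in @metadata so it isn't indexed as a field.
    target => "[@metadata][fingerprint]"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "mydata"
    # Using the hash (or a primary key) as the document id means a
    # re-imported duplicate overwrites the existing document instead
    # of being indexed a second time.
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```

If your data has a real primary key you can skip the fingerprint filter entirely and set document_id => "%{name_of_primary_key_field}" directly in the elasticsearch output.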