When the Logstash instance is closed (exiting the terminal with Ctrl+C) and then restarted, the logs get duplicated in Elasticsearch.

What should I add to the .conf file to avoid the duplicate entry problem?

You need to provide context on your issue; there is nothing in your post about your configuration, so it is impossible to know what the issue is.

Please share your configuration.

My config file is:

input {
  file {
    path => "/home/bhaveshk/record.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  csv {
    separator => ","
    columns => ["antname", "age", "salary"]
  }

  mutate { convert => ["salary", "float"] }
  mutate { convert => ["age", "float"] }
  mutate { convert => ["antname", "string"] }

  mutate { remove_field => ["message"] }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "recordv1"
  }
  stdout { codec => rubydebug }
}
For a single record in the CSV file, 4 duplicate entries with different _id values are found in Elasticsearch after restarting Logstash with this config file.
An image of the duplicate entries in Elasticsearch is attached.

The file input has an in-memory sincedb which it uses to keep track of what parts of which files it has read. This is persisted across restarts. If you set sincedb_path => "/dev/null" then it is written to /dev/null when logstash shuts down, which means the information is discarded, and when logstash restarts it is read from /dev/null, which means the sincedb starts off empty. Thus the file input does not know it has already processed record.csv.

If you remove the sincedb_path => "/dev/null" then logstash will persist the sincedb to disk across restarts and will not process the part of the file that it has already read.
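Rather than relying on the auto-generated sincedb location, you can also point sincedb_path at an explicit writable file. A minimal sketch of the input block, assuming /home/bhaveshk/.sincedb_record is a writable path (the filename is an arbitrary example):

```
input {
  file {
    path => "/home/bhaveshk/record.csv"
    start_position => "beginning"
    # persisted across restarts, so already-read offsets are remembered
    sincedb_path => "/home/bhaveshk/.sincedb_record"
  }
}
```

With this in place, start_position => "beginning" only applies the first time the file is seen; on restart, the file input resumes from the offset recorded in the sincedb instead of re-reading the whole file.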

If I remove the sincedb_path => "/dev/null" line from the above config file, Logstash starts successfully but is not able to send new data.

Terminal message ("No sincedb_path set"):

[INFO ] 2022-07-25 16:56:57.844 [[main]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>2.19}
[INFO ] 2022-07-25 16:56:57.879 [[main]-pipeline-manager] file - No sincedb_path set, generating one based on the "path" setting {:sincedb_path=>"/usr/share/logstash/data/plugins/inputs/file/.sincedb_db000e21c2514b893f285c97ba0c970c", :path=>["/home/bhaveshk/record.csv"]}
[INFO ] 2022-07-25 16:56:57.894 [[main]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"main"}
[INFO ] 2022-07-25 16:56:57.912 [[main]<file] observingtail - START, creating Discoverer, Watch with file and sincedb collections
[INFO ] 2022-07-25 16:56:57.915 [Agent thread] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
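Independent of the sincedb, duplicates can also be prevented on the Elasticsearch side by giving each event a deterministic _id, so that a re-read line overwrites the existing document instead of creating a new one. A sketch using the fingerprint filter together with document_id in the elasticsearch output (hosts and index match the config above; this is one possible approach, not the only fix):

```
filter {
  # hash the raw line; must run before mutate { remove_field => ["message"] }
  fingerprint {
    source => ["message"]
    target => "[@metadata][fingerprint]"
    method => "SHA256"
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "recordv1"
    # same line => same _id => update instead of duplicate insert
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```

Note that identical lines in the CSV would then collapse into a single document, which may or may not be what you want.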

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.