Logstash parsing same contents again and again

Hi All,
I am having a weird problem: Logstash is parsing and indexing the same contents of a file over and over again after a certain time period. The sincedb file is not being created, and if I let Logstash run for days the number of entries keeps multiplying.

Please help, what could be the problem?

Regards
Tahir

Do I understand it correctly: are you using Logstash to read the file?

I currently use Filebeat only and avoid duplicate entries by creating a SHA-256 hash from the original message (with the fingerprint filter) and using it as the unique document_id.
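
Something like this (a minimal sketch, not my exact config; the key value is a placeholder you pick yourself, and depending on the fingerprint plugin version it may be optional):

filter {
  fingerprint {
    source => "message"                    # hash the original message
    method => "SHA256"
    key => "change-me"                     # placeholder HMAC key
    target => "[@metadata][fingerprint]"   # keep the hash out of the indexed document
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    document_id => "%{[@metadata][fingerprint]}"  # identical events overwrite instead of duplicating
  }
}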

What does your config look like?

Here is my config file:

input {
  file {
    path => "/home/xsalllowed/Desktop/xxxxx/data/a/y.part"
    start_position => "beginning"
    sincedb_path => "/var/null"
    max_open_files => 400
    sincedb_write_interval => 2
  }
}
filter {
  csv {
    separator => ":"
    columns => ["email","password"]
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "breachcompilation"
  }
  stdout {}
}

sincedb_path => "/var/null"

What's this supposed to mean? Set it to a reasonable path that is writable, or leave it unset.
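
For example (the exact path is just an illustration; any location the Logstash user can write to will do):

sincedb_path => "/var/lib/logstash/sincedb_y_part"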

It doesn't matter which value I set for sincedb_path, or whether I leave it unset: I end up getting the same data again after 7 minutes.

I just saw something more in my logs. There is a new error that appeared when I tried to feed it large files for checking. Is there any specific reason for this ERROR:

[2018-03-31T22:51:30,044][INFO ][logstash.agent ] Pipelines running {:count=>1, :pipelines=>["main"]}
[2018-03-31T22:51:36,533][ERROR][org.logstash.Logstash ] java.lang.OutOfMemoryError: Java heap space
[2018-03-31T22:51:51,666][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"/usr/share/logstash/modules/fb_apache/configuration"}
[2018-03-31T22:51:51,674][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"netflow", :directory=>"/usr/share/logstash/modules/netflow/configuration"}
[2018-03-31T22:51:51,936][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2018-03-31T22:51:52,063][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"6.2.3"}
[2018-03-31T22:51:52,165][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
[2018-03-31T22:51:52,825][INFO ][logstash.pipeline ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2018-03-31T22:51:53,088][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}}
[2018-03-31T22:51:53,090][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://localhost:9200/, :path=>"/"}
[2018-03-31T22:51:53,196][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://localhost:9200/"}
[2018-03-31T22:51:53,241][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2018-03-31T22:51:53,241][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the type event field won't be used to determine the document _type {:es_version=>6}
[2018-03-31T22:51:53,243][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2018-03-31T22:51:53,247][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2018-03-31T22:51:53,255][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["http://localhost:9200"]}
[2018-03-31T22:51:53,447][INFO ][logstash.pipeline ] Pipeline started succesfully {:pipeline_id=>"main", :thread=>"#<Thread:0x16ccb605@/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:246 sleep>"}
[2018-03-31T22:51:53,466][INFO ][logstash.agent ] Pipelines running {:count=>1, :pipelines=>["main"]}
[2018-03-31T22:52:00,334][ERROR][org.logstash.Logstash ] java.lang.OutOfMemoryError: Java heap space
[2018-03-31T22:52:18,770][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"/usr/share/logstash/modules/fb_apache/configuration"}
[2018-03-31T22:52:18,777][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"netflow", :directory=>"/usr/share/logstash/modules/netflow/configuration"}
[2018-03-31T22:52:19,173][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2018-03-31T22:52:19,288][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"6.2.3"}
[2018-03-31T22:52:19,375][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
[2018-03-31T22:52:20,076][INFO ][logstash.pipeline ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2018-03-31T22:52:20,291][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}}
[2018-03-31T22:52:20,294][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://localhost:9200/, :path=>"/"}
[2018-03-31T22:52:20,400][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://localhost:9200/"}
[2018-03-31T22:52:20,445][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2018-03-31T22:52:20,447][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the type event field won't be used to determine the document _type {:es_version=>6}
[2018-03-31T22:52:20,454][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2018-03-31T22:52:20,458][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2018-03-31T22:52:20,472][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["http://localhost:9200"]}
[2018-03-31T22:52:20,829][INFO ][logstash.pipeline ] Pipeline started succesfully {:pipeline_id=>"main", :thread=>"#<Thread:0x7cfe2529@/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:246 sleep>"}

What did you name this config file?

The Logstash JVM (process) ran out of memory. This can happen when it is working on large data and doesn't have enough heap to process it. Increase the heap size in the <logstash_install_dir>/config/jvm.options file; e.g. -Xmx2g sets the maximum to 2 GB. Setting -Xms (initial heap size) and -Xmx (maximum heap size) to the same value is generally recommended.
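
For example, in jvm.options (2 GB is just a starting point; size it to your data and machine):

# <logstash_install_dir>/config/jvm.options
-Xms2g   # initial heap size
-Xmx2g   # maximum heap size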

I presume you are using the file input?

The file input uses the sincedb to keep track of what it has read, so that it can avoid reprocessing the same messages over and over again.

What does your configuration look like for the file input? Does the user your process runs as have read and write access to the place where it is trying to keep its sincedb?
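
A quick way to check (assuming a package install running as the logstash user; with recent file input versions the default sincedb lives under path.data, typically /var/lib/logstash, while older versions put a hidden .sincedb_* file in the user's home directory):

sudo -u logstash ls -la /var/lib/logstash/plugins/inputs/file/
sudo -u logstash touch /var/lib/logstash/sincedb_test   # fails if the directory isn't writable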

Are the files Logstash needs to read from on a network volume or other secondary mount? The sincedb uses the actual inode reference for tracking position (not just the path), so remounting the partition or rewriting the files (even if with identical contents) can cause the file paths to point to new inodes, which prevents Logstash from reliably remembering position.
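
You can check for this by watching the inode (the first column below); if it changes after a remount or a rewrite, Logstash sees what looks like a new file:

ls -li /home/xsalllowed/Desktop/xxxxx/data/a/y.part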

My guess is that your JVM is running out of heap before Logstash has had the opportunity to write the sincedb file and record how much it has processed. Fix the JVM problem and the rest will probably be fine.

Thanks a lot, Atira.
Your solution worked, but after some time it started re-indexing the same data over and over again. I wonder what the issue could be. Since I have not set the sincedb_path, I can't check and confirm whether it is being written or not.
