Logstash Filter creating Duplicate documents

I have the below Logstash filter which I am using to filter the raw data provided below. It works when tested with the grok debugger, but the problem is that it is creating duplicate documents for the same entry, i.e. the data sample provided appears 3 times, even with a _grokparsefailure.

Logstash Version: 6.8

# cat /etc/logstash/conf.d/rmlog.conf
input {
  file {
    path => [ "/data/rm_logs/*.txt" ]
    start_position => "beginning"
    max_open_files => 64000
    #sincedb_path => "/data/registry-1"
    sincedb_path =>  "/dev/null"
    type => "rmlog"
  }
}
filter {
  if [type] == "rmlog" {
    grok {
      match => { "message" => "%{HOSTNAME:hostname},%{DATE:date},%{SECOND:time},(%{NUMBER:duration})?-%{WORD:hm},%{USER:user},%{USER} %{NUMBER:pid} %{NUMBER} %{NUMBER} %{NUMBER} %{NUMBER} %{DATA} (?:%{HOUR}:|)(?:%{MINUTE}|) (?:%{HOUR}:|)(?:%{MINUTE}|)%{GREEDYDATA:cmd},%{GREEDYDATA:pwd}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      remove_field => [ "@version", "host", "message", "_type", "_index", "_score" ]
    }
  }
  if "_grokparsefailure" in [tags] {
    grok {
      match => { "message" => "%{HOSTNAME:hostname},%{DATE:date},%{SECOND:time},(%{NUMBER:duration})?-%{WORD:hm},%{USER:user},%{USER} %{GREEDYDATA:cmd},%{GREEDYDATA:pwd}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      remove_field => [ "@version", "host", "message", "_type", "_index", "_score" ]
    }
  }
}
output {
  if [type] == "rmlog" {
    elasticsearch {
      hosts => ["xyz.com:9200"]
      manage_template => false
      index => "rich-rmlog-%{+YYYY.MM.dd}"
    }
  }
}

Below is the raw data that Logstash is filtering:

oradb001,19/05/30,12:38,00-mins,kuller,kuller 193264 0.0 0.0 9248 1228 ? Ss 12:38 0:00 /bin/sh -c /bin/rm -fr /dv/t4users15ri/kuller/tan_mst/kvm/src/kvm12ml/kvm_ref.build ; \?/bin/cp -fr /dv/t4users15ri/kuller/tan_mst/kvm/src/kvm12ml/kvm_ref /dv/t4users15ri/kuller/tan_mst/kvm/src/kvm12ml/kvm_ref.build ; \?chmod +w /dv/t4users15ri/kuller/tan_mst/kvm/src/kvm12ml/kvm_ref.build/nd/Proj/ ; \?cd /dv/t4users15ri/kuller/tan_mst/kvm/src/kvm12ml/kvm_ref.build/nd; \?ND=/dv/t4users15ri/kuller/tan_mst/kvm/src/kvm12ml/natural_docs ; \?export ND; \?csh -f ./gen_nd ; \?cp /dv/t4users15ri/kuller/tan_mst/dvproject/builds/32bit/kvm/src/kvm12ml/additions/sv/lib/libkvmpli.so /dv/t4users15ri/kuller/tan_mst/kvm/src/kvm12ml/additions/sv/lib/libkvmpli.so ; \?cp /dv/t4users15ri/kuller/tan_mst/dvproject/builds/32bit/kvm/src/kvm12ml/distrib/src/dpi/libkvmdpi.so /dv/t4users15ri/kuller/tan_mst/kvm/src/kvm12ml/additions/sv/lib/libkvmdpi.so ; \?echo "if the following command fails look at /dv/t4users15ri/kuller/tan_mst/kvm/src/kvm12*/irun.log for a cause" ; \?echo "package cdns_kvmapi; endpackage" > /dv/t4users15ri/kuller/tan_mst/kvm/src/kvmapi/src/cdns_kvmapi.svp ; echo "touch /dv/t4users15ri/kuller/tan_mst/kvm/src/kvmapi/src/cdns_kvmapi.svp for compiling cdns_kvm_pkg" ; \?/bin/rm -f /dv/t4users15ri/kuller/tan_mst/kvm/src/kvm12ml/additions/sv/files/tcl/kvm_gui_filter.txt ; \?cd /dv/t4users15ri/kuller/tan_mst/kvm/src/kvm12ml ; \?/dv/t4users15ri/kuller/tan_mst/kvm/bin/mk_kvm_filter.pl --kvmhome /dv/t4users15ri/kuller/tan_mst/kvm/src/kvm12ml/distrib > /dv/t4users15ri/kuller/tan_mst/kvm/src/kvm12ml/additions/sv/files/tcl/kvm_gui_filter.txt ; \?mv /dv/t4users15ri/kuller/tan_mst/kvm/src/kvm12ml/additions/sv/files/tcl/kvm_gui_filter.txt /dv/t4users15ri/kuller/tan_mst/kvm/src/kvm12ml/additions/sv/files/tcl/kvm_gui_filter.txt.generated ; \?cp -f /dv/t4users15ri/kuller/tan_mst/kvm/src/kvm12ml/additions/sv/files/tcl/source_kvm_gui_filter.txt 
/dv/t4users15ri/kuller/tan_mst/kvm/src/kvm12ml/additions/sv/files/tcl/kvm_gui_filter.txt ; \?/bin/rm -f /dv/t4users15ri/kuller/tan_mst/kvm/src/kvm11ml/additions/sv/lib/libkvmpli.so /dv/t4users15ri/kuller/tan_mst/kvm/src/kvm11ml/additions/sv/lib/libkvmdpi.so ; \?cd /dv/t4users15ri/kuller/tan_mst/kvm/src/iregGen ; \?sh /dv/t4users15ri/kuller/tan_mst/kvm/src/iregGen/buildit.sh,/efsroots/4/dv/t4users15ri/kuller/tan_mst/kvm/src/kvm12ml

Screenshot attached below.

Is there anything wrong with rmlog.conf?

Any help and pointers will be appreciated.

Your sincedb_path setting for the file input will cause files to be reprocessed on restart. Is that intentional?

@Christian_Dahlqvist, thanks for the response. Initially it was sincedb_path => "/data/registry-1" and I was seeing the duplication then as well, so I deliberately commented that out for testing and set sincedb_path => "/dev/null". I want to understand whether there is anything we can tune.

That setting will most certainly cause duplicates, so it needs to be changed.
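For context on why: the sincedb file is where the file input persists how far it has read into each monitored file, and pointing it at /dev/null discards that state, so every restart re-reads all files from the beginning. A minimal sketch of a persistent setup (the path /data/registry-1 is the one already used in this thread; any location writable by the Logstash user works):

```conf
input {
  file {
    path => [ "/data/rm_logs/*.txt" ]
    start_position => "beginning"
    # Persist read offsets so files are NOT re-read from scratch on restart.
    # sincedb_path => "/dev/null" throws this state away and guarantees
    # the whole file set is re-ingested every time Logstash starts.
    sincedb_path => "/data/registry-1"
    type => "rmlog"
  }
}
```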

I have changed it back to sincedb_path => "/data/registry-1" and I still see it creating duplicate docs; I am not getting any pointers as to why.

Do you have any other files in your config directory, e.g. older versions? Logstash concatenates all files in that directory, so data from all inputs will go to all outputs unless you use conditionals.
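To illustrate the concatenation point: if a second file (say, a hypothetical leftover old.conf) were sitting in /etc/logstash/conf.d/, Logstash 6.x would merge all files into a single pipeline, and every event from every input would reach every unguarded output. Wrapping each output in a conditional on a distinguishing field such as type keeps the streams separate; a sketch, with "otherlog" as an invented second stream:

```conf
# Logstash merges every file in /etc/logstash/conf.d/ into one pipeline,
# so an output with no conditional receives events from ALL inputs.
output {
  if [type] == "rmlog" {
    elasticsearch {
      hosts => ["xyz.com:9200"]
      index => "rich-rmlog-%{+YYYY.MM.dd}"
    }
  }
  if [type] == "otherlog" {   # hypothetical second stream
    elasticsearch {
      hosts => ["xyz.com:9200"]
      index => "otherlog-%{+YYYY.MM.dd}"
    }
  }
}
```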

No, I don't have any other files except the one I posted:

$ cd /etc/logstash/conf.d/
$ ls -ltrh
total 4.0K
-rw-r--r-- 1 root root 1.3K May 31 00:14 rmlog.conf
$

Is there anything wrong or suspicious in the below config?

path => [ "/data/rm_logs/*.txt" ]
start_position => beginning
max_open_files => 64000
sincedb_path => "/data/registry-1"
type => "rmlog"

No, as far as I can see that looks fine.

Then I will clean up all the indices and ingest the data again after restarting the ELK services.
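One way to make re-ingestion like this idempotent, rather than relying solely on sincedb state, is to derive a deterministic document ID from the event and pass it to Elasticsearch, so re-reading the same line overwrites the existing document instead of creating a duplicate. A sketch using the standard fingerprint filter (index and hosts values taken from the config in this thread):

```conf
filter {
  fingerprint {
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "SHA256"
  }
}
output {
  elasticsearch {
    hosts => ["xyz.com:9200"]
    index => "rich-rmlog-%{+YYYY.MM.dd}"
    # Same line => same fingerprint => same _id, so the document is
    # overwritten rather than duplicated on re-ingestion.
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```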

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.