Logstash logs duplicates in different indexes despite else if conditions


(Claudio) #1

Hi,

I'm ingesting Apache logs using the file input, and my Logstash is sending events both to the logstash-yyyy.mm.dd index and to the custom one I specify (sometimes also creating duplicates in one of the indexes).

Here's the relevant config:

output {
  if "localhost" in [tags] {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "logs-localhost-%{+YYYY.MM.dd}"
    }
  }
  else if [log_custom_index] {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "logs-%{log_custom_index}-%{+YYYY.MM.dd}"
    }
  }
  else {
    elasticsearch {
      hosts => ["localhost:9200"]
    }
  }
}

When either the "localhost" tag or the "log_custom_index" field is present, a copy of the log is still sent to the logstash-yyyy.mm.dd index as well.

Logstash is running as a service on an up-to-date Debian Stretch, installed via the Elastic apt repository.

root@Server /etc/logstash/conf.d # dpkg -l | grep -E "elastic|logstash|kibana"
ii  elasticsearch                   6.4.0                             all          Elasticsearch is a distributed RESTful search engine built for the cloud. Reference documentation can be found at https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html and the 'Elasticsearch: The Definitive Guide' book can be found at https://www.elastic.co/guide/en/elasticsearch/guide/current/index.html
ii  elasticsearch-curator           5.5.4                             amd64        Have indices in Elasticsearch? This is the tool for you!\n\nLike a museum curator manages the exhibits and collections on display, \nElasticsearch Curator helps you curate, or manage your indices.
ii  kibana                          6.4.0                             amd64        Explore and visualize your Elasticsearch data
ii  logstash                        1:6.4.0-1                         all          An extensible logging pipeline

root@Server /etc/logstash/conf.d # apt-cache policy logstash
logstash:
  Installed: 1:6.4.0-1
  Candidate: 1:6.4.0-1
  Version table:
 *** 1:6.4.0-1 500
        500 https://artifacts.elastic.co/packages/6.x/apt stable/main amd64 Packages
        100 /var/lib/dpkg/status

root@Server /etc/logstash/conf.d # apt-cache policy elasticsearch
elasticsearch:
  Installed: 6.4.0
  Candidate: 6.4.0
  Version table:
 *** 6.4.0 500
        500 https://artifacts.elastic.co/packages/6.x/apt stable/main amd64 Packages
        100 /var/lib/dpkg/status

Things I tried:
Both suggestions from Log duplicated in multiple indexes

  1. "Try without the else clause"
    If I comment out the else (and then restart Logstash), logs are still sent to logstash-yyyy.mm.dd.
    This caught me by surprise.
  2. "you should not see any log messages with this text in the log line retrying failed action with response code"
    I looked in both logstash-plain.log and the Elasticsearch log; I can't find that message in either.

Making sure that my if-else worked fine (from Problem with output elasticsearch data duplicate on index)

To do this, I redirected all the outputs to rubydebug, used stdin as the input to feed it log lines, and verified that:

  • with my normal configuration, each event was emitted only once and carried the tags it was supposed to;
  • after removing the "else" from "else if", events were emitted multiple times (which is consistent with what seems to be happening to me).

My filters (including the cidr filter that adds the "localhost" tag) work fine: the log lines stored in the logstash-yyyy.mm.dd index contain the manually added tags/fields. So this looks like the elasticsearch output sending logs where it's not supposed to, which I think is the probable cause. The only alternative would be a bug in Elasticsearch itself storing the logs twice on receipt, which I find unlikely because some (quick) testing with custom indexes (test-something) did not reproduce the issue.

Here's the full (redacted) config:

input { 
  file {
    type => "apache2-accesslog"
    path => [ "/var/log/apache2/example.org.access.log", "/var/log/apache2/example.org.access.log.1" ]
    tags => [ "apache2", "example.org"]
  }
  file {
    type => "apache2-accesslog"
    path => [ "/var/log/apache2/elastic.example.org.access.log", "/var/log/apache2/elastic.example.org.access.log.1" ]
    tags => [ "apache2", "elastic.example.org"]
    add_field => { "log_custom_index" => "elasticsearch" }
  }
  file {
    type => "apache2-accesslog"
    path => [ "/var/log/apache2/access.log", "/var/log/apache2/access.log.1","/var/log/apache2/other_vhosts_access.log" , "/var/log/apache2/other_vhosts_access.log.1" ]
    tags => [ "apache2", "other_vhosts"]
  }
}

filter {
  grok {
    match => { "message" => [ "%{COMBINEDAPACHELOG}" ] }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  if [clientip] {
    cidr {
      add_tag => [ "localhost" ]
      address => [ "%{clientip}" ]
      network => ["127.0.0.0/8", "xxx.xxx.xxx.xxx/32", "yyy.yyy.yyy.yyy/32", "2a01:xxx:xxx:xxx::2/64", "fe80::/10", "::1/128" ]
    }
  }
}

output {
  if "localhost" in [tags] {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "logs-localhost-%{+YYYY.MM.dd}"
    }
  }
  else if [log_custom_index] {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "logs-%{log_custom_index}-%{+YYYY.MM.dd}"
    }
  }
  else {
    elasticsearch {
      hosts => ["localhost:9200"]
    }
  }
}

Any ideas on how to fix it?

Thanks!


(Magnus Bäck) #2

Make sure you don't have any extra files in your conf.d directory (or wherever you store your config files). Depending on the Logstash version, all files in the directory may be read, including backup files and the like.

If you bump the loglevel to DEBUG I believe Logstash will log the exact configuration it loads. Make sure that's what you expect it to be.
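A sketch of how that could look, assuming the default paths of an apt install:

```
# /etc/logstash/logstash.yml — raise the global log level
log.level: debug

# or, when running Logstash manually from the install directory:
#   bin/logstash --log.level=debug --path.settings /etc/logstash
```

With debug logging enabled, the startup log includes the merged pipeline configuration, which makes it easy to spot an unexpected extra file.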


(Claudio) #3

Hi Magnus,

I've double-checked the config: yes, I have multiple config files, but one is for rsyslog logs and one for fail2ban.
Neither loads anything with "*", so I don't see how they could be reading the same log twice.

failban.conf

input {
  file {
    type => "fail2ban"
    path => [ "/var/log/fail2ban.log" ]
  }
}

logstash-rsyslog.conf

input {
  file {
    type => "syslog-base"
    path => ["/var/log/syslog","/var/log/syslog.1"]
    tags => "syslog"
  }
  file {
    type => "syslog-advanced"
    path => ["/var/log/auth.log","/var/log/auth.log.1"]
    tags => "auth"
  }
  file {
    type => "syslog-base"
    path => ["/var/log/daemon.log","/var/log/daemon.log.1"]
    tags => "daemon"
  }
  file {
    type => "syslog-base"
    path => ["/var/log/messages","/var/log/messages.1"]
    tags => "messages"
  }
  file {
    type => "syslog-base"
    path => ["/var/log/kern.log","/var/log/kern.log.1"]
    tags => "kern"
  }
  file {
    type => "syslog-base"
    path => ["/var/log/mail.err","/var/log/mail.err.1","/var/log/mail.info","/var/log/mail.info.1","/var/log/mail.log","/var/log/mail.log.1","/var/log/mail.warn","/var/log/mail.warn.1"]
    tags => "mail_logs"
  }
  file {
    type => "syslog-base"
    path => ["/var/log/kern.log","/var/log/kern.log.1"]
    tags => "daemon"
  }
}

I ran Logstash manually with the debug loglevel switch; I can still see events tagged "localhost" and events with "log_custom_index: elasticsearch" in my logstash-yyyy.mm.dd index.

Here is one (again, redacted) capture from the debug log. I've checked, and there are no duplicates.

[DEBUG] 2018-09-15 17:16:23.917 [[main]<file] grow - read_to_eof: get chunk
[DEBUG] 2018-09-15 17:16:23.917 [[main]<file] file - Received line {:path=>"/var/log/apache2/elastic.example.org.access.log", :text=>"X.X.X.X - metricbeat [15/Sep/2018:17:16:23 +0200] \"POST /es-endpoint/_bulk HTTP/1.1\" 200 479 \"-\" \"Go-http-client/1.1\""}
[DEBUG] 2018-09-15 17:16:23.918 [[main]<file] decorators - inputs/LogStash::Inputs::File: adding value to field {"field"=>"log_custom_index", "value"=>["elasticsearch"]}
[DEBUG] 2018-09-15 17:16:23.918 [[main]<file] decorators - inputs/LogStash::Inputs::File: adding tag {"tag"=>"apache2"}
[DEBUG] 2018-09-15 17:16:23.918 [[main]<file] decorators - inputs/LogStash::Inputs::File: adding tag {"tag"=>"elastic.example.org"}
[DEBUG] 2018-09-15 17:16:24.019 [Ruby-0-Thread-13@[main]>worker2: :1] pipeline - filter received {"event"=>{"tags"=>["apache2", "elastic.example.org"], "path"=>"/var/log/apache2/elastic.example.org.access.log", "type"=>"apache2-accesslog", "message"=>"X.X.X.X - metricbeat [15/Sep/2018:17:16:23 +0200] \"POST /es-endpoint/_bulk HTTP/1.1\" 200 479 \"-\" \"Go-http-client/1.1\"", "@version"=>"1", "host"=>"ES_HOST_NAME", "log_custom_index"=>"elasticsearch", "@timestamp"=>2018-09-15T15:16:23.918Z}}
[DEBUG] 2018-09-15 17:16:24.019 [Ruby-0-Thread-13@[main]>worker2: :1] grok - Running grok filter {:event=>#<LogStash::Event:0xdc41925>}
[DEBUG] 2018-09-15 17:16:24.020 [Ruby-0-Thread-13@[main]>worker2: :1] grok - Event now:  {:event=>#<LogStash::Event:0xdc41925>}
[DEBUG] 2018-09-15 17:16:24.023 [Ruby-0-Thread-13@[main]>worker2: :1] cidr - Checking IP inclusion {:address=>#<IPAddr: IPv4:X.X.X.X/255.255.255.255>, :network=>#<IPAddr: IPv4:127.0.0.0/255.0.0.0>}
[DEBUG] 2018-09-15 17:16:24.023 [Ruby-0-Thread-13@[main]>worker2: :1] cidr - Checking IP inclusion {:address=>#<IPAddr: IPv4:X.X.X.X/255.255.255.255>, :network=>#<IPAddr: IPv4:Y.Y.Y.Y/255.255.255.255>}
[DEBUG] 2018-09-15 17:16:24.023 [Ruby-0-Thread-13@[main]>worker2: :1] cidr - Checking IP inclusion {:address=>#<IPAddr: IPv4:X.X.X.X/255.255.255.255>, :network=>#<IPAddr: IPv4:Z.Z.Z.Z/255.255.255.255>}
[DEBUG] 2018-09-15 17:16:24.024 [Ruby-0-Thread-13@[main]>worker2: :1] cidr - Checking IP inclusion {:address=>#<IPAddr: IPv4:X.X.X.X/255.255.255.255>, :network=>#<IPAddr: IPv6:G:H:I:J:0000:0000:0000:0000/ffff:ffff:ffff:ffff:0000:0000:0000:0000>}
[DEBUG] 2018-09-15 17:16:24.024 [Ruby-0-Thread-13@[main]>worker2: :1] cidr - Checking IP inclusion {:address=>#<IPAddr: IPv4:X.X.X.X/255.255.255.255>, :network=>#<IPAddr: IPv6:fe80:0000:0000:0000:0000:0000:0000:0000/ffff:ffff:ffff:ffff:0000:0000:0000:0000>}
[DEBUG] 2018-09-15 17:16:24.025 [Ruby-0-Thread-13@[main]>worker2: :1] cidr - Checking IP inclusion {:address=>#<IPAddr: IPv4:X.X.X.X/255.255.255.255>, :network=>#<IPAddr: IPv6:0000:0000:0000:0000:0000:0000:0000:0001/ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff>}
[DEBUG] 2018-09-15 17:16:24.026 [Ruby-0-Thread-13@[main]>worker2: :1] pipeline - output received {"event"=>{"bytes"=>"479", "ident"=>"-", "verb"=>"POST", "message"=>"X.X.X.X - metricbeat [15/Sep/2018:17:16:23 +0200] \"POST /es-endpoint/_bulk HTTP/1.1\" 200 479 \"-\" \"Go-http-client/1.1\"", "host"=>"ES_HOST_NAME", "referrer"=>"\"-\"", "httpversion"=>"1.1", "@timestamp"=>2018-09-15T15:16:23.000Z, "tags"=>["apache2", "elastic.example.org"], "request"=>"/es-endpoint/_bulk", "path"=>"/var/log/apache2/elastic.example.org.access.log", "clientip"=>"X.X.X.X", "agent"=>"\"Go-http-client/1.1\"", "timestamp"=>"15/Sep/2018:17:16:23 +0200", "type"=>"apache2-accesslog", "@version"=>"1", "log_custom_index"=>"elasticsearch", "auth"=>"metricbeat", "response"=>"200"}}
[DEBUG] 2018-09-15 17:16:24.918 [[main]<file] grow - read_to_eof: get chunk

And here's what I see from Kibana:

The logstash index: (screenshot omitted)

And the elasticsearch index: (screenshot omitted)

Please let me know if you need any additional information,

Thanks!


(Magnus Bäck) #4

Where are your elasticsearch outputs defined? The files you posted now only include the inputs.

My crystal ball tells me that you have both filters and outputs in both failban.conf and logstash-rsyslog.conf, in which case you need to consider that Logstash concatenates all configuration files for a given pipeline. Unless guarded by conditionals all events from all inputs will be processed by all filters and all outputs.
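To illustrate the point (file and path names hypothetical): with these two files in conf.d and a single pipeline, Logstash behaves as if it had loaded one concatenated file, so both inputs feed both outputs unless conditionals guard them:

```
# a.conf
input  { file { path => "/var/log/apache2/access.log" type => "apache" } }
output { elasticsearch { hosts => ["localhost:9200"] index => "apache-%{+YYYY.MM.dd}" } }

# b.conf
input  { file { path => "/var/log/syslog" type => "syslog" } }
output { elasticsearch { hosts => ["localhost:9200"] } }

# Effective single pipeline: every event from either input goes through BOTH
# outputs, so each event is indexed twice — once into apache-* and once into
# the default logstash-* index.
```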


(Claudio) #5

Woohoo, editing again because I can't use my brain and didn't remove a time filter. Sorry about this.

You, sir, are a wizard, and I can testify to the accuracy of your crystal ball.

All my outputs read

output {
    elasticsearch { hosts => ["localhost:9200"] }
}

whereas now they all read

output {
  if [type] == "this_particular_type" {
    elasticsearch { hosts => ["localhost:9200"] }
  }
}

and my issue is gone.

Final question: would the correct approach (i.e. the one the Elastic developers intended) be to have separate files for inputs, filters and outputs?
Something like:

  • inputs.conf
  • filters-fail2ban.conf (with conditionals to filter the filters by type)
  • filters-generic.conf
  • outputs.conf

As far as I can tell from the docs (Accessing Event Data and Fields in the Configuration -> Conditionals), I cannot wrap an entire config file in a conditional, so this seems like the obvious solution, but I don't know if there are any drawbacks to it.
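As a sketch of that layout (file names and patterns hypothetical): each filter or output file guards its whole block with a type conditional, which has the same effect as wrapping the file would:

```
# filters-fail2ban.conf — only touches fail2ban events
filter {
  if [type] == "fail2ban" {
    grok { match => { "message" => "%{GREEDYDATA:f2b_message}" } }
  }
}

# outputs.conf — all routing decisions in one place
output {
  if [type] == "fail2ban" {
    elasticsearch { hosts => ["localhost:9200"] index => "fail2ban-%{+YYYY.MM.dd}" }
  }
  else {
    elasticsearch { hosts => ["localhost:9200"] }
  }
}
```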

Thanks!


(Magnus Bäck) #7

There's no single correct approach here, but personally I use basically the same config file layout that you do. The only difference is that I prefix the filter config files with a two-digit integer so that I can control the order in which the filters are applied (e.g. 10-early-filters.conf, 50-random-filters.conf, and 90-late-filters.conf).


(Claudio) #8

Makes sense, thanks again for the help and the advice!


(system) #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.