_geoip_lookup_failure though all geoip fields are populated

Hi folks, using ELK 5.6.16.

I'm attempting to geoip iptables logs. The geoip fields are being populated, but the tags field shows _geoip_lookup_failure alongside the tags I apply in my grok.

Input is provided by filebeat with document_type set to dawvpn-iptables.

Filter:

filter {
  if [document_type] == "dawvpn-iptables" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} kernel: '%{USERNAME:ipt_rule}: 'IN=ens3 OUT= MAC=%{DATA} SRC=%{IP:ipt_src} DST=%{IP:ipt_dst} LEN=%{INT:ipt_len:int} %{GREEDYDATA} PROTO=%{WORD:ipt_proto} SPT=%{INT:ipt_spt} DPT=%{INT:ipt_dpt} %{GREEDYDATA}" }
      add_tag => [ "DAWVPN_IPT", "DAWVPN_%{ipt_rule}" ]
    }
    date {
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
  geoip {
    source => "ipt_src"
    target => "geoip"
    database => "/etc/logstash/GeoLite2-City_20171003/GeoLite2-City.mmdb"
    add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
    add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
  }
  mutate {
    convert => [ "[geoip][coordinates]", "float" ]
  }
}

JSON output of the parsed event:

{
  "_index": "dawvpn-iptables-2019.08.14",
  "_type": "log",
  "_id": "AWyRxgOrXUAjfYImHJWh",
  "_version": 1,
  "_score": null,
  "_source": {
    "geoip": {
      "timezone": "Europe/Rome",
      "ip": "217.61.20.209",
      "latitude": 43.4167,
      "coordinates": [
        11.8833,
        43.4167
      ],
      "continent_code": "EU",
      "city_name": "Arezzo",
      "country_name": "Italy",
      "country_code2": "IT",
      "country_code3": "IT",
      "region_name": "Province of Arezzo",
      "location": {
        "lon": 11.8833,
        "lat": 43.4167
      },
      "postal_code": "52100",
      "region_code": "AR",
      "longitude": 11.8833
    },
    "offset": 23328,
    "input_type": "log",
    "ipt_dpt": "22",
    "source": "/var/log/filebeat-dawvpn-iptables.log",
    "message": "Aug 13 22:25:35 daw kernel: 'TL0G_SSH: 'IN=ens3 OUT= MAC=00:16:3c:0e:7c:89:00:12:f2:91:b8:00:08:00 SRC=217.61.20.209 DST=xxx.xxx.xxx.xxx LEN=40 TOS=0x00 PREC=0x00 TTL=249 ID=54321 PROTO=TCP SPT=58282 DPT=22 WINDOW=65535 RES=0x00 SYN URGP=0 ",
    "type": "log",
    "ipt_src": "217.61.20.209",
    "tags": [
      "_geoip_lookup_failure",
      "DAWVPN_IPT",
      "DAWVPN_TL0G_SSH"
    ],
    "ipt_dst": "xxx.xxx.xxx.xxx",
    "ipt_proto": "TCP",
    "@timestamp": "2019-08-14T02:25:35.000Z",
    "syslog_hostname": "daw",
    "syslog_timestamp": "Aug 13 22:25:35",
    "@version": "1",
    "beat": {
      "name": "str002",
      "hostname": "str002",
      "version": "5.6.16"
    },
    "host": "str002",
    "ipt_rule": "TL0G_SSH",
    "ipt_spt": "58282",
    "document_type": "dawvpn-iptables",
    "ipt_len": 40
  },
  "fields": {
    "@timestamp": [
      1565749535000
    ]
  },
  "highlight": {
    "geoip.postal_code": [
      "@kibana-highlighted-field@52100@/kibana-highlighted-field@"
    ]
  },
  "sort": [
    1565749535000
  ]
}

There is no _grokparsefailure; the log message is successfully parsed by the grok configuration. The ipt_src field is populated, and judging by the geoip field data, the lookup apparently succeeded. There is nothing in the logstash logs indicating a problem either.

I'm at a loss to understand why all events are getting tagged _geoip_lookup_failure.
I have a very similar, working filter, using json input instead, and there are no _geoip_lookup_failure tags on those events.

What else can I look at to troubleshoot this?

Thanks

That definitely suggests there is a second geoip filter. Is it possible that path.config points to a directory and there is a backup copy of the filter configuration in another file?

I don't see anything in elasticsearch.yml pertaining to path.config, is there a way to view that with curl/cerebro/kibana?

path.config would be set in logstash.yml, or using -f on the command line, or in pipelines.yml

grep -i path /etc/logstash/logstash.yml 
# ------------ Data path ------------------
path.data: /var/lib/logstash
path.config: /etc/logstash/conf.d

/etc/logstash/conf.d is chef-managed and the only references to geoip in that dir are the two filter confs; the previously mentioned json filter, and this one. Both filters configured to use same geoip db at same location /etc/logstash/GeoLite2-City_20171003/GeoLite2-City.mmdb

I don't have pipelines.yml.

I commented out the database line in the filter, restarted logstash, deleted index pattern and indexes, then re-added the log data, same thing, _geoip_lookup_failure, and populated geoip fields.

I'm stumped.

That appears to be saying there are two geoip filters, so the "other" one is failing.

If you point path.config to a directory such as /etc/logstash/conf.d then logstash concatenates all the files in the directory (all of them -- if you have a core file in there it will happily read it and log errors telling you it is not a valid configuration). The events are read from all the inputs defined in the files, sent through all the filters defined in the files, and written to all of the outputs. Misunderstanding this is very common in folks new to logstash.
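To make the merge concrete, here is a hypothetical sketch (file and field names invented for illustration) of two files in a conf.d directory. Because the geoip blocks sit outside the conditionals, every event from every input passes through both of them:

```
# 11-filter-a.conf (hypothetical)
filter {
  if [document_type] == "type-a" {
    grok { ... }                  # runs only for type-a events
  }
  geoip { source => "src_a" }     # unguarded: runs for ALL events
}

# 11-filter-b.conf (hypothetical)
filter {
  if [document_type] == "type-b" {
    grok { ... }
  }
  geoip { source => "src_b" }     # unguarded: runs for ALL events again
}
```

An event parsed by file A has no src_b field, so the second geoip filter tags it _geoip_lookup_failure even though the first one populated [geoip].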

Should I be seeing multiple entries in logstash log for same geoip db?

[2019-08-14T19:17:34,453][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//str002:9200"]}
[2019-08-14T19:17:34,538][INFO ][logstash.filters.geoip   ] Using geoip database {:path=>"/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-geoip-4.3.1-java/vendor/GeoLite2-City.mmdb"}
[2019-08-14T19:17:34,936][INFO ][logstash.filters.geoip   ] Using geoip database {:path=>"/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-geoip-4.3.1-java/vendor/GeoLite2-City.mmdb"}
[2019-08-14T19:17:35,321][INFO ][logstash.pipeline        ] Starting pipeline {"id"=>"main", "pipeline.workers"=>16, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>2000}

Not a fix, but I have discovered a behaviour.
With only a single conf loaded that has geoip defined, everything works.
When both are loaded, data filtered by either conf gets its geoip data parsed AND each event is tagged _geoip_lookup_failure.

So the question is now: how can I have multiple filters use geoip?

You should get two messages from geoip when you have two geoip filters defined.

You are still missing that you only have a single configuration, which comprises the set of files in /etc/logstash/conf.d. The files in that directory are not independent, they are merged together. Every event goes through every filter defined in every file in that directory (unless you use conditionals to prevent that).

Yes, I'm aware the confs are AND'd together and that order matters. So is the best practice to remove the individual geoip section from each of the two filters, and create a single geoip-only filter that uses conditionals to process the IPs parsed by both filters?

Ow! I would argue that this is a bug. Most filters are a no-op if the source field does not exist. However, the geoip filter takes a different approach. If the field lookup fails then the java code returns false. The ruby code either decorates the event or tags it for failure. There is no no-op path. I think the ruby code should duplicate the source existence check.
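Until such a no-op path exists, one workaround consistent with this analysis is to add the existence check yourself in the configuration (a sketch, using the ipt_src field name from this thread):

```
filter {
  # Skip the lookup entirely when this event was not parsed by the
  # iptables grok, so events from other inputs are never tagged
  # _geoip_lookup_failure by this filter.
  if [ipt_src] {
    geoip {
      source => "ipt_src"
      target => "geoip"
    }
  }
}
```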

So was this event processed by both filters?

Maybe do cat /etc/logstash/conf.d/* > full-config.yml and post that file so we can see the total config.

(link removed)

Here's the full config.

In each section you have:

if <test> { 
  grok {}
}
  geoip {}

Would moving the geoip inside the if {..} fix the problem?

if <test> {
   grok {}
   geoip {}
}

The multiple pipelines in your future logstash 6.0 will help clean up this logic :slight_smile:

I didn't know you could use multiple filter {..} sections either, but it seems to be OK.

At first I had the geoip sections inside the ifs, but that didn't seem to make a difference and produced the same result.

I have 22 conf files in total, and cat'd them all together to produce that full-config.

01-input-01dns1-filebeat.conf     11-filter-02-iptables_geoip.conf     49-output-mail_01-grokparsefailure.conf
01-input-02dns2-filebeat.conf     11-filter-02-iptables_no-geoip.conf  49-output-mail_02-ups.conf
01-input-03-nagios-filebeat.conf  11-filter-03-ip6tables.conf          49-output-mail_03-oom.conf
01-input-04-kibana-filebeat.conf  11-filter-03-named_querylog.conf     49-output-mail_04-nfs.conf
11-filter-01-kibana.conf          11-filter-04-wireless.conf           50-output.conf
11-filter-01-nagios.conf          11-filter-05-dhcpd.conf              51-output-stdout.conf
11-filter-01-nginx-geoip.conf     11-filter-06-ddns.conf               52-output-graphite.conf
11-filter-01-syslog.conf          12-filter-metrics.conf

So this is what I ended up doing to make this work.

In the two filter confs that had geoip configurations, I commented those sections out, changed the field name that holds the parsed IP in one filter to match the name used in the other, and created a geoip-only conf that gets loaded after the two filters.

Addendum, found the next day:
I also had to wrap the geoip filter in an if; without it, every single event processed by logstash was being tagged _geoip_lookup_failure.
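For anyone landing here later, a minimal sketch of that final arrangement (the file name and the shared field name src_ip are assumptions; adapt them to your own configuration):

```
# 13-filter-geoip.conf (hypothetical; loaded after both grok filter confs)
filter {
  # Only events where one of the earlier filters extracted an IP
  # into the shared field reach the geoip lookup.
  if [src_ip] {
    geoip {
      source => "src_ip"
      target => "geoip"
    }
    mutate {
      convert => [ "[geoip][coordinates]", "float" ]
    }
  }
}
```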