Filtering on beats fields issue


(Andy) #1

Fairly new to the ELK Stack, so please bear with me :)

I have a single beats.conf file in Logstash at the moment for a new deployment, and I thought it was all working fine. However, when trying to make some changes earlier I noticed my filtering hasn't been working correctly. I have syslog and secure logs, and I'm trying to push Apache logs through as well, so I have my filter set up with if and else if based on a field type I have defined in my Filebeat config, e.g. the below.

My filebeat.yml

- input_type: log
  paths:
    - /var/log/messages*
    - /var/log/syslog*
  ignore_older: 1h
  fields:
      logtype: syslog_data

Then in logstash beats.conf

filter {
  if [fields][log_type] =~ "syslog_data" {

That condition fails, and events only get through because of my final else statement.

BUT, if I change the filebeat.yml to

- input_type: log
  paths:
    - /var/log/messages*
    - /var/log/syslog*
  ignore_older: 1h
  document_type: syslog
  fields:
      logtype: syslog_data

and my beats.conf filter to

filter {
  if [type] == "syslog" {

Then it all works? What am I doing wrong? I thought fields was the best option to use, as I get deprecation warnings for document_type. If I can filter on my custom field, then I think everything should work for me.


(Magnus Bäck) #2

What does an example event look like? Copy/paste from Kibana's JSON tab or use a stdout { codec => rubydebug } output.
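
For the second option, a minimal sketch of a debug output section is below. It assumes you add it alongside whatever outputs beats.conf already has; each event is then printed in full, so the exact field names can be checked.

```
output {
  # Print every event with all of its fields to Logstash's stdout
  stdout { codec => rubydebug }
}
```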


(Tat Dat Pham) #3

Why don't you use document_type: syslog_data for this?

filter {
  if [type] =~ "syslog_data" {

(Andy) #4

If I use if [type] == "syslog" in my Logstash config then I get this in Kibana's JSON tab, which is correct, as it means Logstash is filtering it correctly. This gives me fields in Kibana for system.syslog.hostname, .message, .program, etc.

{
  "_index": "filebeat-2017.08.07",
  "_type": "syslog",
  "_id": "AV2-5IgN9RQeXrDThLGh",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2017-08-07T22:50:22.000Z",
    "system": {
      "syslog": {
        "hostname": "elk-stack-client-1",
        "program": "dbus-daemon",
        "message": "dbus[570]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)",
        "timestamp": "Aug  8 10:50:22"
      }
    },
    "offset": 152792,
    "@version": "1",
    "input_type": "log",
    "beat": {
      "hostname": "elk-stack-client-1",
      "name": "elk-stack-client-1",
      "version": "5.5.1"
    },
    "host": "elk-stack-client-1",
    "source": "/var/log/messages",
    "type": "syslog",
    "fields": {
      "logtype": "syslog_data",
      "env": "Development"
    },
    "tags": [
      "Auckland NZ",
      "beats_input_codec_plain_applied"
    ]
  },
  "fields": {
    "@timestamp": [
      1502146222000
    ]
  },
  "sort": [
    1502146222000
  ]
}

But if I use if [fields][log_type] == "syslog_data" in Logstash then I get the below, which is incorrect: the condition doesn't match, so the event hits the final else and doesn't get parsed. This doesn't give me the system.syslog.program fields etc., so everything is just in the message.

{
  "_index": "filebeat-2017.08.07",
  "_type": "syslog",
  "_id": "AV2-6M5K9RQeXrDThLHE",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2017-08-07T22:55:09.371Z",
    "offset": 154490,
    "@version": "1",
    "input_type": "log",
    "beat": {
      "hostname": "elk-stack-client-1",
      "name": "elk-stack-client-1",
      "version": "5.5.1"
    },
    "host": "elk-stack-client-1",
    "source": "/var/log/messages",
    "message": "Aug  8 10:55:02 elk-stack-client-1 accounts-daemon: (accounts-daemon:560): GLib-GIO-CRITICAL **: g_dbus_interface_skeleton_unexport: assertion 'interface_->priv->connections != NULL' failed",
    "type": "syslog",
    "fields": {
      "logtype": "syslog_data",
      "env": "Development"
    },
    "tags": [
      "Auckland NZ",
      "beats_input_codec_plain_applied",
      "_geoip_lookup_failure"
    ]
  },
  "fields": {
    "@timestamp": [
      1502146509371
    ]
  },
  "sort": [
    1502146509371
  ]
}

(Andy) #5

Why don't you use document_type: syslog_data

Because it says it's deprecated, plus when I try that it doesn't filter with syslog_data or anything custom I put in. I can only get it to filter with syslog, so I assume it's looking for a pre-defined value, as opposed to fields.

FWIW I also tried Filebeat modules this morning, as that would also achieve what I want, but no joy there either: I can get it to send data to Elasticsearch and can see the indexes, but Kibana doesn't show any data even though I created the filebeat index. So I'm going to keep working on this method.


(Magnus Bäck) #6

But if I use if [fields][log_type] == "syslog_data" in Logstash then I get the below, which is incorrect: the condition doesn't match, so the event hits the final else and doesn't get parsed.

That's because the field is actually named [fields][logtype].
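
As a rough sketch, assuming the filebeat.yml shown earlier in the thread (fields: logtype: syslog_data), the conditional would need to reference the same key name. The tag names below are hypothetical and only there to make it easy to see which branch an event took; the real syslog parsing filters would go in the first branch.

```
filter {
  if [fields][logtype] == "syslog_data" {
    # Existing syslog parsing filters go here.
    # "syslog_matched" is a hypothetical marker tag, not from the original config.
    mutate { add_tag => ["syslog_matched"] }
  } else {
    # Fallback branch for events that match no earlier condition.
    # "unmatched" is likewise a hypothetical marker tag.
    mutate { add_tag => ["unmatched"] }
  }
}
```

A reference to a field that doesn't exist, such as [fields][log_type], simply never matches in a comparison like this, and Logstash does not report an error for it. That is why events fell through to the final else without any warning.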


(Andy) #7

That’s because the field is actually named [fields][logtype].

Ahhhhh!! I can't believe I missed that, lol. Thanks so much. No matter how many times I went over it I couldn't see what I was doing wrong, and I just assumed it was a syntax issue and not a PEBKAC issue with me :)
Changed it this morning and logs are being filtered correctly now. Much appreciated!

