Cannot search JSON logs in Kibana

Hi. I'm sending custom JSON logs to ES/Kibana v 7.8.1 using filebeat. My filebeat config is:

    cloud.auth: <username:password>
    cloud.id: <cloud ID>
    filebeat.inputs:
    - enabled: true
      json.add_error_key: true
      json.keys_under_root: false
      paths:
      - /var/log/myapp/*.log
      type: log
    filebeat.modules:
    - auth:
        enabled: false
      module: system
      syslog:
        enabled: false
    setup.template.settings:
      index.number_of_shards: 3

I can see all the data just fine in Kibana. However it seems that I can't search on any JSON log data.

E.g. I can just type the hostname in the search bar, and it will show logs from that host with the hostname highlighted correctly. But if I pick any string from a log message (in the "json.msg" field), nothing is returned.

With our config, messages are sent in the "json.msg" field. I can see in the Kibana sidebar, and in the Kibana "filebeat-*" index pattern editor, that fields like "json.msg" are configured as text and marked as searchable, so what am I missing?

I did click to refresh the Kibana field list, but same problem after.

I'm obviously not an expert in this! But the strange thing is, I set up an ELK stack on v6.x a couple of years ago, and it works fine on that install. I'm just updating our filebeat clients to point to a new 7.x install, so I'm surprised things aren't working. Perhaps I did something on the original server that I can't recall now!

I was hoping that any JSON properties in our logs would be dynamically picked up and indexed for searching. How can I make that happen?

Thanks

Hi @bc_andrew
The Filebeat log input should automatically decode JSON log files, but with a single caveat: Filebeat processes the logs line by line, so the JSON decoding only works if there is one JSON object per line. Is that the case for you?
https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html#filebeat-input-log-config-json

Hey @markov00. Yes, there's only one serialized JSON object per line.

Also, the log format hasn't changed from the one we're still pushing to our 6.x ELK stack, and it's working great there.

And just to confirm, in the 7.x Kibana I see all the fields in each log just fine. I just can't search on them 🙂

Any other suggestions?

Thanks

And just to clarify: I can filter OK. E.g. I can click the "Filter for value" option beside the "json.msg" field, and that works. But if I just type a single word from the message into the search box, there are no results.

So is it something to do with full-text indexing? But I would have thought that's what the "searchable" flag means in the Kibana index pattern editor.

Probably the automatic mapping of your JSON didn't add a full-text-searchable field for the msg field. Could you please post your index mappings: GET /yourindex/_mapping
If your field mappings are defined with the keyword type, they can only be filtered or matched against the whole value. The fields you want to search as full text should be mapped with "type": "text".
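
To illustrate the difference, here is a minimal Dev Tools sketch (the index name is just a placeholder, not your real setup):

    PUT my-logs-example
    {
      "mappings": {
        "properties": {
          "msg": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }

With a mapping like this, the analyzed msg field matches individual words typed into the Kibana search bar, while msg.keyword stays available for exact-value filters and aggregations.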

Hi @markov00. Ok, I figured out how to get that info and yes, it is defined as a keyword:

        ...
        "json" : {
          "properties" : {
            "hostname" : {
              "type" : "keyword",
              "ignore_above" : 1024
            },
            "level" : {
              "type" : "long"
            },
            "levelStr" : {
              "type" : "keyword",
              "ignore_above" : 1024
            },
            "msg" : {
              "type" : "keyword",
              "ignore_above" : 1024
            },
            "name" : {
              "type" : "keyword",
              "ignore_above" : 1024
            },
            "pid" : {
              "type" : "long"
            },
            "time" : {
              "type" : "keyword",
              "ignore_above" : 1024
            },
            "v" : {
              "type" : "long"
            }
          }
        },
        ...

How can I change that so that any index created from filebeat data will have that setting? Thanks

EDIT: ah, is that what the json.message_key setting is for in the filebeat config file? Should I be setting that to "msg"? (But, again, I'm a bit confused about how this works with our current system then.)

I also checked the mappings in our current 6.x system and they look like this (with some extra props, for example the err nested object):


        "json": {
          "properties": {
            "code": {
              "type": "long"
            },
            "err": {
              "properties": {
                "code": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "message": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "name": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "stack": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                }
              }
            },
            "fbtrace_id": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "hostname": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "is_transient": {
              "type": "boolean"
            },
            "level": {
              "type": "long"
            },
            "levelStr": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "message": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "msg": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "name": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "pid": {
              "type": "long"
            },
            "time": {
              "type": "date"
            },
            "type": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "v": {
              "type": "long"
            }
          }
        },

Those mappings look very different from the ones in our 7.x system, even though we're using basically the same filebeat.yml config file for the agent.

EDIT: aaah, ok, previously the filebeat.yml was using output.logstash and pointing at our 6.x ELK server. We are now moving to the Elastic Cloud hosted version, so I've just used cloud.id as suggested in the docs. So I guess I'm skipping the whole Logstash part, right? And I guess that's why the above mappings look so different: they were being managed by Logstash somehow?

I didn't consider that I actually needed Logstash, but should I be using that?

My main concern is if we add new props to our JSON log in future - I want to make sure we will be able to search them correctly, and not have to remember to manually update our filebeat config in some other part of the system.

Sorry for all the noob questions, but I'm feeling a bit lost in all this! Appreciate any advice.

You are right, you are skipping the Logstash part of your previous configuration, which probably applied a different set of mappings.

Before excluding Logstash completely from your ingest pipeline, you first need to be sure that the 6.x Logstash configuration isn't applying any processing or enrichment to your log files, like adding computed fields based on others (for example, I can see fbtrace_id in your 6.x mappings but not in your 7.x mappings; it could be a field extracted by Logstash, but I'm just assuming this).

If, after checking the Logstash configuration, you don't find anything in particular that alters your logs, what you can do is update the current mappings with a mapping that looks as close as possible to your JSON log structure.

The way to update the current mappings to reflect the old ones depends on whether you need to change the mapping for historical data or just for future data.
For future data (stored in new indices) it's sufficient to put the new index mapping in the cluster.
If you also need the new mapping for the historical data, you will have to reindex the old data into new indices to pick up the new mapping.
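For the historical data, the reindex would look roughly like this (the index names here are hypothetical, and the destination index needs to be created with the new mapping, or match a template containing it, before you run the reindex):

    POST _reindex
    {
      "source": { "index": "filebeat-2020.03.22" },
      "dest":   { "index": "filebeat-2020.03.22-reindexed" }
    }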

If you are concerned about new props in the future, you can check and configure how ES detects the type of a new field using dynamic templates, which allow you, for example, to always map new string fields as both text and keyword.
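
As a rough sketch of such a dynamic template in a legacy index template (the template name, pattern and order are placeholders, and it's worth checking how this merges with the template Filebeat installs):

    PUT _template/my-strings-as-text
    {
      "index_patterns": ["filebeat-*"],
      "order": 2,
      "mappings": {
        "dynamic_templates": [
          {
            "strings_as_text": {
              "match_mapping_type": "string",
              "mapping": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        ]
      }
    }

This only affects indices created after the template is in place; existing indices keep their current mappings.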

If you need more help on that, you can open a specific post in the Beats forum or ask Cloud support to help you with it.

Hey @markov00. Thanks for the follow up, and the patience!

I finally realised that the Logstash config is stored locally on our ELK box (rather than in ES somewhere), and in the /etc/logstash/conf.d directory I see 3 files:

02-beats-input.conf:

input {
  beats {
    port => 5044
    codec => "json"
    ssl => true
    ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"
    ssl_key => "/etc/pki/tls/private/logstash-forwarder.key"
  }
}

10-syslog-filter.conf: this one just applies a filter for syslog-type records, so I'm skipping it.

30-elasticsearch-output.conf:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    sniffing => true
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}

What is the meaning of the json codec? Does it magically create and update all the required ES index mappings? (Update: seems unlikely...)

So is that logstash setup basically useless? I should ignore logstash altogether?

One thing I'm still confused about is where the mapping is happening on our current 6.x box.

I dumped all the index templates from our 6.x box using GET _template and I can't see any mention of a json or msg prop. Yet, the result of say GET filebeat-2020.03.22/_mapping shows a mapping for them:

{
  "filebeat-2020.03.22" : {
    "mappings" : {
      "doc" : {
        "properties" : {
          "@timestamp" : {
            "type" : "date"
          },
          ....
           "json" : {
            "properties" : {
              "err" : {
                "properties" : {
                  "code" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "message" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "name" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  },
                  "stack" : {
                    "type" : "text",
                    "fields" : {
                      "keyword" : {
                        "type" : "keyword",
                        "ignore_above" : 256
                      }
                    }
                  }
                }
              },
              "hostname" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              },
              "level" : {
                "type" : "long"
              },
              "levelStr" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              },
              "msg" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              },
              "name" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              },
              "pid" : {
                "type" : "long"
              },
              "time" : {
                "type" : "date"
              },
              "v" : {
                "type" : "long"
              }
            }
          },
          ....

And looking at the templates and their index patterns from GET _cat/templates?v:

name                          index_patterns             order      version
kibana_index_template:.kibana [.kibana]                  0          
.ml-state                     [.ml-state*]               0          6081099
.monitoring-es                [.monitoring-es-6-*]       0          6070299
logstash-index-template       [.logstash]                0          
.ml-meta                      [.ml-meta]                 0          6081099
.ml-notifications             [.ml-notifications]        0          6081099
.ml-anomalies-                [.ml-anomalies-*]          0          6081099
.monitoring-beats             [.monitoring-beats-6-*]    0          6070299
.management-beats             [.management-beats]        0          67000
filebeat-6.0.0                [filebeat-6.0.0-*]         1          
.monitoring-logstash          [.monitoring-logstash-6-*] 0          6070299
.ml-config                    [.ml-config]               0          6081099
.watches                      [.watches*]                2147483647 
security-index-template       [.security-*]              1000       
security_audit_log            [.security_audit_log*]     1000       
.monitoring-kibana            [.monitoring-kibana-6-*]   0          6070299
.triggered_watches            [.triggered_watches*]      2147483647 
.watch-history-9              [.watcher-history-9*]      2147483647 
.monitoring-alerts            [.monitoring-alerts-6]     0          6070299
.kibana_task_manager          [.kibana_task_manager]     0          6081099

It doesn't seem like there is a template whose index pattern actually matches an index name like filebeat-2020.03.22! So how does it know about the json prop mappings? Does it just fall back to looking at the previous index? If so, I guess I must have followed some guide to configure an initial index correctly at some point.

Sorry for all the questions, I just really want to understand how it all plugs together before I proceed.

Thanks!

Just to follow up in case anyone else has similar confusion... I finally realised that the 7.x version of the filebeat client creates a very comprehensive filebeat index template, and in the dynamic templates settings it has the following:

      {
        "strings_as_keyword": {
          "mapping": {
            "ignore_above": 1024,
            "type": "keyword"
          },
          "match_mapping_type": "string"
        }
      }

So that's at least why unknown strings were being mapped as keyword. It seems our 6.x ELK install just relied on Elasticsearch's default dynamic field mappings, which map unknown strings as text and also create a keyword multi-field limited to 256 characters.

I was really uncomfortable just manually pushing up a one-off index template update to fix that, but then I discovered there is a setup.template.append_fields setting in the filebeat.yml config file which lets you add your own custom fields to the predefined ones in the (huge) default fields.yml list. So now at least I can populate that setting via our SaltStack config and have a reproducible setup for the future.
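
In case it helps someone else, a sketch of what I mean in filebeat.yml (the field names are just the props from our own JSON logs, so treat them as examples):

    # push the updated template even if one already exists
    setup.template.overwrite: true
    # example fields from our JSON logs; adjust names to your own log structure
    setup.template.append_fields:
    - name: json.msg
      type: text
    - name: json.levelStr
      type: text
    - name: json.hostname
      type: text

As I understand it, setup.template.overwrite makes Filebeat push the updated template on setup, and the new mappings only apply to indices created after that.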

None of this was obvious to me as someone who rarely has to set this up, so it involved a very deep dive into how ES and filebeat work. Hope this helps others in the same position.
