JSON log: only root-level fields being indexed


#1

The logfile contains json log entries like this:

{"priority":"INFO","messageType":"HTTP_RESPONSE","message":{"type":"HTTP_RESPONSE","name":"Create User","status":"200","reason":"OK","response":"{id=1234, name=My Name}"}}

The root-level fields are dynamically indexed. How do I get the fields inside the "message" object indexed as well?

It seems that "message" is treated as the "text" type in the Elasticsearch index mapping (see below):

"message": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
},

Filebeat is set up this way:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:\logs\*
  format: json
  json.keys_under_root: true
  json.add_error_key: true
  fields:
    index: test

output.logstash:
  hosts: ["localhost:5000"]

Logstash is set up like this:

input {
  beats {
    port => 5000
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    user => "elastic"
    password => "changeme"
    index => "%{[fields][index]}-%{+YYYY.MM.dd}"
  }
}

Filebeat published this event to Logstash:

2018-06-29T13:09:36.968+0200    DEBUG   [publish]       pipeline/processor.go:291       Publish event: {
  "@timestamp": "2018-06-29T11:09:36.967Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "6.3.0"
  },
  "prospector": {
    "type": "log"
  },
  "beat": {
    "hostname": "myhost",
    "version": "6.3.0",
    "name": "myhost"
  },
  "message": {
    "reason": "OK",
    "status": "200",
    "type": "HTTP_RESPONSE",
    "name": "Create User",
    "response": "{id=1234, name=My Name}"
  },
  "priority": "INFO",
  "input": {
    "type": "log"
  },
  "host": {
    "name": "myhost"
  },
  "messageType": "HTTP_RESPONSE",
  "fields": {
    "index": "test"
  },
  "source": "C:\\logs\\test.log"
}

(Pier-Hugues Pellerin) #2

Hello @jro

From what I see, Filebeat should send the complete structure to Logstash, so let's turn to the Logstash side to get a bit more detail.

We can add a stdout output just before the Elasticsearch output in Logstash to see what the event looks like at that point. You need to start Logstash from the CLI, but this will give us the same output as the Beats debug log.

stdout {
  codec => rubydebug
}
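With that snippet in place, the full output section would look something like this (a sketch based on the configuration posted earlier in this thread):

```
output {
  # Print each event to the console so we can inspect its structure
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    user => "elastic"
    password => "changeme"
    index => "%{[fields][index]}-%{+YYYY.MM.dd}"
  }
}
```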

Also, can you add the mapping of the index to this thread? You can retrieve it by running a command like this:

curl http://localhost:9200/myindex/_mapping?pretty

#3

Output of the mapping using http://localhost:9200/test-2018.06.29/_mapping?pretty

{
  "test-2018.06.29" : {
    "mappings" : {
      "doc" : {
        "properties" : {
          "@timestamp" : {
            "type" : "date"
          },
          "@version" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "beat" : {
            "properties" : {
              "hostname" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              },
              "name" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              },
              "version" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              }
            }
          },
          "fields" : {
            "properties" : {
              "index" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              }
            }
          },
          "host" : {
            "properties" : {
              "name" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              }
            }
          },
          "input" : {
            "properties" : {
              "type" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              }
            }
          },
          "message" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "messageType" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "offset" : {
            "type" : "long"
          },
          "priority" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "prospector" : {
            "properties" : {
              "type" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              }
            }
          },
          "source" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "tags" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}

#4

It seems that Logstash is sending "message" as text to Elasticsearch, even though it receives it from Filebeat as an object. Do you know why this is happening?

Logstash output using stdout { codec => rubydebug }:

logstash_1       | {
logstash_1       |        "@version" => "1",
logstash_1       |          "source" => "C:\\logs\\test.log",
logstash_1       |            "host" => {
logstash_1       |         "name" => "myhost"
logstash_1       |     },
logstash_1       |          "offset" => 346,
logstash_1       |     "messageType" => "HTTP_RESPONSE",
logstash_1       |            "tags" => [
logstash_1       |         [0] "beats_input_codec_plain_applied"
logstash_1       |     ],
logstash_1       |      "@timestamp" => 2018-06-29T13:15:17.331Z,
logstash_1       |         "message" => "{\"type\"=>\"HTTP_RESPONSE\", \"name\"=>\"Create User\", \"status\"=>\"200\", \"reason\"=>\"OK\", \"response\"=>\"{id=1234, name=My Name}\"}",
logstash_1       |          "fields" => {
logstash_1       |         "index" => "test"
logstash_1       |     },
logstash_1       |            "beat" => {
logstash_1       |             "name" => "myhost",
logstash_1       |          "version" => "6.3.0",
logstash_1       |         "hostname" => "myhost"
logstash_1       |     },
logstash_1       |      "prospector" => {
logstash_1       |         "type" => "log"
logstash_1       |     },
logstash_1       |           "input" => {
logstash_1       |         "type" => "log"
logstash_1       |     },
logstash_1       |        "priority" => "INFO"
logstash_1       | }

#5

Here is also a Kibana screenshot of the Discover page, showing that only root-level fields are indexed and searchable.


(Pier-Hugues Pellerin) #6

By default, the Logstash beats input assumes that the message key is a string and will convert it. We can see this from the tag added to the event:

[0] "beats_input_codec_plain_applied"

You can change this behavior by adding the following options:

beats {
    port => 5000
    codec => json
    include_codec_tag => false
}

You will need to adjust your mapping to support the object, or delete the index: because of the dynamic mapping created from previously indexed data, the index will only accept text and not objects.
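For example, deleting the old index (index name taken from this thread) lets Elasticsearch rebuild the dynamic mapping from the next indexed event:

```
# WARNING: this removes all previously indexed data in that index.
curl -X DELETE "http://localhost:9200/test-2018.06.29"
```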


#7

Added the two options to the Logstash config (using a new index called test2).

Now I am getting a JSON parse error: the "message" field arrives as a Ruby-hash-style string ("key"=>"value" notation), which is not valid JSON:

logstash_1 | [2018-06-29T14:19:42,219][ERROR][logstash.codecs.json ] JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: Unexpected character ('=' (code 61)): was expecting a colon to separate field name and value
logstash_1 | at [Source: (String)"{"type"=>"HTTP_RESPONSE", "name"=>"Create User", "status"=>"200", "reason"=>"OK", "response"=>"{id=1234, name=My Name}"}"; line: 1, column: 9]>, :data=>"{"type"=>"HTTP_RESPONSE", "name"=>"Create User", "status"=>"200", "reason"=>"OK", "response"=>"{id=1234, name=My Name}"}"}
logstash_1 | {
logstash_1 | "messageType" => "HTTP_RESPONSE",
logstash_1 | "@timestamp" => 2018-06-29T14:19:40.951Z,
logstash_1 | "input" => {
logstash_1 | "type" => "log"
logstash_1 | },
logstash_1 | "fields" => {
logstash_1 | "index" => "test2"
logstash_1 | },
logstash_1 | "priority" => "INFO",
logstash_1 | "message" => "{"type"=>"HTTP_RESPONSE", "name"=>"Create User", "status"=>"200", "reason"=>"OK", "response"=>"{id=1234, name=My Name}"}",
logstash_1 | "beat" => {
logstash_1 | "name" => "CBS08762",
logstash_1 | "version" => "6.3.0",
logstash_1 | "hostname" => "CBS08762"
logstash_1 | },
logstash_1 | "tags" => [
logstash_1 | [0] "_jsonparsefailure"
logstash_1 | ],
logstash_1 | "@version" => "1",
logstash_1 | "offset" => 519,
logstash_1 | "host" => {
logstash_1 | "name" => "CBS08762"
logstash_1 | },
logstash_1 | "source" => "C:\logs\test.log",
logstash_1 | "prospector" => {
logstash_1 | "type" => "log"
logstash_1 | }
logstash_1 | }


(Pier-Hugues Pellerin) #8

@jro Yes, I think it's a bug on our side; the "message" field has a special meaning for us. A workaround would be to either:

  • Rename the 'message' field on the Filebeat side using the rename processor
  • Leave the JSON parsing to the beats input level (send the raw line from Filebeat and let the json codec do the parsing).
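A sketch of the first option, assuming a Filebeat version that ships the rename processor (the field names match the event shown earlier in this thread):

```yaml
# Filebeat config fragment (sketch): rename "message" to "msg" before publishing,
# so the Logstash beats input no longer treats it as the special message string.
processors:
  - rename:
      fields:
        - from: "message"
          to: "msg"
      ignore_missing: true
      fail_on_error: false
```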

#9

Thanks for the help @pierhugues, it works now with that workaround.

But should I file a bug regarding the "message" field being parsed differently than other field names (can you point me to where this is happening)?

I removed the two lines from the Logstash config: "codec => json" and "include_codec_tag => false".

I renamed "message" to "msg", and now the object is indexed and mapped correctly, so I can use the object fields: msg.name, msg.status, etc.


(Pier-Hugues Pellerin) #10

Thanks for filing the bug, @jro. You can do that on the logstash-input-beats repository; the code in question should be this:


(system) #11

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.