Decide if nested JSON is JSON or text

I'm receiving JSON-formatted events from Filebeat, which has already decoded them as JSON. For example:
Original log:

{"date" : "2019...", "data" : { "subdata" : "sometext" }} (CASE1)
{"date" : "2019...", "data" : { "subdata" : { "field1" : "blah" } }} (CASE2)

Filebeat conf:

    - type: log
    ...
       json.keys_under_root: true
    ..

Event generated by Filebeat:

{"@timestamp":"2019....", "@metadata" ...blahblah, "data":{"subdata":{"field1":"blah"}}...}

I need Logstash to be able to decide whether "subdata" contains text or a JSON object, e.g. ..."subdata" : { "field1" : "blah" }

This is because Elasticsearch can't map both JSON objects and text to the same field (subdata has to be either text or an object in ES).

I tried matching subdata against an opening curly brace, but it didn't help. E.g.
if [data][subdata] =~ /^{.*/ ----> this won't match { "data" : { "subdata" : { "field1....

I think part of the problem is that Filebeat splits the original log message into separate fields, but I really don't want to burden the few Logstash instances I have with parsing every incoming message as JSON.

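That would work if [data][subdata] were a string that starts with {. In CASE2 Filebeat has already decoded it into an object, so the regex has no string to match against.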
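To see the difference outside Logstash, here is a minimal plain-Ruby sketch. It assumes the two sample events arrive already decoded, as they would with Filebeat's JSON decoding; `starts_with_brace?` is a hypothetical helper that mimics the intent of the regex conditional, not a Logstash function.

```ruby
require "json"

# The two sample events from the question, decoded up front so that nested
# objects become hashes rather than strings (assumed to mirror what
# Logstash receives from Filebeat).
case1 = JSON.parse('{"date": "2019...", "data": {"subdata": "sometext"}}')
case2 = JSON.parse('{"date": "2019...", "data": {"subdata": {"field1": "blah"}}}')

# Hypothetical helper: true only for a String that begins with "{".
def starts_with_brace?(value)
  value.is_a?(String) && value.start_with?("{")
end

puts case1["data"]["subdata"].class                  # String
puts case2["data"]["subdata"].class                  # Hash
puts starts_with_brace?(case2["data"]["subdata"])    # false: it is a Hash
puts starts_with_brace?('{ "field1" : "blah" }')     # true: an undecoded string
```

So a brace check only helps when the nested JSON is still an undecoded string; once it has been decoded, the value's type is what needs testing.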

You could do it in a ruby filter:

    ruby {
        code => '
            # Fetch the nested field; s is nil when the field is absent
            s = event.get("[data][subdata]")
            if s
                # Anything that is not a String is treated as an object
                event.set("isObject", !s.kind_of?(String))
            end
        '
    }
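The same check can be tried without a running Logstash. The sketch below is an illustration only: a plain nested Hash stands in for the Logstash event, and `tag_subdata` is a hypothetical name, so it approximates the filter above rather than reproducing the real `event.get`/`event.set` API.

```ruby
# Plain-Ruby stand-in for the filter logic. A nested Hash plays the role of
# the Logstash event (an assumption so this runs without Logstash).
def tag_subdata(event)
  s = event.dig("data", "subdata")
  # Leave the event untouched when the field is missing, like the filter.
  return event unless s
  # Anything that is not a String is flagged as an object.
  event.merge("isObject" => !s.is_a?(String))
end

case1 = { "data" => { "subdata" => "sometext" } }
case2 = { "data" => { "subdata" => { "field1" => "blah" } } }

p tag_subdata(case1)["isObject"]                         # false
p tag_subdata(case2)["isObject"]                         # true
p tag_subdata({ "date" => "2019..." }).key?("isObject")  # false
```

In a pipeline, the resulting flag could then drive a conditional such as if [isObject] { ... } to rename or reshape the field before it reaches Elasticsearch.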

Thank you, that works like a charm! I hope it scales well too. Either way, this solution didn't just offer quick help, it also opened up new ways for me to fiddle with my Logstash configs!
