Dynamic template to convert all types to string

Hi there,

Consider the following Logstash json filter snippet:

    json {
        skip_on_invalid_json => true
        source => "message_body"
        target => "jsondoc"
    }

which results in the following Logstash event:

    {
      "@timestamp" => 2020-11-27T19:36:46.292Z,
        "jsondoc" => {
          "taskBegin" => true,
          "time" => "2020-09-22T02:39:33.255Z",
          "parent" => {
            "pid" => 8188,
            "taskid" => 34863,
            "name" => "PAS"
          },
          "v" => 0,
          "pid" => 23404,
          "taskid" => 157694,
          "gid" => "En5MFzJZ2JjgLl3HrIfNtA",
          "reqAccepted" => true,
          "req" => {
            "port" => 18681,
            "path" => "/v2/searchTasks/GK2kwu8gSE2fzI7dnUaICg/results?continueToken=&limit=100",
            "method" => "GET"
          },
          "level" => 30,
          "msg" => "",
          "name" => "LoadBalancer"
      }
    }

Note that the json plugin preserves the original JSON value types, so the event contains a mix of booleans, integers, and strings.

Given the highly dynamic nature of this event, I run into all kinds of trouble when I try to output to Elasticsearch, with many illegal_argument_exception errors such as the one below. (As I understand it, dynamic mapping fixes each field's type from the first value it sees, so a later document with a different type for the same field is rejected.)

    [2020-11-27T19:11:09,952][WARN ][logstash.outputs.elasticsearch][main][77c3011a9ff8502709745eb35fc386b678d0aeb9876a3f25a01146fc5e950f11] Could not index event to Elasticsearch. {:status=>400, [...] "error"=>{"type"=>"illegal_argument_exception", "reason"=>"mapper [jsondoc.serverConfigurations.ccs.hashes.hashArray.hash] cannot be changed from type [long] to [float]"}}}}

I would like to ensure all values in the resulting JSON are converted to strings, so that I don't run into these exceptions. It's not feasible to map the fields in advance, because they change depending on the input log, which can serialise all sorts of JSON blobs.

On the Logstash forum it was suggested that handling this at the Elasticsearch level with a dynamic template might be more elegant than trying to solve it in Logstash with the ruby filter plugin (a rough sketch of that Logstash-side approach is included below for reference).

In other words: how can I ensure all values end up as strings, either while processing the event or when ingesting it into Elasticsearch?

Desired example output below:

    {
      "@timestamp" => 2020-11-27T19:36:46.292Z,
        "jsondoc" => {
          "taskBegin" => "true",
          "time" => "2020-09-22T02:39:33.255Z",
          "parent" => {
            "pid" => "8188",
            "taskid" => "34863",
            "name" => "PAS"
          },
          "v" => "0",
          "pid" => "23404",
          "taskid" => "157694",
          "gid" => "En5MFzJZ2JjgLl3HrIfNtA",
          "reqAccepted" => "true",
          "req" => {
            "port" => "18681",
            "path" => "/v2/searchTasks/GK2kwu8gSE2fzI7dnUaICg/results?continueToken=&limit=100",
            "method" => "GET"
          },
          "level" => "30",
          "msg" => "",
          "name" => "LoadBalancer"
      }
    }
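For reference, the Logstash-side approach I'd like to avoid would be something along the lines of the ruby filter sketch below. This is an untested sketch: it assumes event.get returns plain Ruby hashes and arrays for the [jsondoc] subtree, and it converts every leaf value (including nulls, which become empty strings) with to_s:

    ruby {
        code => '
            # Recursively convert every leaf value to a string.
            # Hashes and arrays are modified in place.
            stringify = lambda do |value|
                case value
                when Hash
                    value.each { |k, v| value[k] = stringify.call(v) }
                when Array
                    value.map! { |v| stringify.call(v) }
                else
                    value.to_s
                end
            end
            doc = event.get("jsondoc")
            event.set("jsondoc", stringify.call(doc)) if doc
        '
    }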

The implementation suggested on the Elasticsearch side was:

"dynamic_templates": [
    {
        "everythingstring": { 
            "match" : "jsondoc*",
            "mapping" : { "type" : "text", "norms" : false }
    }
]
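If I understand the suggestion correctly, that snippet would live under "mappings" in an index template. My best guess is something like the sketch below (composable index template syntax, Elasticsearch 7.8+; the template name and the logstash-* index pattern are placeholders, and I've guessed that "path_match": "jsondoc.*" is needed rather than "match", since the fields sit nested under jsondoc):

    PUT _index_template/everythingstring
    {
      "index_patterns": ["logstash-*"],
      "template": {
        "mappings": {
          "dynamic_templates": [
            {
              "everythingstring": {
                "path_match": "jsondoc.*",
                "mapping": { "type": "text", "norms": false }
              }
            }
          ]
        }
      }
    }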

Is this viable? And if so, is the sketch above the right way to implement it?

Thanks in advance
