Parsing nested JSON

I am having issues with our AWS ECS -> Fluent Bit -> Elasticsearch setup, specifically around nested JSON.

For example, if the log message is:

{
  "endpoint": "/process",
  "payload": {
     "body": {
       "success": "true",
       "items": [
          {"name": "item_one"},
          {"name": "item_two"}
       ]   
     }
  }
}

We would like the following fields to be parsed:

endpoint -> "/process"
payload -> {"body": {"success": "true", "items": [{"name": "item_one"}, {"name": "item_two"}]}}

Only the top-level keys.

We set "index.mapping.depth.limit": 1, but this resulted in the logs being rejected by Elasticsearch:

    "status":400,
    "error":{
        "type": "illegal_argument_exception",
        "reason": "Limit of mapping depth [1] has been exceeded due to object field [org]"
    }
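(For reference, that limit is an index-level setting and was applied along these lines; the index name below is just illustrative.)

PUT /app-logs/_settings
{
  "index.mapping.depth.limit": 1
}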

Is there a setting that will parse only the top level but accept the rest of the data as the body?

Hey,

I'll try my best but you might want to wait for better answers :smiley:

I do believe it's a mapping issue; Elastic is rejecting it here because the setting is not defined?

You could also try to set the fields as keyword, but that would be a static per-field definition, which can be tedious if you have mixed data with a lot of fields.

There are also a lot of Ruby scripts around here if you want to extract the subkeys.

I appreciate the help!

So you are saying I should try to get the parsing done correctly at the Fluent Bit level and not the Elasticsearch level?

No, you'll have to define a static type for the incoming field on the Elastic side so that the indexed field values are of type text (Field data types | Elasticsearch Guide [8.17] | Elastic).
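
Roughly something like this, as a minimal sketch (the index name app-logs is just an example). Here endpoint gets a static keyword type, and payload is mapped as a disabled object so Elasticsearch keeps it in _source without trying to map the nested fields. The flattened field type is another option if you still want to search inside payload.

PUT /app-logs
{
  "mappings": {
    "properties": {
      "endpoint": { "type": "keyword" },
      "payload":  { "type": "object", "enabled": false }
    }
  }
}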

Got it, thanks.

We were hoping not to have to maintain the mappings at the Elastic level, but perhaps we'll have to.

Would it be possible to parse it at the Fluent Bit level so that the second level is a string instead of an object? Just trying to work out if there is a best practice for us to follow.

In the end we sorted it out at the ingest pipeline level:

PUT /_ingest/pipeline/main_pipeline
{
  "description" : "process-pipeline",
  "processors" : [
    {
      "date" : {
        "field" : "timestamp",
        "formats" : ["ISO8601"],
        "ignore_failure" : true
      }
    },
    {
      "script": {
        "source": """
          // Stringify any top-level field whose value is an object (Map),
          // so Elasticsearch only has to map the top-level keys.
          for (entry in ctx.entrySet()) {
            if (entry.getValue() instanceof Map) {
              ctx[entry.getKey()] = entry.getValue().toString();
            }
          }
        """
      }
    }
  ]
}

Not sure if it's best practice, but it worked.
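
For anyone else reading: one way to have the pipeline run on every incoming document is to set it as the index's default pipeline. A sketch, again with an illustrative index name:

PUT /app-logs/_settings
{
  "index.default_pipeline": "main_pipeline"
}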


Yes! That's the script I was thinking about; you solved it from the parsing side.

I suggest you also check the data types in your mapping (which I believe is automatic), so that you understand the types must match; conflicting types can sometimes prevent a document from being ingested correctly by Elastic.
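
You can see what dynamic mapping has produced with something like this (index name is illustrative):

GET /app-logs/_mapping

With the script above, payload should now arrive as a string, so dynamic mapping will typically map it as text with a keyword subfield.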