Limit of total fields [1000] in index exceeded and nested json

Hi all, I need some help avoiding the well-known 'Limit of total fields [1000] in index exceeded' problem. I don't want to raise the default limit in the index settings above 1000.
The culprit seems to be a field whose JSON value contains simple key-values as well as nested structures.
The first problem is that messages with this field are shipped by different applications, and only a few keys inside that field are common to all of them. Other keys are unique to each application, and I don't know all of them. They might even change in the future, so hardcoding them in a mapping is not a practical option.
The second problem is that some of the keys inside that field have JSON objects as values, others have stringified JSON, and others have arrays of objects.
See a mock of what I see in the JSON tab in Kibana:

    {
      "_index": "my-index-01",
      "_type": "doc",
      "_id": "QJTIJXQBPFU8jmrx9tmo",
      "_version": 1,
      "_score": null,
      "_source": {
        "level": "info",
        "tags": [
          "winston"
        ],
        "host": "ip-10-0-185-143.eu-west-1.compute.internal",
        "type": "tcp",
        . . .
        . . .
        "context": {
          "userType": "admin",
          "userDept": "admins",          
          "example": {
            "user": {
              "johndoe": "{\"name\":\"John\",\"surname\":\"Doe\"}"
              "isEmpty": false,
              "initDate": 1629806366409
            },
            "position": {
              "geo": {
                "x": "1234567890",
                "y": "0987654321"
              }
            },
            "textEntries": [
              {
                "key": "hasCredentials",
                "value": "false"
              },
              {
                "key": "isInternal",
                "value": "false"
              },
              {
                "key": "isMember",
                "value": "true"
              }
            ]
          }
        }
      }
    }

We use dynamic mapping, so I suspect that whenever one of those messages contains a "context" field with very verbose nested values, we hit the 1000-field limit and the message is not ingested.
Is there a practical way to keep indexing the simple key-values inside "context" (like userType and userDept), but avoid indexing values that are stringified JSON or other nested structures ("example", "position", "textEntries", etc.)?

If you want to end up with

   "context" => {
    "userDept" => "admins",
    "userType" => "admin"
},

then use

    ruby {
        code => '
            ctx = event.get("context")
            if ctx
                event.set("context", ctx.delete_if { |k, v| v.is_a?(Hash) or v.is_a?(Array) })
            end
        '
    }
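
Note that delete_if both mutates the hash in place and returns it, so the event.set call stores the pruned hash. Here is a minimal plain-Ruby sketch of the same pruning logic, runnable outside Logstash (the sample hash is a made-up subset of the mock above):

    # Sanity check of the pruning logic in plain Ruby, outside Logstash.
    ctx = {
        "userType" => "admin",
        "userDept" => "admins",
        "example"  => { "user" => { "isEmpty" => false } },
        "textEntries" => [ { "key" => "isMember", "value" => "true" } ]
    }
    # Drop every entry whose value is a nested structure.
    ctx.delete_if { |k, v| v.is_a?(Hash) or v.is_a?(Array) }
    puts ctx.inspect
    # => {"userType"=>"admin", "userDept"=>"admins"}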

If you want to end up with

   "context" => {
    "userDept" => "admins",
    "userType" => "admin",
     "example" => "{\"position\"=>{\"geo\"=>{\"x\"=>\"1234567890\", \"y\"=>\"0987654321\"}}, \"textEntries\"=>[{\"value\"=>\"false\", \"key\"=>\"hasCredentials\"}, {\"value\"=>\"false\", \"key\"=>\"isInternal\"}, {\"value\"=>\"true\", \"key\"=>\"isMember\"}], \"user\"=>{\"isEmpty\"=>false, \"johndoe\"=>\"{\\\"name\\\":\\\"John\\\",\\\"surname\\\":\\\"Doe\\\"}\", \"initDate\"=>1629806366409}}"
},

then use

    ruby {
        code => '
            ctx = event.get("context")
            if ctx
                ctx.each { |k, v|
                    if v.is_a?(Hash) or v.is_a?(Array)
                        ctx[k] = v.to_s
                    end
                }
                event.set("context", ctx)
            end
        '
    }
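
Note that to_s produces Ruby hash notation with "=>", as shown in the example output above, which is not parseable JSON. If you would rather store valid JSON strings, the same filter could serialize with to_json instead; a minimal sketch, assuming the json library available in Logstash's JRuby runtime:

    ruby {
        code => '
            require "json"
            ctx = event.get("context")
            if ctx
                ctx.each { |k, v|
                    if v.is_a?(Hash) or v.is_a?(Array)
                        # to_json emits a parseable JSON string instead of Ruby inspect output
                        ctx[k] = v.to_json
                    end
                }
                event.set("context", ctx)
            end
        '
    }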

Thanks @Badger, the second option was what I was looking for!
