Issue with decode_json_fields processor

Hi,

I'm shipping AWS logs to Elastic using Functionbeat.

I have recently added the following processor to my configuration so that I'm able to decode the JSON that is usually in the message field.

    processors:
      - decode_json_fields:
          fields: ["message"]
          process_array: false
          max_depth: 1
          target: ""
          overwrite_keys: false ## also tested with true
          add_error_key: true

However, logs have stopped appearing since adding it.

Example log:

    {
      "_index": "functionbeat-7.9.2-2020.10.01-000001",
      "_type": "_doc",
      "_id": "LFom-HQBtUOF3QRT8N9L",
      "_version": 1,
      "_score": null,
      "_source": {
        "@timestamp": "2020-10-05T09:45:29.765Z",
        "owner": "<redacted>",
        "log_group": "/aws/lambda/discovery-production-twitch",
        "agent": {
          "id": "94593fd1-28ef-4dec-b105-6d34277b5466",
          "name": "169.254.173.37",
          "type": "functionbeat",
          "version": "7.9.2",
          "hostname": "169.254.173.37",
          "ephemeral_id": "926e9ba3-b05a-4015-9d70-1c746527505d"
        },
        "message": "{\"message\":{\"log_type\":\"ProcessTopStreamsByGame\",\"event\":\"ER_DUP_ENTRY\",\"payload\":{\"person_id\":\"588b38c3-bb8f-480f-927b-d39c351f3022\",\"game_id\":\"b1c0aa01-3420-4b63-b396-8f78dea14c96\"}},\"level\":\"info\",\"timestamp\":\"2020-10-05T09:45:29.765Z\"}\n",
        "log_stream": "2020/10/05/[$LATEST]583fc23d95c64c198505873a81d37ff8",
        "message_type": "DATA_MESSAGE",
        "subscription_filters": [
          "fnb-cloudwatch-stack-fnbcloudwatchSFawslambdadiscoveryproductiontwitch-OQDR276VW93N"
        ],
        "event": {
          "kind": "event"
        },
        "id": "35723365920675619124080672790578666189065578275236872390",
        "cloud": {
          "provider": "aws"
        },
        "ecs": {
          "version": "1.5.0"
        },
        "host": {
          "name": "169.254.173.37",
          "ip": [
            "169.254.76.1",
            "169.254.79.1",
            "169.254.80.2"
          ],
          "mac": [
            "7e:4d:06:88:d3:72",
            "66:87:d3:39:68:d4",
            "5a:86:f0:a2:54:61"
          ],
          "hostname": "169.254.173.37",
          "architecture": "x86_64",
          "os": {
            "version": "2018.03",
            "family": "redhat",
            "name": "Amazon Linux AMI",
            "kernel": "4.14.177-104.253.amzn2.x86_64",
            "platform": "amzn"
          },
          "containerized": true
        }
      },
      "fields": {
        "@timestamp": [
          "2020-10-05T09:45:29.765Z"
        ]
      },
      "highlight": {
        "log_group": [
          "/@kibana-highlighted-field@aws@/kibana-highlighted-field@/@kibana-highlighted-field@lambda@/kibana-highlighted-field@/@kibana-highlighted-field@discovery@/kibana-highlighted-field@-@kibana-highlighted-field@production@/kibana-highlighted-field@-@kibana-highlighted-field@twitch@/kibana-highlighted-field@"
        ]
      },
      "sort": [
        1601891129765
      ]
    }

Hi,

I think the problem is a collision of fields. In Elasticsearch, message is simple text. Therefore, before adding the processor everything was imported. After adding the processor, message is an object with nested keys, which collides with the field definition of text.

Have you tried setting target to a non-empty value? In that case the fields are written to a nested key and should not collide with the existing field definitions.
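For example, a sketch of the same configuration with a non-empty target (parsed_message is just an assumed name, any non-empty key works):

    processors:
      - decode_json_fields:
          fields: ["message"]
          process_array: false
          max_depth: 1
          target: "parsed_message"  # assumed name; decoded keys land under parsed_message.*
          overwrite_keys: false
          add_error_key: true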

Best regards
Wolfram

If it's an issue with key collisions, shouldn't overwrite_keys work in this case? Or is there a way to modify message to allow it to be more than text? Either text or a JSON object.

Ideally I'm looking for a final result where message, level and timestamp are in the root of the document.

@Wolfram_Haussig

You're right, changing target to an actual value made the logs appear.

But now I'm wondering how I get some_data.level and some_data.timestamp to be in the root.

There is a way: the mapping of fields in Elasticsearch is either generated on the fly when unknown fields are ingested, or it is defined in index templates. In the first case it should be enough to delete the old index and create a new one; in the case of an index template you have to update the index template's mapping.

You could try to add a copy_fields processor to copy relevant data from the nested tree to the root.
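A minimal sketch of that, assuming the decoded JSON ended up under some_data as in the post above (copy_fields is a standard Beats processor; the target field names here are only illustrative):

    processors:
      - copy_fields:
          fields:
            - from: some_data.level
              to: level
            - from: some_data.timestamp
              to: log_timestamp
          fail_on_error: false
          ignore_missing: true

Note that copying to a top-level message field would reintroduce the original text-vs-object collision, which is why the example copies only the scalar fields.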

OK, I'm not sure what the best practice is.

I'm ingesting logs from AWS (Functionbeat) and Kubernetes (Filebeat) into Elasticsearch.

They nearly always have a message field. Sometimes it's a string, sometimes it's JSON.

Sometimes the message field is a custom log, like in the example above from a Node.js API application I've written.

If I set target to acme, the name of my company, that makes sense for the logs generated by my own applications.

But it does not make sense to call the field acme for logs generated by prebuilt apps/services created by other companies, such as Mongo, Redis, etc.

I think the best way would be to parse the data into a separate tree - it does not have to be the name of your company, e.g. parsed_message.
In Elasticsearch you could then create an ingest pipeline to move the fields to a standardized location. The ECS schema already defines many fields, so I recommend following their schema and only adding fields if necessary. This has the benefit that documents can be aggregated independently of the data source, because the same content is stored in fields with the same name.
Example:

    payload.user_id => user.id
    payload.user_name => user.name
    response.StatusCode => http.response.status_code
    ...
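As a sketch, such an ingest pipeline could use rename processors like this (the pipeline name and the parsed_message prefix are assumptions based on the examples above; ignore_missing keeps the pipeline from failing when a field is absent):

    PUT _ingest/pipeline/normalize-logs
    {
      "processors": [
        { "rename": { "field": "parsed_message.payload.user_id", "target_field": "user.id", "ignore_missing": true } },
        { "rename": { "field": "parsed_message.payload.user_name", "target_field": "user.name", "ignore_missing": true } },
        { "rename": { "field": "parsed_message.response.StatusCode", "target_field": "http.response.status_code", "ignore_missing": true } }
      ]
    }

The pipeline can then be applied at ingest time, e.g. via the index's default_pipeline setting or the Beats output configuration.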

OK that makes sense.

Finally, if I decide I want to make the message field more than a string, so it can be either a string or a JSON object: is there some way to modify this in the Functionbeat configuration file rather than updating the index template?

Functionbeat is the one that creates the index template, I assume.

Oh, I have just seen there does not appear to be a dynamic field type.