Issue with decode_json_fields processor

Hi,

I'm shipping AWS logs to Elastic using Functionbeat.

I have recently added the following processor to my configuration so that I'm able to decode the JSON that is usually in the message field.

    processors:
      - decode_json_fields:
          fields: ["message"]
          process_array: false
          max_depth: 1
          target: ""
          overwrite_keys: false ## also tested with true
          add_error_key: true

However, logs have stopped appearing since adding it.

Example log:

    {
      "_index": "functionbeat-7.9.2-2020.10.01-000001",
      "_type": "_doc",
      "_id": "LFom-HQBtUOF3QRT8N9L",
      "_version": 1,
      "_score": null,
      "_source": {
        "@timestamp": "2020-10-05T09:45:29.765Z",
        "owner": "<redacted>",
        "log_group": "/aws/lambda/discovery-production-twitch",
        "agent": {
          "id": "94593fd1-28ef-4dec-b105-6d34277b5466",
          "name": "169.254.173.37",
          "type": "functionbeat",
          "version": "7.9.2",
          "hostname": "169.254.173.37",
          "ephemeral_id": "926e9ba3-b05a-4015-9d70-1c746527505d"
        },
        "message": "{\"message\":{\"log_type\":\"ProcessTopStreamsByGame\",\"event\":\"ER_DUP_ENTRY\",\"payload\":{\"person_id\":\"588b38c3-bb8f-480f-927b-d39c351f3022\",\"game_id\":\"b1c0aa01-3420-4b63-b396-8f78dea14c96\"}},\"level\":\"info\",\"timestamp\":\"2020-10-05T09:45:29.765Z\"}\n",
        "log_stream": "2020/10/05/[$LATEST]583fc23d95c64c198505873a81d37ff8",
        "message_type": "DATA_MESSAGE",
        "subscription_filters": [
          "fnb-cloudwatch-stack-fnbcloudwatchSFawslambdadiscoveryproductiontwitch-OQDR276VW93N"
        ],
        "event": {
          "kind": "event"
        },
        "id": "35723365920675619124080672790578666189065578275236872390",
        "cloud": {
          "provider": "aws"
        },
        "ecs": {
          "version": "1.5.0"
        },
        "host": {
          "name": "169.254.173.37",
          "ip": [
            "169.254.76.1",
            "169.254.79.1",
            "169.254.80.2"
          ],
          "mac": [
            "7e:4d:06:88:d3:72",
            "66:87:d3:39:68:d4",
            "5a:86:f0:a2:54:61"
          ],
          "hostname": "169.254.173.37",
          "architecture": "x86_64",
          "os": {
            "version": "2018.03",
            "family": "redhat",
            "name": "Amazon Linux AMI",
            "kernel": "4.14.177-104.253.amzn2.x86_64",
            "platform": "amzn"
          },
          "containerized": true
        }
      },
      "fields": {
        "@timestamp": [
          "2020-10-05T09:45:29.765Z"
        ]
      },
      "highlight": {
        "log_group": [
          "/@kibana-highlighted-field@aws@/kibana-highlighted-field@/@kibana-highlighted-field@lambda@/kibana-highlighted-field@/@kibana-highlighted-field@discovery@/kibana-highlighted-field@-@kibana-highlighted-field@production@/kibana-highlighted-field@-@kibana-highlighted-field@twitch@/kibana-highlighted-field@"
        ]
      },
      "sort": [
        1601891129765
      ]
    }

Hi,

I think the problem is a collision of fields. In Elasticsearch, message is simple text. Therefore, before adding the processor everything was imported. After adding the processor, message is an object with nested keys, which collides with the field definition of text.

Have you tried setting target to a non-empty value? In that case the fields are written to a nested key and should not collide with the existing field definitions.
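For example, a sketch of the same configuration with a non-empty target (parsed_message is just an assumed name, any non-empty key works):

    processors:
      - decode_json_fields:
          fields: ["message"]
          process_array: false
          max_depth: 1
          target: "parsed_message"  # assumed name; decoded keys land under parsed_message.*
          overwrite_keys: false
          add_error_key: true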

Best regards
Wolfram

If it's an issue with key collisions, shouldn't overwrite_keys work in this case? Or is there a way to modify message to allow it to be more than text? Either text or a JSON object.

Ideally I'm looking for a final result where message, level and timestamp are in the root of the document.

@Wolfram_Haussig

You're right, changing target to an actual value made the logs appear.

But now I'm wondering how I get some_data.level and some_data.timestamp to be in the root.

There is a way: the mapping of fields in Elasticsearch is either generated on the fly when unknown fields are ingested, or it is defined in index templates. In the first case it should be enough to delete the old index and create a new one; in the case of an index template you have to update the index template's mapping.

You could try to add a copy_fields processor to copy relevant data from the nested tree to the root.
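A minimal sketch of that, assuming the decoded JSON ended up under some_data as in the post above (copy_fields is a standard Beats processor; the target field names here are only illustrative):

    processors:
      - copy_fields:
          fields:
            - from: some_data.level
              to: level
            - from: some_data.timestamp
              to: log_timestamp
          fail_on_error: false
          ignore_missing: true

Note that copying to a top-level message field would reintroduce the original text-vs-object collision, which is why the example copies only the scalar fields.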

OK, I'm not sure what the best practice is.

I'm ingesting logs from AWS (Functionbeat) and Kubernetes (Filebeat) into Elasticsearch.

They nearly always have a message field. Sometimes it's a string, sometimes it's JSON.

Sometimes the message field is a custom log, like in the example above from a Node.js API application I've written.

If I set target to acme, the name of my company, that makes sense for the logs generated by my own applications.

But it does not make sense to call the field acme for logs generated by prebuilt apps/services created by other companies, such as Mongo, Redis, etc.

I think the best way would be to parse the data into a separate tree - it does not have to be the name of your company, e.g. parsed_message.
In Elasticsearch you could then create an ingest pipeline to move the fields to a standardized location. The ECS schema already defines many fields, so I recommend following their schema and only adding fields if necessary. This has the benefit that documents can be aggregated independently of the data source, because the same content is stored in fields with the same name.
Example:

    payload.user_id => user.id
    payload.user_name => user.name
    response.StatusCode => http.response.status_code
    ...
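As a sketch, such an ingest pipeline could use rename processors like this (the pipeline name and the parsed_message prefix are assumptions based on the examples above; ignore_missing keeps the pipeline from failing when a field is absent):

    PUT _ingest/pipeline/normalize-logs
    {
      "processors": [
        { "rename": { "field": "parsed_message.payload.user_id", "target_field": "user.id", "ignore_missing": true } },
        { "rename": { "field": "parsed_message.payload.user_name", "target_field": "user.name", "ignore_missing": true } },
        { "rename": { "field": "parsed_message.response.StatusCode", "target_field": "http.response.status_code", "ignore_missing": true } }
      ]
    }

The pipeline can then be applied at ingest time, e.g. via the index's default_pipeline setting or the Beats output configuration.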

OK that makes sense.

Finally, if I decide I want to make the message field more than a string, so it can be either a string or a JSON object: is there some way to modify this in the Functionbeat configuration file rather than updating the index template?

Functionbeat is the one that creates the index template, I assume.

Oh, I have just seen there does not appear to be a dynamic field type.