Help, I can't figure out a basic KQL query for my data

I have CloudWatch logs being sent to Elasticsearch from Functionbeat. There is a field called "message" that contains all the information I want to build visualizations on. I've spent an entire day trying to use your KQL method to parse this "message" field and I've failed miserably at every level.

As you'll see below, I think part of the problem is that everything inside this "message" field isn't being treated as JSON but instead as a single string. This is really complicating matters. I'm going to need guidance, and you're going to have to explain it to me like I'm 5. Perhaps the answer is to create an ingest pipeline, but I've no idea what that would look like. The answer could easily be a complex KQL query too.

I'm removing junk data from this document that isn't important, indicated by the characters ...

{
  "_index": "functionbeat-7.7.0-2020.05.23-000001",
  "_type": "_doc",
  "_id": "tuv6P3IBbkn-TM6wEe-z",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2020-05-23T05:20:47.689Z",
    ...
    "message": "{\"id\":\"4\",\"type\":\"PassStateEntered\",\"details\":{\"input\":\"\\\"starting state1\\\"\",\"name\":\"state2\"},\"previous_event_id\":\"3\",\"event_timestamp\":\"1590211247689\",\"execution_arn\":\"arn:aws-us-gov:states:us-gov-west-1:097135049942:execution:syost-step-workflow-a:e3644418-d5a4-e182-6f96-1b5db9de714f\"}",
    "message_type": "DATA_MESSAGE",
  ...
}

I think the best approach in your case would be to parse the "message" field, which seems to contain stringified JSON, at ingestion time.

This can be done with Logstash using the json filter, or with an Elasticsearch ingest pipeline.
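As a sketch of the ingest-pipeline route (the pipeline name parse-message and the target field message_parsed here are just example names, not anything from your setup), a json processor can decode the stringified field:

```json
PUT _ingest/pipeline/parse-message
{
  "description": "Decode the stringified JSON held in the message field",
  "processors": [
    {
      "json": {
        "field": "message",
        "target_field": "message_parsed"
      }
    }
  ]
}
```

You would then point the Beat at the pipeline, e.g. by setting the pipeline option under output.elasticsearch in the Functionbeat configuration, so documents are run through it at index time.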


@Luca_Belluccini
Thanks Luca ~ I take it the Grok processor might be of interest to me then? I ask because there is a json processor as well. My confusion is that the JSON object in my CloudWatch logs is somewhere getting converted to a JSON string, I assume in Functionbeat or Elasticsearch. Therefore, I'm not sure whether I should use the grok or the json processor in my ingest pipeline. If I use a json processor then this may not work, since Functionbeat treats it as a JSON string. Do you see where my confusion is?

You need the json filter on the field which contains the stringified JSON.
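For example, in a Logstash pipeline that could look like the following (setting target makes the parsed object replace the original string in place; omitting target would instead expand the parsed keys into the event root):

```
filter {
  json {
    # "message" holds the stringified JSON shown above
    source => "message"
    # write the parsed object back into "message"
    target => "message"
  }
}
```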

@Luca_Belluccini

That worked perfectly! However, I noticed that if the stringified JSON has several nested stringified fields, those nested fields do not get converted to JSON objects. Any way around that? Is this where a mapping schema would come in?

Hello @syost
If the JSON has other stringified fields inside it, you'll need to apply other json filters to it.

If after the first JSON filter you have:

{
  "a": {
    "subfield": "{ \"test\": 1 }"
  }
}

You will need to apply another json filter with source => "[a][subfield]" and target => "[a][subfield]", so you will end up with:

{
  "a": {
    "subfield": {
      "test": 1
    }
  }
}
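Putting the two passes together in Logstash (using the field names a and subfield from the example above; substitute your own), the chained filters would look roughly like:

```
filter {
  # first pass: decode the outer stringified JSON into the event root
  json {
    source => "message"
  }
  # second pass: decode the nested stringified field in place
  json {
    source => "[a][subfield]"
    target => "[a][subfield]"
  }
}
```

The same idea applies in an ingest pipeline: add a second json processor pointed at the nested field.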