Logstash parse json child element, format and insert into elasticsearch

babuzrb · August 16, 2022, 11:05pm

I have a JSON file like this:

{
  "fruits": {
    "fruit": [
      {
        "id": 1,
        "label": "test",
        "tag": "fine",
        "start": "4",
        "end": "9"
      },
      {
        "id": 2,
        "label": "test1",
        "tag": "fine1",
        "start": "2",
        "end": "4"
      }
    ]
  }
}

I have hundreds of elements inside the "fruit" field. I want to:

1. Insert only the elements inside the "fruit" field into Elasticsearch, each as an individual document, using each element's own id as the Elasticsearch document id.
2. Compute the integers from "start" to "end" (inclusive) and add them as a comma-separated string in a new field inside each document.

The documents I want to insert into Elasticsearch should look like this:

{
    {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
            "id" : "1",
            "label": "test",
            "tag": "fine",
            "start": "4",
            "end": "9",
            "diffs": "4,5,6,7,8,9"
        }
    },
    {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
            "id" : "2",
            "label": "test1",
            "tag": "fine1",
            "start": "2",
            "end": "4",
            "diffs": "2,3,4"
        }
    }
}

Can anyone help me with a Logstash configuration file that achieves the desired output? I am using ELK version 7.x.

Thanks

Badger · August 17, 2022, 12:00am

Since diffs has multiple values, that seems to imply that you are aggregating multiple entries from the [fruits][fruit] array that have the same value of id. If so, how do you know from which entry the start and end values should be taken?

babuzrb · August 17, 2022, 12:02am

No aggregation, actually. I just need all the values within this range, including "start" and "end".

Badger · August 17, 2022, 12:03am

OK, then I have no idea what you are trying to do.

babuzrb · August 17, 2022, 12:06am

Well, did you get the other parts of the requirement? Can you please help with that?

insert only the elements inside "fruit" field to the elasticsearch each as an individual doc. I want to use their own id as elasticsearch doc id.

babuzrb · August 17, 2022, 12:15am

"how do you know from which entry the start and end values should be taken?"

The "start" and "end" values should be taken from each entry (document). For example, for id 1, "start" is 4 and "end" is 9, so "diffs" will be all the values from 4 to 9: increment by 1 from "start" until it reaches "end".
@Badger

Badger · August 17, 2022, 12:17am

Use a split filter to create a new event for each entry in the array:

split { field => "[fruits][fruit]" }

Then use mutate+add_field to move the items in the array entry to the top level (or possibly use a ruby filter), then use mutate+remove_field to delete [fruits].

In the output section use a sprintf reference ("%{id}") for the value of the document_id option on the elasticsearch output.
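Put together, the steps above might look roughly like the sketch below. It is not a tested configuration: it uses mutate+rename instead of mutate+add_field to move the fields (rename keeps the original value types, whereas add_field with a sprintf reference would turn them into strings), the field names are taken from the sample document in the question, and the hosts value is an assumption about a local cluster. The input section is left out because the thread does not say how the file is read.

```
filter {
  # One event per element of the [fruits][fruit] array.
  split { field => "[fruits][fruit]" }

  # Move the fields of the array entry to the top level.
  # Field names are taken from the sample document in the question.
  mutate {
    rename => {
      "[fruits][fruit][id]"    => "id"
      "[fruits][fruit][label]" => "label"
      "[fruits][fruit][tag]"   => "tag"
      "[fruits][fruit][start]" => "start"
      "[fruits][fruit][end]"   => "end"
    }
  }

  # Drop the now-empty wrapper object.
  mutate { remove_field => ["fruits"] }
}

output {
  elasticsearch {
    hosts       => ["http://localhost:9200"]   # assumption: local cluster
    index       => "my_index"
    document_id => "%{id}"
  }
}
```

Any filter that reads the top-level "start" and "end" fields would need to come after the rename.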

@babuzrb To create [diffs] you could use this

    ruby {
        code => '
            d = ""
            for i in event.get("start").to_i .. event.get("end").to_i
                d += "#{i},"
            end

            event.set("diffs", d.chop)
        '
    }

I am sure there is some much prettier Ruby idiom that would work just as well
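
One candidate for a shorter idiom is to build a Range and join it. The sketch below is plain Ruby so it can be tried outside Logstash; diffs_string is a hypothetical helper name, not part of any Logstash API.

```ruby
# Build the comma-separated list of integers from start to stop, inclusive.
# Inputs may be strings (as in the sample document) or integers.
def diffs_string(start, stop)
  (start.to_i..stop.to_i).to_a.join(",")
end

puts diffs_string("4", "9")  # prints 4,5,6,7,8,9
puts diffs_string("2", "4")  # prints 2,3,4
```

Inside the ruby filter, the same expression would presumably be written as event.set("diffs", (event.get("start").to_i..event.get("end").to_i).to_a.join(",")). As with the loop version, a "start" greater than "end" yields an empty string.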

babuzrb · August 17, 2022, 12:39am

@Badger can you please help by writing the configurations that you've suggested?

babuzrb · August 17, 2022, 3:04am

@Badger, I did that following your suggestions. All "fruit" list elements are now set to events according to a specific key. When I insert them, they are stored in Elasticsearch with the key included under "_source". If I want to store only the event "value" under _source and ignore the event "key", what should I do?
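
If the "key" here is the [fruits][fruit] wrapper that remains after the split filter (an assumption, since the configuration was not posted), one option is to copy the entry's fields to the top level and then drop the wrapper. The sketch below shows that transformation on a plain Hash standing in for the event; promote_fruit is a hypothetical helper, not a Logstash API.

```ruby
# Promote the fields of doc["fruits"]["fruit"] to the top level and drop the
# "fruits" wrapper. `doc` is a plain Hash standing in for a Logstash event
# after the split filter has run.
def promote_fruit(doc)
  entry = doc.dig("fruits", "fruit")
  return doc unless entry.is_a?(Hash)

  # Keep every top-level field except the wrapper, then add the entry's fields.
  promoted = doc.reject { |key, _| key == "fruits" }
  promoted.merge(entry)
end

event_like = {
  "fruits" => {
    "fruit" => { "id" => 1, "label" => "test", "tag" => "fine", "start" => "4", "end" => "9" }
  }
}
p promote_fruit(event_like)
```

In a ruby filter the equivalent would use the event API: read the entry with event.get("[fruits][fruit]"), call event.set(k, v) for each pair, then event.remove("fruits").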

babuzrb · August 17, 2022, 2:46pm

@Badger Thank you very much. Following your instructions, I was finally able to complete all the requirements.

system · September 14, 2022, 2:47pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
