Logstash parse json child element, format and insert into elasticsearch

I have a json file like this:

{
  "fruits": {
    "fruit": [
      {
        "id": 1,
        "label": "test",
        "tag": "fine",
        "start": "4",
        "end": "9"
      },
      {
        "id": 2,
        "label": "test1",
        "tag": "fine1",
        "start": "2",
        "end": "4"
      }
    ]
  }
}

I have hundreds of elements inside the "fruit" field. I want to:

  • insert only the elements inside the "fruit" field into Elasticsearch, each as an individual doc, using each element's own id as the Elasticsearch doc id.
  • compute every integer from "start" to "end" (inclusive), then add those numbers as a comma-separated string in a new field inside each doc.

The docs I want to insert into elasticsearch will be as follows:

[
    {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
            "id" : "1",
            "label": "test",
            "tag": "fine",
            "start": "4",
            "end": "9",
            "diffs": "4,5,6,7,8,9"
        }
    },
    {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
            "id" : "2",
            "label": "test1",
            "tag": "fine1",
            "start": "2",
            "end": "4",
            "diffs": "2,3,4"
        }
    }
]

Can anyone help me with the Logstash configuration file to achieve the desired output? I am using ELK version 7.x.

Thanks

Since diffs has multiple values, that seems to imply that you are aggregating multiple entries from the [fruits][fruit] array that have the same value of id. If so, how do you know from which entry the start and end values should be taken?

No aggregation, actually. I just need all the values within this range, including "start" and "end".

OK, then I have no idea what you are trying to do.

Well, did you get the other parts of the requirement? Can you please help with that?

  • insert only the elements inside "fruit" field to the elasticsearch each as an individual doc. I want to use their own id as elasticsearch doc id.

"how do you know from which entry the start and end values should be taken?"

The "start" and "end" values should be taken from each entry (document) itself. In the example, for id 1, "start" is 4 and "end" is 9, so "diffs" will be all the values from 4 to 9: start at "start" and increment by 1 until "end" is reached.
@Badger

Use a split filter to create a new event for each entry in the array....

split { field => "[fruits][fruit]" }

Then use mutate+add_field (or possibly a ruby filter) to move the items in the array entry to the top level, then use mutate+remove_field to delete [fruits].
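A rough sketch of that filter section, using the ruby filter variant so the field names do not have to be listed one by one. It assumes the event already contains the parsed [fruits] object (for example from a json codec or json filter on the input side, which is not shown here), and it has not been tested against the poster's data.

```
filter {
  # One event per element of the [fruits][fruit] array.
  split { field => "[fruits][fruit]" }

  ruby {
    code => '
      # After the split, [fruits][fruit] holds a single hash.
      entry = event.get("[fruits][fruit]")
      if entry.is_a?(Hash)
        # Copy id, label, tag, start and end to the top level.
        entry.each { |key, value| event.set(key, value) }
      end
      # Drop the wrapper so only the promoted fields remain.
      event.remove("fruits")
    '
  }
}
```

With the wrapper removed, the fields of each array entry sit at the top level of the event rather than under the [fruits][fruit] key.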

In the output section use a sprintf reference ("%{id}") for the value of the document_id option on the elasticsearch output.
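For illustration, an output section along those lines might look like the following. The hosts value is an assumption (a local single-node cluster), and the index name is taken from the expected output in the question.

```
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]   # assumption: local cluster
    index => "my_index"                  # index name from the question
    document_id => "%{id}"               # use the event's own id as _id
  }
}
```

Because document_id is derived from the id field, re-running the pipeline over the same file should overwrite the existing docs instead of creating duplicates.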

@babuzrb To create [diffs] you could use this

    ruby {
        code => '
            d = ""
            for i in event.get("start").to_i .. event.get("end").to_i
                d += "#{i},"
            end

            event.set("diffs", d.chop)
        '
    }

I am sure there is some much prettier Ruby idiom that would work just as well
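One candidate for a shorter idiom is to build an inclusive Range and join it. The sketch below is plain Ruby so it can be tried outside Logstash; inclusive_range_csv is a hypothetical helper name used only for this example, not anything Logstash provides.

```ruby
# Hypothetical helper: every integer from start to end (inclusive),
# joined into a comma-separated string.
def inclusive_range_csv(start_value, end_value)
  # to_i mirrors the original filter, since "start" and "end" arrive as strings.
  (start_value.to_i..end_value.to_i).to_a.join(",")
end

puts inclusive_range_csv("4", "9")  # prints 4,5,6,7,8,9
puts inclusive_range_csv("2", "4")  # prints 2,3,4
```

Inside the ruby filter the same expression would replace the loop, along the lines of event.set("diffs", (event.get("start").to_i..event.get("end").to_i).to_a.join(",")). As with the loop version, a "start" greater than "end" yields an empty string.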


@Badger can you please help by writing the configurations that you've suggested?

@Badger, I did that following your reference. Each element of the "fruit" list is now its own event, but it is still nested under a key. When I insert the events into Elasticsearch, that key is included under "_source". If I want only the value stored under _source, without the key, what should I do?

@Badger Thank you very much. Following your instructions, I was finally able to complete all the requirements.
