Script processor ingest pipelines on nested fields

(Fabio) #1

Hi there!

I'm dealing with the following situation.
I have some data ingested into Elasticsearch with a field "certainField" that can either be null or contain an arbitrary number of JSON objects.

My goal is to compute the average of all occurrences of a nested field present in those JSON objects and write this average as the value of a top-level field of the document.

For example, I have something like:

@timestamp: ...
name: ...
certainField: [
  {
    nestedField1: ...
    nestedField2: ...
    interestingNestedField: 35
    ...
  },
  {
    nestedField1: ...
    nestedField2: ...
    interestingNestedField: 42
    ...
  },
  {
    nestedField1: ...
    nestedField2: ...
    interestingNestedField: 12
    ...
  }
]

I would like to use an ingest pipeline to add another top-level field, interestingFieldAverage: 29,66, like the following:

@timestamp: ...
name: ...
interestingFieldAverage: 29,66
certainField: [
  {
    nestedField1: ...
    nestedField2: ...
    interestingNestedField: 35
    ...
  },
  {
    nestedField1: ...
    nestedField2: ...
    interestingNestedField: 42
    ...
  },
  {
    nestedField1: ...
    nestedField2: ...
    interestingNestedField: 12
    ...
  }
]

Is there any way to accomplish such a task? Obviously I'd need a way to iterate over the nested JSON objects without knowing their number in advance (as I said, it varies from document to document).

Thank you!

(Gordon Brown) #2

The list of objects will simply be a Painless list in the script, so you can iterate over it just like any other list. This case is especially straightforward using streams, e.g.:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "source": """
          ctx.interesting_avg = ctx.things.stream()
            .mapToInt(thing -> thing.interesting)
            .average()
            .orElse(0)
          """
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "things": [
          {
            "interesting": 35
          },
          {
            "interesting": 42
          },
          {
            "interesting": 12
          }
        ]
      }
    }
  ]
}

Which returns:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_type",
        "_id" : "_id",
        "_source" : {
          "things" : [
            {
              "interesting" : 35
            },
            {
              "interesting" : 42
            },
            {
              "interesting" : 12
            }
          ],
          "interesting_avg" : 29.666666666666668
        },
        "_ingest" : {
          "timestamp" : "2019-03-13T22:40:27.030Z"
        }
      }
    }
  ]
}
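
Since the original question mentions that the field can be null, calling .stream() on it directly would likely fail for those documents. A guarded variant might look like the sketch below. It uses the field names from the question (certainField, interestingNestedField, interestingFieldAverage) and assumes that every object in the list carries an integer interestingNestedField; the second test document is a made-up case with a null field.

```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "source": """
          // Only compute the average when the field is a non-empty list;
          // documents where it is null or missing are left unchanged.
          if (ctx.certainField instanceof List && !ctx.certainField.isEmpty()) {
            ctx.interestingFieldAverage = ctx.certainField.stream()
              .mapToInt(item -> item.interestingNestedField)
              .average()
              .orElse(0);
          }
          """
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "certainField": [
          { "interestingNestedField": 35 },
          { "interestingNestedField": 42 },
          { "interestingNestedField": 12 }
        ]
      }
    },
    {
      "_source": {
        "certainField": null
      }
    }
  ]
}
```

With the guard in place, a document whose certainField is null simply does not get an interestingFieldAverage field, rather than getting a misleading average of 0.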

Does that help you?

(Fabio) #3

First of all thank you so much for your kind answer.

Chances are you solved my problem, but I cannot confirm it right now since I have an important call in a few days and I cannot risk messing up my indexes.

I'll let you know as soon as I'm done with the presentation.

Thank you again!
