Ingest processor failure when accessing nested field


#1

Hello,
I'm facing the following problem: accessing a nested field within an ingest processor fails with an error. Here is a simple example that reproduces the problem:

  1. Create the 'nest_test' index with a mapping containing a nested field:

    curl -X PUT "localhost:9200/nest_test" -H 'Content-Type: application/json' -d'
    {
      "mappings": {
        "resources": {
          "properties": {
            "nest": {
              "type": "nested",
              "dynamic": "strict",
              "properties": {
                "field_raw": { "type": "keyword" },
                "field_lower": { "type": "keyword" }
              }
            }
          }
        }
      }
    }
    '

  2. Create the 'nested_lowercase' pipeline:

    curl -X PUT "localhost:9200/_ingest/pipeline/nested_lowercase" -H 'Content-Type: application/json' -d'
    {
      "description" : "failure reproducing",
      "processors" : [
        {
          "lowercase" : {
            "field": "nest.field_raw",
            "target_field": "nest.field_lower"
          }
        }
      ]
    }
    '

  3. Index a document through the pipeline:

    curl -X POST "localhost:9200/nest_test/1?pretty&refresh&pipeline=nested_lowercase" -H 'Content-Type: application/json' -d'
    {"nest": [{"field_raw": "Test Me"}]}
    '

    The result is:

    {
      "error" : {
        "root_cause" : [
          {
            "type" : "exception",
            "reason" : "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: [field_raw] is not an integer, cannot be used as an index as part of path [nest.field_raw]",
            "header" : {
              "processor_type" : "lowercase"
            }
          }
        ],
        "type" : "exception",
        "reason" : "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: [field_raw] is not an integer, cannot be used as an index as part of path [nest.field_raw]",
        "caused_by" : {
          "type" : "illegal_argument_exception",
          "reason" : "java.lang.IllegalArgumentException: [field_raw] is not an integer, cannot be used as an index as part of path [nest.field_raw]",
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : "[field_raw] is not an integer, cannot be used as an index as part of path [nest.field_raw]",
            "caused_by" : {
              "type" : "number_format_exception",
              "reason" : "For input string: \"field_raw\""
            }
          }
        },
        "header" : {
          "processor_type" : "lowercase"
        }
      },
      "status" : 500
    }

There are similar topics describing the same problem with different processors, but none of them has an answer :(

https://discuss.elastic.co/t/nested-field-not-getting-processed-by-ingest-processor-plugin/96136
https://discuss.elastic.co/t/nested-documents-in-ingest-pipeline-processor/135305
https://github.com/elastic/elasticsearch/issues/22193

Is there a way to access and work with nested fields from processors?

Thanks in advance for your help,
Mike


(Ryan Ernst) #2

I think you will need to use a script processor. None of the processors know how to deal with a nested field (which appears as an array of sub-maps). For example, to lowercase the field in your example:

"processors" : [
    {
        "script" : {
            "source": "for (Map nested : ctx._source['nest']) { nested['field_lower'] = nested['field_raw'].toLowerCase(Locale.ROOT); }"
        }
    }
]

(Tal Levy) #3

Another way to achieve this is with the help of the foreach processor.

Here is an example in Console:

PUT _ingest/pipeline/nest_lowercase
{
  "description": "lowercase field_raw",
  "processors": [
    {
      "foreach": {
        "field": "nest",
        "processor": {
          "lowercase": {
            "field": "_ingest._value.field_raw",
            "target_field": "_ingest._value.field_lower"
          }
        }
      }
    }]
}


PUT nest_test/_doc/1?pipeline=nest_lowercase
{
  "nest": [{"field_raw": "CamelCase"}]
}

GET nest_test/_doc/1

The indexed document will look like this:

{
  "_index": "nest_test",
  "_type": "_doc",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "nest": [
      {
        "field_lower": "camelcase",
        "field_raw": "CamelCase"
      }
    ]
  }
}

Since foreach only accepts one processor at a time to operate on the array elements, you may find Painless scripting more flexible for additional transformations.


#4

The script processor solution raises the following error:

   "processors" : [
     {
       "script" : {
         "source": "for (Map nested : ctx._source['nest']) { nested['field_lower'] = nested['field_raw'].toLowerCase(Locale.ROOT); }"
       }
     }
   ]

{
  "error" : {
    "root_cause" : [
      {
        "type" : "script_exception",
        "reason" : "compile error",
        "script_stack" : [
          "... Map nested : ctx._source[nest]) { nested[field_low ...",
          "                             ^---- HERE"
        ],
        "script" : "for (Map nested : ctx._source[nest]) { nested[field_lower] = nested[field_raw].toLowerCase(Locale.ROOT); }",
        "lang" : "painless",
        "header" : {
          "processor_type" : "script",
          "property_name" : "source"
        }
      }
    ],
    "type" : "script_exception",
    "reason" : "compile error",
    "script_stack" : [
      "... Map nested : ctx._source[nest]) { nested[field_low ...",
      "                             ^---- HERE"
    ],
    "script" : "for (Map nested : ctx._source[nest]) { nested[field_lower] = nested[field_raw].toLowerCase(Locale.ROOT); }",
    "lang" : "painless",
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "Variable [nest] is not defined."
    },
    "header" : {
      "processor_type" : "script",
      "property_name" : "source"
    }
  },
  "status" : 500
}

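The single quotes inside the script did not survive (the error shows ctx._source[nest]), presumably because the curl request body is itself wrapped in single quotes, so I rewrote the script with dot notation: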

  "processors" : [
    {
      "script" : {
        "source": "for (Map nested : ctx._source.nest) { nested.field_lower = nested.field_raw.toLowerCase(Locale.ROOT); }"
      }
    }
  ]

The pipeline was then created without errors, but indexing the document failed:

curl -X POST "localhost:9200/nest_test/1?pretty&refresh&pipeline=nested_lowercase" -H 'Content-Type: application/json' -d'
{"nest": [{"field_raw": "Test Me"}]}
'

{
  "error" : {
    "root_cause" : [
      {
        "type" : "exception",
        "reason" : "java.lang.IllegalArgumentException: ScriptException[runtime error]; nested: NullPointerException;",
        "header" : {
          "processor_type" : "script"
        }
      }
    ],
    "type" : "exception",
    "reason" : "java.lang.IllegalArgumentException: ScriptException[runtime error]; nested: NullPointerException;",
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "ScriptException[runtime error]; nested: NullPointerException;",
      "caused_by" : {
        "type" : "script_exception",
        "reason" : "runtime error",
        "script_stack" : [
          "for (Map nested : ctx._source.nest) { ",
          "                             ^---- HERE"
        ],
        "script" : "for (Map nested : ctx._source.nest) { nested.field_lower = nested.field_raw.toLowerCase(Locale.ROOT); }",
        "lang" : "painless",
        "caused_by" : {
          "type" : "null_pointer_exception",
          "reason" : null
        }
      }
    },
    "header" : {
      "processor_type" : "script"
    }
  },
  "status" : 500
}

Another question: if we succeed with lowercase, how can I replace the JSON processor with scripting?


#5

Nope, that doesn't work for me either:

curl -X PUT "localhost:9200/_ingest/pipeline/nested_lowercase?pretty" -H 'Content-Type: application/json' -d'
{
  "description" : "failure reproducing",
  "processors" : [
    {
      "foreach": {
        "field": "nest",
        "processor": {
          "lowercase": {
            "field": "_ingest._value.field_raw",
            "target_field": "_ingest._value.field_lower"
          }
        }
      }
    }]
}
'

curl -X PUT "localhost:9200/nest_test/_doc/1?pretty&refresh&pipeline=nested_lowercase" -H 'Content-Type: application/json' -d'
{"nest": [{"field_raw": "CamelCase"}]}
'

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "object mapping [nest] can't be changed from nested to non-nested"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "object mapping [nest] can't be changed from nested to non-nested"
  },
  "status" : 400
}

I've changed the POST method to PUT to run the same operations as yours, and I've recreated the index without setting "dynamic":"strict" on the nested field, but none of that helped.


(Tal Levy) #6

This error leads me to believe there is a difference between the mapping you are running against, and the one I ran against. Since this mapping exception is due to actual indexing, this means that the ingest pipeline was successful in executing and this is a mismatch between the object being indexed and the index mapping.

Would you mind sharing the response to GET nest_test, specifically the mappings?


#7

It worked! The problem was the mapping type, which I got wrong when sending data to the index.
Incorrect in my case:

curl -X POST "localhost:9200/nest_test/_doc/1?pretty&refresh&pipeline=nested_lowercase" -H 'Content-Type: application/json' -d'
{"nest": [{"field_raw": "CamelCase"}]}
'

Correct:

curl -X POST "localhost:9200/nest_test/resources/1?pretty&refresh&pipeline=nested_lowercase" -H 'Content-Type: application/json' -d'
{"nest": [{"field_raw": "CamelCase"}]}
'

@talevy thank you very much!

I'll test all my variations tomorrow to see whether the foreach processor covers them all:

  • applying two different processors to the same field, for example uppercase and lowercase, and storing the results in two new fields
  • accessing nested objects inside a nested object

#8

Accessing a nested object inside a nested object works, but as you mentioned before, a single foreach cannot apply more than one processor, so that solution is not suitable for me.

Now, the only hope is to somehow make the script processor work with nested objects.
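
One workaround that may still be worth trying, assuming nothing prevents a pipeline from listing several foreach processors over the same array, is to give each transformation its own foreach. In this sketch the description text and the field_upper target are illustrative names that do not appear elsewhere in this thread, and with "dynamic":"strict" on nest, field_upper would first have to be added to the mapping.

```json
{
  "description": "sketch (assumption): lowercase and uppercase each nest[].field_raw",
  "processors": [
    {
      "foreach": {
        "field": "nest",
        "processor": {
          "lowercase": {
            "field": "_ingest._value.field_raw",
            "target_field": "_ingest._value.field_lower"
          }
        }
      }
    },
    {
      "foreach": {
        "field": "nest",
        "processor": {
          "uppercase": {
            "field": "_ingest._value.field_raw",
            "target_field": "_ingest._value.field_upper"
          }
        }
      }
    }
  ]
}
```

The array is walked once per transformation, which is likely fine for small arrays but gets verbose as the number of transformations grows.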

I'll leave an example of a pipeline that accesses a nested-nested object using foreach, just in case someone finds it useful:

  "processors" : [
    {
      "foreach": {
        "field": "nest",
        "processor": {
          "foreach" : {
            "field": "_ingest._value.inner_nest",
            "processor" : {
              "lowercase": {
                "field": "_ingest._value.inner_field_raw",
                "target_field": "_ingest._value.inner_field_lower"
              }
            }
          }
        }
      }
    }
  ]

and the corresponding document:
{"nest": [{"field_raw" : "Test Me", "inner_nest" : [{"inner_field_raw": "Test Me Inner"}]}]}


#9

In case someone needs it: the correct solution is to refer to the object as ctx.field_name, not as ctx._source['field_name'] or ctx._source.field_name. The correct script looks like this:

  "processors" : [
    {
      "script" : {
        "source": "for (item in ctx.nest) { item.field_lower = item.field_raw.toLowerCase(Locale.ROOT); }"
      }
    }
  ]
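
Since a script is not limited to one operation per element, the same loop can write several derived fields at once. The following is only a rough sketch: the null checks and the field_upper name are additions for illustration and are not part of the thread, and under the strict mapping field_upper would need to be added to nest first.

```json
{
  "description": "sketch (assumption): several transformations per nested item in one script",
  "processors": [
    {
      "script": {
        "source": "if (ctx.nest != null) { for (item in ctx.nest) { if (item.field_raw != null) { item.field_lower = item.field_raw.toLowerCase(Locale.ROOT); item.field_upper = item.field_raw.toUpperCase(Locale.ROOT); } } }"
      }
    }
  ]
}
```

Because the script contains no single quotes, it can be sent inside a single-quoted curl body unchanged.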

My problem remains, as I need not only lowercase but also the JSON processor, which seems to be a little more complicated.
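
For the JSON case, one untested idea is to wrap the json processor in foreach, on the assumption that it can be wrapped in the same way as lowercase above. The field_json and field_parsed names are hypothetical, and the nested mapping would have to allow the parsed structure.

```json
{
  "description": "sketch (assumption): parse a JSON string held in each nest[] item",
  "processors": [
    {
      "foreach": {
        "field": "nest",
        "processor": {
          "json": {
            "field": "_ingest._value.field_json",
            "target_field": "_ingest._value.field_parsed"
          }
        }
      }
    }
  ]
}
```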

Anyway,
@rjernst , @talevy thank you very much for the ideas!

