Unable to access nested fields in pipeline processors

See Stack Overflow: http://stackoverflow.com/questions/42393543/elasticsearch-5-2-pipeline-can-not-access-nested-fields

I need to access nested fields in pipeline processors. This should be a common use case, but I cannot figure out how to do it.

My mapping:

  "properties" : {
    "attachment" : {
      "type" : "object",
      "properties" : {
        "content" : {"type" : "text", "store" : true, "index" : true, "analyzer" : "german" },
        "title" : {"type" : "text", "store" : true},
        "date" : {"type" : "date", "store" : true},
        "author" : {"type" : "text", "store" : true},
        "keywords" : {"type" : "keyword", "store" : true},
        "content_type" : {"type" : "keyword", "store" : true},
        "content_length" : {"type" : "integer", "store" : true},
        "language" : {"type" : "keyword", "store" : true}
      }
    }
  }

Pipeline configuration:

  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "indexed_chars" : -1
      }
    },
    {
      "remove" : {
        "field" : "data"
      }
    },
    {
      "split": {
        "field" : "attachment.keywords",
        "separator" : "\\s+"
      }
    }
  ]

If I use a non-nested field, it works fine. With a nested field (regardless of whether it is a subfield of the attachment field or of any other object), it returns the following error message:

  {
    "error" : {
      "root_cause" : [
        {
          "type" : "exception",
          "reason" : "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [keywords] not present as part of path [attachment.keywords]",
          "header" : { "processor_type" : "split" }
        }
      ],
      "type" : "exception",
      "reason" : "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [keywords] not present as part of path [attachment.keywords]",
      "caused_by" : {
        "type" : "illegal_argument_exception",
        "reason" : "java.lang.IllegalArgumentException: field [keywords] not present as part of path [attachment.keywords]",
        "caused_by" : {
          "type" : "illegal_argument_exception",
          "reason" : "field [keywords] not present as part of path [attachment.keywords]"
        }
      },
      "header" : { "processor_type" : "split" }
    },
    "status" : 500
  }

Please help. It seems quite natural to keep working in a pipeline with the fields generated by the Ingest Attachment Processor.

Hey,

It looks as if the document extraction did not yield any keywords, which in turn causes the split processor to fail. By default, the processor returns an error if the field does not exist. You can disable that by setting "ignore_missing" : true in the split processor configuration, and everything should work.
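
For illustration, here is a sketch of how the split processor entry from the pipeline above might look with that option added. Only the "ignore_missing" line is new; everything else is copied from the question. Whether the split processor accepts this option depends on your Elasticsearch version, so treat it as an assumption to check against the documentation for your release.

```json
{
  "split" : {
    "field" : "attachment.keywords",
    "separator" : "\\s+",
    "ignore_missing" : true
  }
}
```

If the option is rejected, the generic "ignore_failure" : true setting, which is available on every ingest processor, is a possible fallback. Note that it suppresses any failure of that processor, not only a missing field.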

--Alex

Hi, thanks. That solved my problem. I was sure that I was testing with a document that had the field, but apparently I was wrong.

If this document has keywords and they are not extracted properly, please file a new issue in the Elasticsearch GitHub repo and supply the document you tested with (plus the full configuration of your pipeline).
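
To check the split step on its own before filing anything, one option is the simulate endpoint (POST _ingest/pipeline/_simulate). The request body below is only a sketch: the two sample documents and their values are made up for illustration, and the attachment processor is left out so that no base64 data is needed.

```json
{
  "pipeline" : {
    "description" : "Split step only, for testing",
    "processors" : [
      {
        "split" : {
          "field" : "attachment.keywords",
          "separator" : "\\s+"
        }
      }
    ]
  },
  "docs" : [
    { "_source" : { "attachment" : { "keywords" : "alpha beta gamma" } } },
    { "_source" : { "attachment" : { "content" : "no keywords extracted" } } }
  ]
}
```

The first document has the nested field and should come back with the keywords split into an array. The second lacks it and should report the same "not present as part of path" error, which shows that the failure comes from the missing field rather than from the nested path.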

Thanks a lot!

--Alex
