Elasticsearch ingest processor to test for field type


(Harika Tandra) #1

Hello all,

I am looking for a way to check if a field is "array" or "object" datatype before a document is ingested. For example,

PUT my_id2/_create {
"locations" : {
"first" : { .... }
}
}

PUT my_id1/_create {
"locations" : [ "abc","xyz"]
}

I want to use this to remove inconsistencies in the input data: rename the "locations" field if it is an array, so that there is no mapping error when the documents are inserted.

Can I do this using scripting or the Grok processor?
Any help or suggestions would be much appreciated.

Thanks,
H.


(Abdon Pijpelink) #2

You could use a script processor for that, making use of the fact that inside a Painless script a JSON object is a Map and a JSON array is a List. For example, the pipeline in the _simulate request below first renames the location field to location.original. It then creates a new field location.array if the original value is an array (instanceof List), or a new field location.object if it is an object (instanceof Map). A plain string such as "test" matches neither branch, and an empty array still counts as a List.

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "Test",
    "processors": [
      {
        "rename": {
          "field": "location",
          "target_field": "location.original",
          "ignore_missing": true
        }
      },
      {
        "script": {
          "lang": "painless",
          "inline": "if (ctx.location.original instanceof List) { ctx.location.array = ctx.location.original } else if (ctx.location.original instanceof Map) { ctx.location.object = ctx.location.original } "
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "my_index",
      "_type": "doc",
      "_id": "1",
      "_source": {
        "location": [
          "abc",
          "xyz"
        ]
      }
    },
    {
      "_index": "my_index",
      "_type": "doc",
      "_id": "2",
      "_source": {
        "location": {
          "first": {
            "foo": "bar"
          }
        }
      }
    },
    {
      "_index": "my_index",
      "_type": "doc",
      "_id": "3",
      "_source": {
        "location": "test"
      }
    },
    {
      "_index": "my_index",
      "_type": "doc",
      "_id": "4",
      "_source": {
        "location": []
      }
    }
  ]
}
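
For the original question (rename "locations" only when it is an array), the same instanceof check could be applied directly, without the intermediate rename. The request below is a minimal sketch under a few assumptions: the target name locations_array is hypothetical, the sample documents are made up for illustration, and the script uses the "source" key, which newer Elasticsearch versions expect in place of "inline" (on older versions, keep "inline" as in the request above).

```console
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "Sketch: move locations to locations_array when it is an array",
    "processors": [
      {
        "script": {
          "lang": "painless",
          "source": "if (ctx.locations instanceof List) { ctx.locations_array = ctx.remove('locations'); }"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "locations": [
          "abc",
          "xyz"
        ]
      }
    },
    {
      "_source": {
        "locations": {
          "first": {
            "foo": "bar"
          }
        }
      }
    }
  ]
}
```

With this sketch, the first document should come back with locations_array in place of locations, while the second should pass through unchanged. A document with no locations field at all is also left alone, because instanceof evaluates to false for a null value.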

(Harika Tandra) #3

Thank you so much for your detailed reply. It is most helpful!

Thanks a bunch again.
-H

