Exit Elasticsearch ingest pipeline if a specific field exists

In an Elasticsearch ingest pipeline, how can I check whether a specific field exists and, if it does, exit the pipeline immediately without further processing?

I suppose the fail processor could do this.

Each processor has an "if" parameter that conditionally executes it.
For example: only if the doc has the field k1, add a field res copied from k1's value.

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "set": {
          "if": "ctx.containsKey(\"k1\")", 
          "field": "res",
          "copy_from": "k1"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "k1":"1"
      }
    },
    {
      "_source": {
        "k2":"1"
      }
    }
  ]
}

You can just write a top-level pipeline and a separate pipeline that does the work.

Have a look at the pipeline processor documentation.

In your case, the condition to call the work pipeline would look something like:

"if": "ctx?.myfield != null",

If the field does not exist, the work pipeline will not execute.
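
As a rough sketch (the pipeline names, the field name myfield, and the set processor inside the work pipeline are placeholders, not anything from your setup), the pipeline processor would call the work pipeline only when the field is present:

PUT _ingest/pipeline/work-pipeline
{
  "processors": [
    {
      "set": {
        "field": "processed",
        "value": true
      }
    }
  ]
}

PUT _ingest/pipeline/top-level-pipeline
{
  "processors": [
    {
      "pipeline": {
        "name": "work-pipeline",
        "if": "ctx?.myfield != null"
      }
    }
  ]
}

Documents indexed through top-level-pipeline would only pass through work-pipeline when myfield is present; otherwise they are indexed untouched.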

Can you tell me what the "?" after ctx means below? Thanks.

"if": "ctx?.myfield != null"

It's a "null safety" check that is part of the Painless syntax.

Read this section of the docs:

Incoming documents often contain object fields. If a processor script attempts to access a field whose parent object does not exist, Elasticsearch returns a NullPointerException. To avoid these exceptions, use null safe operators, such as ?., and write your scripts to be null safe.

For example, ctx.network?.name.equalsIgnoreCase('Guest') is not null safe. ctx.network?.name can return null. Rewrite the script as 'Guest'.equalsIgnoreCase(ctx.network?.name), which is null safe because 'Guest' is always non-null.
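
As a rough illustration (the network.name field and the has_network_name flag are made up for this example), the null safe operator lets a condition on a nested field evaluate cleanly even when the parent object is missing, whereas ctx.network.name would throw a NullPointerException for the second document:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "set": {
          "if": "ctx.network?.name != null",
          "field": "has_network_name",
          "value": true
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "network": {
          "name": "Guest"
        }
      }
    },
    {
      "_source": {
        "k2": "1"
      }
    }
  ]
}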


Thanks a lot!

I like this, but wish it could exit silently, without throwing an exception. It seems to me as if there should be an analogous exit processor.

This is what I'm currently doing. However, it's a fairly long processor, which only executes if a specific field is present.

How about the drop processor?
It is possible to use an if conditional in that processor.
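
For example, a minimal sketch (the field name myfield is just a placeholder) would be:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "drop": {
          "if": "ctx?.myfield != null"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "myfield": "present"
      }
    },
    {
      "_source": {
        "other": "value"
      }
    }
  ]
}

Note that the drop processor discards the whole document, so only the second document here would actually be indexed.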

I hadn't thought of this - it may be what I do. Just out of curiosity, is there any significant overhead from calling a second ingest pipeline? I see that a few of the Filebeat module pipelines do this (e.g., the elasticsearch module pipelines).

I was unclear in my original post. I need to keep the document no matter what, but only need to continue the pipeline if the specific field is not present. The drop processor would drop the document.

Now I understand what you want, and I found a trick.

Raise a failure with a fail processor inside the pipeline and catch it with an on_failure handler at the top level of the pipeline definition. By specifying a no-op processor (a set processor here) in on_failure, the pipeline exits immediately when the if conditional of the fail processor is met, and the document is indexed as it is at that moment.

Please see "Handling pipeline failures" for the behavior.

POST /_ingest/pipeline/_simulate
{
  "docs":[
    {
      "_source":{
        "foo":"baa"
      }
    },{
      "_source":{
        "foo":"foo"
      }
    }
  ],
  "pipeline": {
    "processors": [
      {"fail":{
        "if":"ctx.foo=='baa'",
        "ignore_failure": false, 
        "message":"***"
      }},
      {
        "set":{
          "field": "following",
          "value": "processors"
        }
      }
    ],
    "on_failure":[{
      "set":{
        "if":"false",
        "field": "null",
        "value": "null"
      }
    }]
  }
}

Then you get:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "foo" : "baa"
        },
        "_ingest" : {
          "timestamp" : "2022-03-02T14:30:13.239791493Z"
        }
      }
    },
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "foo" : "foo",
          "following" : "processors"
        },
        "_ingest" : {
          "timestamp" : "2022-03-02T14:30:13.239795055Z"
        }
      }
    }
  ]
}
