Using the Remove processor for ingest node

Hi,

I am trying to remove some fields from the document using the REMOVE processor in the pipeline.

I tried:

remove {
"field"=>"a",
"field"=>"b",
"field"=>"c",
"field"=>"d",
"field"=>"e",
"ignore_failure" : true
}

And:

remove {
"field"=>"a",
"ignore_failure" : true
},
remove {
"field"=>"b",
"ignore_failure" : true
},
remove {
"field"=>"c",
"ignore_failure" : true
},
remove {
"field"=>"d",
"ignore_failure" : true
},
remove {
"field"=>"e",
"ignore_failure" : true
}

But every time it only removes the last field in the processors chain.
Is there another syntax, or am I using it wrong?

Thanks,

Ori

Hi @ori.rubinfeld,

here is a complete example that worked for me:

DELETE /_ingest/pipeline/remove-fields

PUT /_ingest/pipeline/remove-fields
{
   "description": "remove a couple of fields",
   "processors": [
      {
         "remove": {
            "field": "foo"
         }
      },
      {
         "remove": {
            "field": "bar"
         }
      }
   ]
}

DELETE /sample-index

PUT /sample-index

POST /sample-index/type?pipeline=remove-fields
{
    "foo": 1,
    "bar": 2,
    "baz": 3
}

POST /sample-index/type?pipeline=remove-fields
{
    "foo": 4,
    "bar": 5,
    "baz": 6
}

GET /sample-index/type/_search
{
    "query": {
        "match_all": {}
    }
}

The query returns:

{
   "took": 37,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 1,
      "hits": [
         {
            "_index": "sample-index",
            "_type": "type",
            "_id": "AVYsHLDoFKBGyZwFuEKG",
            "_score": 1,
            "_source": {
               "baz": 3
            }
         },
         {
            "_index": "sample-index",
            "_type": "type",
            "_id": "AVYsHM4xFKBGyZwFuEKH",
            "_score": 1,
            "_source": {
               "baz": 6
            }
         }
      ]
   }
}

As you can see, both returned documents only contain the baz field.
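
If you want to check the pipeline without indexing anything first, the simulate API is also handy (same pipeline name and sample fields as above):

POST /_ingest/pipeline/remove-fields/_simulate
{
   "docs": [
      {
         "_source": {
            "foo": 1,
            "bar": 2,
            "baz": 3
         }
      }
   ]
}

The response shows each document as it would look after the processors have run.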

Daniel


Daniel's example is how a pipeline should be specified (an array element per processor).

Unfortunately, specifying a pipeline the way Ori did doesn't fail, so I think that should be fixed.
This isn't ingest specific; the JSON parser just allows this.
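
To illustrate what I mean (this is my reading of the first snippet): the parser silently accepts duplicate keys and keeps only the last value, so the single remove block with five "field" entries effectively collapses to:

{
   "remove": {
      "field": "e",
      "ignore_failure": true
   }
}

which is why only the last field gets removed.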

Can you please write an example?

"an array element per processor" -
Does it mean that it should not be as follows:

"processoers": [
{
{
"grok":.....
},
{
"remove":......
},
{
"remove":.....
}
}
]

How should it be written, then?

Ori

It should just be like this:

"processors": [
	{
		"grok": {
			// processor options
		}
	},
	{
		"remove": {
			// processor options
		}
	},
	{
		"remove": {
			// processor options
		}
	}
]

This is exactly how I wrote the processors, and yet it only removes the last field mentioned in the pipeline.
If I add another remove, it then removes only the new one.

I am sending data from Filebeat; can it be related to bulk requests?

This has nothing to do with bulk. In any case, what you shared is different:

remove {
"field"=>"a",
"ignore_failure" : true
},
remove {
"field"=>"b",
"ignore_failure" : true
},
remove {
"field"=>"c",
"ignore_failure" : true
},
remove {
"field"=>"d",
"ignore_failure" : true
},
remove {
"field"=>"e",
"ignore_failure" : true
}

Each processor needs to be wrapped in an extra JSON object in the processors array.

So this:

{
  "remove": {
    "field": "e",
    "ignore_failure": true
  }
}

instead of this:

"remove":  {
  "field"=>"e",
  "ignore_failure" : true
}

The behavior you're seeing with this format is a bug: ES should fail when processors are specified this way instead of accepting it and only using the last processor.
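
For reference, the full pipeline from the original question would then look roughly like this (the pipeline name is just an example; field names a to e come from your first post):

PUT /_ingest/pipeline/remove-abcde
{
  "description": "remove fields a to e",
  "processors": [
    { "remove": { "field": "a", "ignore_failure": true } },
    { "remove": { "field": "b", "ignore_failure": true } },
    { "remove": { "field": "c", "ignore_failure": true } },
    { "remove": { "field": "d", "ignore_failure": true } },
    { "remove": { "field": "e", "ignore_failure": true } }
  ]
}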

This is correct!
Thanks, it works great now!
I had a similar problem with gsub; I will check that too.
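
For what it's worth, gsub follows the same rule: each gsub processor goes into its own object in the processors array. A minimal sketch (the field name, pattern and replacement here are just placeholders):

"processors": [
   {
      "gsub": {
         "field": "message",
         "pattern": "-",
         "replacement": "_"
      }
   }
]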