Hi,
I am trying to remove some fields from a document using the remove processor in an ingest pipeline.
I tried:
remove {
"field"=>"a",
"field"=>"b",
"field"=>"c",
"field"=>"d",
"field"=>"e",
"ignore_failure" : true
}
And:
remove {
"field"=>"a",
"ignore_failure" : true
},
remove {
"field"=>"b",
"ignore_failure" : true
},
remove {
"field"=>"c",
"ignore_failure" : true
},
remove {
"field"=>"d",
"ignore_failure" : true
},
remove {
"field"=>"e",
"ignore_failure" : true
}
But every time it removes only the last field in the processor chain.
Is there another syntax? Or am I using it wrong?
Thanks,
Ori
Hi @ori.rubinfeld ,
here is a complete example that worked for me:
DELETE /_ingest/pipeline/remove-fields
PUT /_ingest/pipeline/remove-fields
{
  "description": "remove a couple of fields",
  "processors": [
    {
      "remove": {
        "field": "foo"
      }
    },
    {
      "remove": {
        "field": "bar"
      }
    }
  ]
}
DELETE /sample-index
PUT /sample-index
POST /sample-index/type?pipeline=remove-fields
{
  "foo": 1,
  "bar": 2,
  "baz": 3
}
POST /sample-index/type?pipeline=remove-fields
{
  "foo": 4,
  "bar": 5,
  "baz": 6
}
GET /sample-index/type/_search
{
  "query": {
    "match_all": {}
  }
}
The query returns:
{
  "took": 37,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "sample-index",
        "_type": "type",
        "_id": "AVYsHLDoFKBGyZwFuEKG",
        "_score": 1,
        "_source": {
          "baz": 3
        }
      },
      {
        "_index": "sample-index",
        "_type": "type",
        "_id": "AVYsHM4xFKBGyZwFuEKH",
        "_score": 1,
        "_source": {
          "baz": 6
        }
      }
    ]
  }
}
As you can see, both returned documents contain only the baz field.
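For readers without a cluster at hand, the effect of the two remove processors can be sketched locally. This is only a rough model in Python, not Elasticsearch itself: the helper name apply_pipeline is hypothetical, only the remove processor is modelled, and missing fields are silently skipped, which may differ from the real processor.

```python
def apply_pipeline(processors, doc):
    """Apply single-key processor objects to a copy of doc (local sketch)."""
    result = dict(doc)
    for processor in processors:
        # Each array element is an object with exactly one key: the processor type.
        (kind, options), = processor.items()
        if kind != "remove":
            raise ValueError(f"unsupported processor in this sketch: {kind}")
        # Assumption: a missing field is skipped rather than treated as a failure.
        result.pop(options["field"], None)
    return result


# Same processors and documents as in the example above.
processors = [
    {"remove": {"field": "foo"}},
    {"remove": {"field": "bar"}},
]
docs = [
    {"foo": 1, "bar": 2, "baz": 3},
    {"foo": 4, "bar": 5, "baz": 6},
]

results = [apply_pipeline(processors, doc) for doc in docs]
print(results)  # [{'baz': 3}, {'baz': 6}]
```

Because each processor sits in its own array element, both are applied in order and only baz survives.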
Daniel
mvg (Martijn Van Groningen), July 27, 2016, 12:25pm
Daniel's example shows how a pipeline should be specified: one array element per processor.
Unfortunately, specifying a pipeline the way Ori did doesn't fail, and I think that should be fixed.
This isn't specific to ingest; the JSON parser simply accepts it.
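To illustrate the kind of parser leniency meant here, a small sketch using Python's standard json module as a stand-in. Elasticsearch uses its own JSON parser, so this is only an analogy, and it assumes the malformed pipeline ended up with several remove keys inside a single object. The field names a, b and c come from the original question.

```python
import json

# One object with the "remove" key repeated, as a malformed pipeline might contain.
# (Python's json module stands in for the Elasticsearch parser here.)
malformed = """
{
  "remove": {"field": "a"},
  "remove": {"field": "b"},
  "remove": {"field": "c"}
}
"""

# The parser accepts the duplicate keys without complaint; the last value wins.
parsed = json.loads(malformed)

print(len(parsed))  # 1
print(parsed)       # {'remove': {'field': 'c'}}
```

Only the last remove survives parsing, which matches the symptom of only the last field being removed.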
Can you please write an example?
"an array element per processor" -
Does it mean that it should not be as follows:
"processors": [
{
{
"grok":.....
},
{
"remove":......
},
{
"remove":.....
}
}
]
How should it be written then?
Ori
mvg (Martijn Van Groningen), July 28, 2016, 9:15am
It should just be like this:
"processors": [
  {
    "grok": {
      // processor options
    }
  },
  {
    "remove": {
      // processor options
    }
  },
  {
    "remove": {
      // processor options
    }
  }
]
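When the list of fields is long, the processors array can be generated instead of typed by hand. A minimal sketch in Python, assuming a hypothetical helper named build_remove_pipeline; the field names a to e and the ignore_failure option are taken from the original question, and the printed JSON is meant to be used as the body of a PUT /_ingest/pipeline/... request.

```python
import json


def build_remove_pipeline(fields, description="remove a couple of fields"):
    """Build a pipeline body with one wrapper object per remove processor.

    Hypothetical helper for illustration; it only assembles the JSON body.
    """
    return {
        "description": description,
        "processors": [
            # Each processor gets its own object in the array.
            {"remove": {"field": name, "ignore_failure": True}}
            for name in fields
        ],
    }


pipeline = build_remove_pipeline(["a", "b", "c", "d", "e"])
print(json.dumps(pipeline, indent=2))
```

Generating the structure from a list makes it hard to end up with two processors in the same object by accident.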
This is exactly how I wrote the processors, and yet it only removes the last field mentioned in the pipeline.
If I add another remove, it then removes only the new one.
I am sending data from Filebeat; can it be related to bulk requests?
mvg (Martijn Van Groningen), July 28, 2016, 9:51am
This has nothing to do with bulk. At least, what you shared is different:
remove {
"field"=>"a",
"ignore_failure" : true
},
remove {
"field"=>"b",
"ignore_failure" : true
},
remove {
"field"=>"c",
"ignore_failure" : true
},
remove {
"field"=>"d",
"ignore_failure" : true
},
remove {
"field"=>"e",
"ignore_failure" : true
}
Each processor needs to be wrapped in an extra JSON object in the processors array.
So this:
{
  "remove": {
    "field": "e",
    "ignore_failure": true
  }
}
instead of this:
"remove": {
"field"=>"e",
"ignore_failure" : true
}
The behavior you're seeing with this format is a bug. ES should reject a pipeline whose processors are specified this way, instead of accepting it and using only the last processor.
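Until the server rejects such input, a pipeline body can be checked on the client side before it is sent. The following is a rough sketch in Python, not anything Elasticsearch provides: the names Pairs and find_pipeline_problems are hypothetical, and the check only verifies that every element of the processors array is an object with exactly one key.

```python
import json


class Pairs(list):
    """A JSON object kept as a list of (key, value) pairs, duplicates included."""


def find_pipeline_problems(text):
    """Return a list of problems found in the processors array (hypothetical helper)."""
    # Keeping the raw pairs prevents duplicate keys from being merged silently.
    body = json.loads(text, object_pairs_hook=Pairs)
    processors = dict(body).get("processors", [])
    problems = []
    for index, entry in enumerate(processors):
        if not isinstance(entry, Pairs):
            problems.append(f"processors[{index}] is not an object")
        elif len(entry) != 1:
            keys = [key for key, _ in entry]
            problems.append(
                f"processors[{index}] holds {len(entry)} keys {keys}; expected exactly one"
            )
    return problems


# Two processors in one object: accepted by a lenient parser, flagged here.
bad = '{"processors": [{"remove": {"field": "a"}, "remove": {"field": "b"}}]}'
# One wrapper object per processor: no problems reported.
good = '{"processors": [{"remove": {"field": "a"}}, {"remove": {"field": "b"}}]}'

bad_problems = find_pipeline_problems(bad)
good_problems = find_pipeline_problems(good)
print(bad_problems)
print(good_problems)  # []
```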
This is correct!!!!
Thanks!!! It works great now.
I had a similar problem with gsub; I will check that too.