Hell guys,
what ingest processor can I use to remove an item from an array?
In my case I want to remove an "error tag" that was added to the "tags" field.
The remove processor removes the whole field, not just an element in it?
In my example the whole "tags" field would be gone, not just the "error tag" which is one of multiple.
I am looking for the opposite processor to "Append".
PS: I just realized I wrote "Hell" instead of "Hello" in my original post. Sorry!
Thanks for your reply @Balu
No worries! I figured that was a typo.
You are correct: the remove processor drops the entire field. I'm not aware of a processor that is the opposite of append. Have you considered using a script processor to iterate over the array and remove the error tags?
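A minimal sketch of such a script processor (assuming the tags live in a top-level tags array; note that in ingest scripts the document's fields are accessed directly on ctx):

```
{
  "script": {
    "source": "if (ctx.tags != null) { ctx.tags.removeIf(t -> t == '_grok_dovecot_nomatch') }"
  }
}
```

removeIf also covers the case where the tag appears more than once, which a single remove(indexOf(...)) does not.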
I have, but for me it's not as "painless" as I'd hoped it would be. I need to learn the language better before I can do so.
Maybe some inspiration could come from Removing elements from an array in a document.
The scripting documentation has an example too.
So I tried this:
if (ctx._source.tags.contains('_grok_dovecot_nomatch')) {
  ctx._source.tags.remove(ctx._source.tags.indexOf('_grok_dovecot_nomatch'))
}
But I get a null pointer exception:
cannot access method/field [tags] from a null def reference
with a pointer to the _source element. The document I am using is from the index and does have _source, so I am not sure where that is coming from.
[
{
"_id": "o7U4PI4BdgQegvMfNBhs",
"_index": ".ds-logs-logs-default-2024.03.04-000004",
"_source": {
...
"message": "imap-postlogin: user=abc, homedir=/..., rip=10.0.0.1, lip=127.0.0.1, arguments=/...",
"tags": [
"journald-log",
"_grokparsefailure",
"_grok_dovecot_nomatch"
],
...
"@timestamp": "2024-03-14T09:08:15.503Z",
...
}
}
]
I had also tried the shorter version with the same result:
ctx._source.tags.remove('_grok_dovecot_nomatch')
Could you share a complete example of a _simulate call, so we can iterate from it? Like this example.
Sure.
POST /_ingest/pipeline/_simulate
{
"pipeline" :
{
"description": "_description",
"processors": [
{
"grok": {
"field": "message",
"patterns": [
"%{IMAP_POSTLOGIN_WORD:dovecot.service}: user=%{DOVECOT_USER:dovecot.user}, homedir=%{DATA:dovecot.homedir}, rip=%{IP:dovecot.rip}, lip=%{IP:dovecot.lip}, arguments=%{DATA:dovecot.arguments},"
],
"pattern_definitions": {
"DOVECOT_USER": "%{USERNAME}|%{EMAILADDRESS}|%{DATA}",
"IMAP_POSTLOGIN_WORD": "imap-postlogin"
},
"ignore_missing": true,
"ignore_failure": true
}
},
{
"script": {
"source": "if (ctx._source.tags.contains('_grokparsefailure')) { \n ctx._source.tags.remove(ctx._source.tags.indexOf('_grokparsefailure')) \n}",
"if": "ctx?.dovecot?.service == 'imap-postlogin'"
}
}
]
},
"docs": [
{
"_index": "index",
"_id": "id",
"_source": {
"message": "imap-postlogin: user=us@r, homedir=/.../, rip=10.0.0.1, lip=127.0.0.1, arguments=/.../,",
"tags": [
"journald-log",
"_grokparsefailure"
]
}
}
]
}
Try:
POST /_ingest/pipeline/_simulate
{
"pipeline" :
{
"description": "_description",
"processors": [
{
"grok": {
"field": "message",
"patterns": [
"%{IMAP_POSTLOGIN_WORD:dovecot.service}: user=%{DOVECOT_USER:dovecot.user}, homedir=%{DATA:dovecot.homedir}, rip=%{IP:dovecot.rip}, lip=%{IP:dovecot.lip}, arguments=%{DATA:dovecot.arguments},"
],
"pattern_definitions": {
"DOVECOT_USER": "%{USERNAME}|%{EMAILADDRESS}|%{DATA}",
"IMAP_POSTLOGIN_WORD": "imap-postlogin"
},
"ignore_missing": true,
"ignore_failure": true
}
},
{
"script": {
"source": """
if (ctx.tags != null && ctx.tags.contains('_grokparsefailure')) {
ctx.tags.remove(ctx.tags.indexOf('_grokparsefailure'));
}""",
"if": "ctx?.dovecot?.service == 'imap-postlogin'"
}
}
]
},
"docs": [
{
"_index": "index",
"_id": "id",
"_source": {
"message": "imap-postlogin: user=us@r, homedir=/.../, rip=10.0.0.1, lip=127.0.0.1, arguments=/.../,",
"tags": [
"journald-log",
"_grokparsefailure"
]
}
}
]
}
This seems to work. Thank you.
I do understand the extra check for ctx.tags != null, but I'm still confused about when to use _source and when not. The context in processors is already the _source document, but if I run a script somewhere else, it isn't?
PS: An extra processor would make this easier though.
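A note on that last question, based on how the Elasticsearch scripting contexts are documented: in ingest pipeline scripts (the script processor, including _simulate), ctx maps directly onto the document source, while in update scripts (_update, _update_by_query) the source is nested one level down under ctx._source. A small side-by-side sketch:

```
// Ingest pipeline script processor: fields sit directly on ctx
ctx.tags.remove(ctx.tags.indexOf('_grokparsefailure'))

// Update / update-by-query script: the document is wrapped, so go through _source
ctx._source.tags.remove(ctx._source.tags.indexOf('_grokparsefailure'))
```

That is why the original ctx._source.tags version threw a null pointer exception in the pipeline: there is no _source field on ctx in that context, so ctx._source is null.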
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.