Opposite of "Append" processor?

Hell guys,

Which ingest processor can I use to remove an item from an array?

In my case I want to remove an "error tag" that was added to the "tags" field.

Hi @Balu,

Would the remove processor be what you are looking for?

Hope this helps!

Jess

Doesn't the remove processor remove the whole field, not just an element in it?

In my example the whole "tags" field would be gone, not just the "error tag", which is only one of several tags.

I am looking for the opposite processor to "Append".

PS: I just realized I've said "Hell" instead of "Hello" in my original post. :astonished: - Sorry

Thanks for your reply @Balu

No worries! I figured that was a typo.

You are correct. The remove processor removes the entire field. I'm not aware of a processor that is the opposite of append. Have you considered using a script processor to go through the array and remove the error tags?

I have, but for me it's not as "painless" as I'd hope it to be :wink:. I need to learn the language more before I can do so.

Maybe some inspiration could come from "Removing elements from an array in a document".

The scripting documentation has an example too.

So I tried this:

if (ctx._source.tags.contains('_grok_dovecot_nomatch')) { 
  ctx._source.tags.remove(ctx._source.tags.indexOf('_grok_dovecot_nomatch')) 
}

But I get a null pointer exception, "cannot access method/field [tags] from a null def reference", with a pointer to the _source element.

The document I am using is from the index and has _source. So I am not sure where that is coming from.

[
  {
    "_id": "o7U4PI4BdgQegvMfNBhs",
    "_index": ".ds-logs-logs-default-2024.03.04-000004",
    "_source": {
...
      "message": "imap-postlogin: user=abc, homedir=/..., rip=10.0.0.1, lip=127.0.0.1, arguments=/...",
      "tags": [
        "journald-log",
        "_grokparsefailure",
        "_grok_dovecot_nomatch"
      ],
...
      "@timestamp": "2024-03-14T09:08:15.503Z",
...
    }
  }
]

I had also tried the shorter version with the same result:

ctx._source.tags.remove('_grok_dovecot_nomatch')

Could you share a complete example of a _simulate call, so we can iterate from there? Something like this example.

Sure.

POST /_ingest/pipeline/_simulate
{
  "pipeline" :
  {
    "description": "_description",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "%{IMAP_POSTLOGIN_WORD:dovecot.service}: user=%{DOVECOT_USER:dovecot.user}, homedir=%{DATA:dovecot.homedir}, rip=%{IP:dovecot.rip}, lip=%{IP:dovecot.lip}, arguments=%{DATA:dovecot.arguments},"
          ],
          "pattern_definitions": {
            "DOVECOT_USER": "%{USERNAME}|%{EMAILADDRESS}|%{DATA}",
            "IMAP_POSTLOGIN_WORD": "imap-postlogin"
          },
          "ignore_missing": true,
          "ignore_failure": true
        }
      },
      {
        "script": {
          "source": "if (ctx._source.tags.contains('_grokparsefailure')) { \n  ctx._source.tags.remove(ctx._source.tags.indexOf('_grokparsefailure')) \n}",
          "if": "ctx?.dovecot?.service == 'imap-postlogin'"
        }
      }      
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "message": "imap-postlogin: user=us@r, homedir=/.../, rip=10.0.0.1, lip=127.0.0.1, arguments=/.../,",
        "tags": [
          "journald-log",
          "_grokparsefailure"
        ]
      }
    }
  ]
}

Try:

POST /_ingest/pipeline/_simulate
{
  "pipeline" :
  {
    "description": "_description",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "%{IMAP_POSTLOGIN_WORD:dovecot.service}: user=%{DOVECOT_USER:dovecot.user}, homedir=%{DATA:dovecot.homedir}, rip=%{IP:dovecot.rip}, lip=%{IP:dovecot.lip}, arguments=%{DATA:dovecot.arguments},"
          ],
          "pattern_definitions": {
            "DOVECOT_USER": "%{USERNAME}|%{EMAILADDRESS}|%{DATA}",
            "IMAP_POSTLOGIN_WORD": "imap-postlogin"
          },
          "ignore_missing": true,
          "ignore_failure": true
        }
      },
      {
        "script": {
          "source": """
if (ctx.tags != null && ctx.tags.contains('_grokparsefailure')) { 
    ctx.tags.remove(ctx.tags.indexOf('_grokparsefailure'));
}""",
          "if": "ctx?.dovecot?.service == 'imap-postlogin'"
        }
      }      
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "message": "imap-postlogin: user=us@r, homedir=/.../, rip=10.0.0.1, lip=127.0.0.1, arguments=/.../,",
        "tags": [
          "journald-log",
          "_grokparsefailure"
        ]
      }
    }
  ]
}
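
A possible variant of the script above, offered as a sketch rather than a tested recommendation: Painless exposes Java's Collection.removeIf, which removes every matching element in one call, whereas remove(indexOf(...)) removes only the first occurrence. Only the script processor is shown here; the rest of the _simulate request would stay as it is.

```json
{
  "script": {
    "if": "ctx?.dovecot?.service == 'imap-postlogin'",
    "source": "if (ctx.tags != null) { ctx.tags.removeIf(tag -> tag == '_grokparsefailure'); }"
  }
}
```

If the tag is not present, removeIf simply removes nothing, so the separate contains() check is not needed.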

This seems to work. Thank you.

I do understand the extra check for ctx.tags != null, but I'm still confused about when to use _source and when not to.

The context in processors already is the _source document, but if I run a script somewhere else, it's not?

PS: An extra processor would make this easier though. :wink:
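
On the ctx versus ctx._source question: in an ingest pipeline, ctx holds the document's fields at the top level (next to metadata such as _index and _id), so the tag list is ctx.tags. The _source key in the docs array is only the shape of the _simulate request. In update scripts (_update, _update_by_query) the document sits under ctx._source instead. As a rough sketch, the same cleanup for documents that are already indexed could be sent as the body of POST my-index/_update_by_query, where my-index is a hypothetical index name and the tags field is assumed to be mapped as keyword:

```json
{
  "query": {
    "term": { "tags": "_grokparsefailure" }
  },
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.tags != null) { ctx._source.tags.removeIf(tag -> tag == '_grokparsefailure'); }"
  }
}
```

The script body is the same as in the pipeline; only the path to the field changes with the context the script runs in.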
