Ingest - grok - more fields by same processor, domain parsing

Hi,
let's consider this simple example document:

{
    "from": "user@mail.domain1.com",
    "to": "user2@mail.domain2.com"
}

I want to aggregate on parts of the domain: the TLD, the second-level domain, and so on. In Elasticsearch 1.x I wrote custom analyzers that produced these tokens, but I can no longer use them with the keyword type (I cannot use the pattern_capture filter in a custom normalizer).

So I'm trying an ingest node with a grok processor in a pipeline:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "parse multiple patterns",
    "processors": [
      {
        "grok": {
          "field": "from",
          "patterns": [
            "%{EMAILADDRESS}"
          ],
          "pattern_definitions": {
            "DOM": "[0-9A-Za-z][0-9A-Za-z-]{0,62}",
            "BASEDOM": "(?<l2domain>%{DOM}\\.(?<tld>%{DOM}))",
            "FULLDOM": "((?:%{DOM}\\.)*%{BASEDOM})",
            "URL": "%{PROTOCOL}://%{FULLDOM}",
            "EMAILADDRESS": "%{EMAILLOCALPART:username}@%{FULLDOM:domain}"
          }
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "from": "user@mail.domain1.com"
      }
    }
  ]
}

This works fine, but now I need exactly the same processor on the "to" field. Should I just copy and paste the processor for that field? Is there a way to change the destination field names in the pattern definitions according to the processed field (I want something like {{ _ingest.processed_field }}_tld), or should I rename the captured fields by hand, e.g. tld -> from_tld in the first processor and tld -> to_tld in the second? That would work, but it seems a little clumsy.
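To make the "rename by hand" variant concrete, here is a minimal sketch of a request body for POST _ingest/pipeline/_simulate with one grok processor per field. The from_ and to_ prefixes on the capture names are only an illustrative naming choice, the unused URL pattern is left out, and I am assuming that both fields are present in every document (a grok processor fails on a missing field unless ignore_missing is set).

```json
{
  "pipeline": {
    "description": "parse sender and recipient domains with per-field capture names",
    "processors": [
      {
        "grok": {
          "field": "from",
          "patterns": [
            "%{EMAILLOCALPART:from_username}@%{FULLDOM:from_domain}"
          ],
          "pattern_definitions": {
            "DOM": "[0-9A-Za-z][0-9A-Za-z-]{0,62}",
            "BASEDOM": "(?<from_l2domain>%{DOM}\\.(?<from_tld>%{DOM}))",
            "FULLDOM": "((?:%{DOM}\\.)*%{BASEDOM})"
          }
        }
      },
      {
        "grok": {
          "field": "to",
          "patterns": [
            "%{EMAILLOCALPART:to_username}@%{FULLDOM:to_domain}"
          ],
          "pattern_definitions": {
            "DOM": "[0-9A-Za-z][0-9A-Za-z-]{0,62}",
            "BASEDOM": "(?<to_l2domain>%{DOM}\\.(?<to_tld>%{DOM}))",
            "FULLDOM": "((?:%{DOM}\\.)*%{BASEDOM})"
          }
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "from": "user@mail.domain1.com",
        "to": "user2@mail.domain2.com"
      }
    }
  ]
}
```

The only differences between the two processors are the "field" value and the prefix on the four capture names, which is exactly the duplication I would like to avoid.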

Thanks for suggestions!
