Hi,
Let's consider this simple example document:
{
  "from": "user@mail.domain1.com",
  "to": "user2@mail.domain2.com"
}
I want to aggregate based on the domain's TLD, second-level domain, etc. In Elasticsearch 1.x I wrote custom analyzers that provided these tokens, but I can no longer use them because of the keyword type (I cannot use the pattern_capture filter in a custom normalizer).
So I'm trying an ingest node with a grok processor in a pipeline:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "parse multiple patterns",
    "processors": [
      {
        "grok": {
          "field": "from",
          "patterns": [
            "%{EMAILADDRESS}"
          ],
          "pattern_definitions": {
            "DOM": "[0-9A-Za-z][0-9A-Za-z-]{0,62}",
            "BASEDOM": "(?<l2domain>%{DOM}\\.(?<tld>%{DOM}))",
            "FULLDOM": "((?:%{DOM}\\.)*%{BASEDOM})",
            "URL": "%{PROTOCOL}://%{FULLDOM}",
            "EMAILADDRESS": "%{EMAILLOCALPART:username}@%{FULLDOM:domain}"
          }
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "from": "user@mail.domain1.com"
      }
    }
  ]
}
This works fine, but now I need exactly the same processing on the "to" field. Should I just copy and paste the same processor for the recipient field? Is there any way to change the destination field names in the pattern definitions according to the processed field (I want something like {{ _ingest.processed_field }}_tld), or should I rename the produced fields, e.g. tld -> from_tld in the first processor and to_tld in the second? That would work, but it seems a little clumsy.
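For reference, here is what the copy-and-paste variant I have in mind would look like: two grok processors, each with its own pattern definitions whose capture names carry the field prefix (since the capture names are fixed inside the pattern, I don't see a way to parameterize them). This is an untested sketch; the FROMDOM/TODOM pattern names and the prefixed capture names are just my own naming.

```json
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "parse from and to with prefixed field names",
    "processors": [
      {
        "grok": {
          "field": "from",
          "patterns": [
            "%{EMAILLOCALPART:from_username}@%{FROMDOM}"
          ],
          "pattern_definitions": {
            "DOM": "[0-9A-Za-z][0-9A-Za-z-]{0,62}",
            "FROMDOM": "(?:%{DOM}\\.)*(?<from_l2domain>%{DOM}\\.(?<from_tld>%{DOM}))"
          }
        }
      },
      {
        "grok": {
          "field": "to",
          "patterns": [
            "%{EMAILLOCALPART:to_username}@%{TODOM}"
          ],
          "pattern_definitions": {
            "DOM": "[0-9A-Za-z][0-9A-Za-z-]{0,62}",
            "TODOM": "(?:%{DOM}\\.)*(?<to_l2domain>%{DOM}\\.(?<to_tld>%{DOM}))"
          }
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "from": "user@mail.domain1.com",
        "to": "user2@mail.domain2.com"
      }
    }
  ]
}
```

Everything except the field names and prefixes is duplicated between the two processors, which is exactly the clumsiness I'd like to avoid.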
Thanks for suggestions!