Hi,
Let's consider this simple example document:
{
  "from": "user@mail.domain1.com",
  "to": "user2@mail.domain2.com"
}
I want to aggregate based on the domain's TLD, second-level domain, etc. In Elasticsearch 1.x I wrote custom analyzers that provided these tokens, but I can no longer use them because of the keyword type (I cannot use the pattern_capture filter in a custom normalizer).
So I'm trying an ingest node with a grok processor in a pipeline:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "parse multiple patterns",
    "processors": [
      {
        "grok": {
          "field": "from",
          "patterns": [
            "%{EMAILADDRESS}"
          ],
          "pattern_definitions": {
            "DOM": "[0-9A-Za-z][0-9A-Za-z-]{0,62}",
            "BASEDOM": "(?<l2domain>%{DOM}\\.(?<tld>%{DOM}))",
            "FULLDOM": "((?:%{DOM}\\.)*%{BASEDOM})",
            "URL": "%{PROTOCOL}://%{FULLDOM}",
            "EMAILADDRESS": "%{EMAILLOCALPART:username}@%{FULLDOM:domain}"
          }
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "from": "user@mail.domain1.com"
      }
    }
  ]
}
This works fine, but now I need exactly the same processing on the "to" field. Should I just copy and paste the same processor for the recipient field? Is there any way to change the destination field names in the pattern definitions according to the processed field (I want something like {{ _ingest.processed_field }}_tld), or should I rename the produced fields, e.g. tld -> from_tld in the first processor and to_tld in the second? That would work, but it seems a little clumsy.
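For reference, here is what the copy-and-paste variant I have in mind would look like: two grok processors, each with its own pattern definitions whose capture names carry the field prefix (since the capture names are fixed inside the pattern, I don't see a way to parameterize them). This is an untested sketch; the FROMDOM/TODOM pattern names and the prefixed capture names are just my own naming.

```json
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "parse from and to with prefixed field names",
    "processors": [
      {
        "grok": {
          "field": "from",
          "patterns": [
            "%{EMAILLOCALPART:from_username}@%{FROMDOM}"
          ],
          "pattern_definitions": {
            "DOM": "[0-9A-Za-z][0-9A-Za-z-]{0,62}",
            "FROMDOM": "(?:%{DOM}\\.)*(?<from_l2domain>%{DOM}\\.(?<from_tld>%{DOM}))"
          }
        }
      },
      {
        "grok": {
          "field": "to",
          "patterns": [
            "%{EMAILLOCALPART:to_username}@%{TODOM}"
          ],
          "pattern_definitions": {
            "DOM": "[0-9A-Za-z][0-9A-Za-z-]{0,62}",
            "TODOM": "(?:%{DOM}\\.)*(?<to_l2domain>%{DOM}\\.(?<to_tld>%{DOM}))"
          }
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "from": "user@mail.domain1.com",
        "to": "user2@mail.domain2.com"
      }
    }
  ]
}
```

Everything except the field names and prefixes is duplicated between the two processors, which is exactly the clumsiness I'd like to avoid.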
Thanks for suggestions!