I need to extract the domain part of email addresses found in a text. I use
the uax_url_email tokenizer so that each email address is emitted as a single
token, and a pattern_capture filter that emits the string captured by the
pattern "@(.+)". However, uax_url_email also returns the words that are not
email addresses, and the pattern_capture filter does not remove them. Any suggestions?
{
  "analysis": {
    "analyzer": {
      "custom_analyzer": {
        "tokenizer": "uax_url_email",
        "filter": ["email_domain_filter"]
      }
    },
    "filter": {
      "email_domain_filter": {
        "type": "pattern_capture",
        "preserve_original": false,
        "patterns": ["@(.+)"]
      }
    }
  }
}
Input string: "my email id is xyz@gmail.com"
Output tokens: my, email, id, is, gmail.com
But I need only gmail.com as the output token.
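
To make the behaviour concrete, here is a rough Python sketch of what seems to be happening. It is only an illustration and not the actual Lucene/Elasticsearch implementation: tokenize() is a hypothetical whitespace splitter standing in for uax_url_email, and pattern_capture() is a hypothetical stand-in for the filter, written on the assumption (matching the output above) that a token with no pattern match is passed through unchanged even when preserve_original is false.

```python
import re

# Same pattern as in the email_domain_filter definition.
DOMAIN_PATTERN = re.compile(r"@(.+)")


def tokenize(text):
    """Hypothetical stand-in for uax_url_email: split on whitespace,
    which keeps an address like xyz@gmail.com as one token."""
    return text.split()


def pattern_capture(tokens, pattern=DOMAIN_PATTERN):
    """Hypothetical stand-in for pattern_capture with preserve_original=false.

    Assumption: a matching token is replaced by its capture group,
    while a non-matching token is emitted as-is rather than dropped.
    """
    result = []
    for token in tokens:
        match = pattern.search(token)
        if match:
            result.append(match.group(1))  # e.g. "gmail.com"
        else:
            result.append(token)           # plain words pass through
    return result


tokens = pattern_capture(tokenize("my email id is xyz@gmail.com"))
print(tokens)
```

In this sketch the plain words survive because the filter only rewrites tokens that match; nothing in the chain removes the tokens that do not match.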