I define a sub-field as follows:
"tokenizer": {
"my_email_tokenizer": {
"type": "uax_url_email",
"max_token_length": 100,
}
},
...
"my_email_analyzer": {
"type": "custom",
"tokenizer": "my_email_tokenizer",
"filter": ["lowercase", "stop","length_filter"]
},
...
"fields": {
"emails":{
"type":"text",
"analyzer":"my_email_analyzer",
},
However, when I try to analyze the email "foobar@baz.mail" against this field, the result is:
{'tokens': [{'end_offset': 13,
'position': 0,
'start_offset': 0,
'token': 'foobar@baz.ma',
'type': ''},
{'end_offset': 15,
'position': 1,
'start_offset': 13,
'token': 'il',
'type': ''}]}
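For reference, I produced that output with an `_analyze` request along these lines (the index name and parent field here are placeholders for my actual mapping):

```
GET my_index/_analyze
{
  "field": "message.emails",
  "text": "foobar@baz.mail"
}
```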
Why is the email being split into two tokens? I thought it might be the max token length, but I set max_token_length to 100 to be sure.
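For what it's worth, the reported offsets slice the input exactly at character 13, and the whole input is only 15 characters, well under the configured limit. A quick check in plain Python (no Elasticsearch involved):

```python
# Slice the input string at the offsets reported by _analyze.
text = "foobar@baz.mail"
offsets = [(0, 13), (13, 15)]
pieces = [text[start:end] for start, end in offsets]
print(pieces)      # ['foobar@baz.ma', 'il']
print(len(text))   # 15, far below max_token_length of 100
```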
I am using Elasticsearch 6.3.