Inconsistent number of tokens with uax_url_email

I would like the first tokenization below (#1) to result in 1 token. The only difference from #2 is that the domain starts with a number. Any feedback is greatly appreciated. Thank you.

#1. Results in 2 tokens:
curl -XGET 'http://localhost:9200/_analyze?tokenizer=uax_url_email&'


#2. Results in 1 token:
curl -XGET 'http://localhost:9200/_analyze?tokenizer=uax_url_email&'


The way I understand this, you need to provide a real URL that contains a scheme like http:// in order to make use of the tokenizer. You can easily reproduce this by not specifying the tokenizer: you get exactly the same results. The results only differ when you use a real URL.
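
To make that comparison repeatable, here is a minimal Python sketch that runs the same text through _analyze once with uax_url_email and once with no tokenizer specified. It assumes the legacy query-string form of _analyze used in the commands above, with the input passed as a `text` parameter, and a node at localhost:9200; newer Elasticsearch versions expect a JSON request body instead. The helper names (analyze_url, fetch_tokens, compare) are made up for illustration, and the sample inputs in the note below are placeholders, not the inputs from the original question.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def analyze_url(text, tokenizer=None, host="http://localhost:9200"):
    """Build an _analyze URL (legacy query-string form, assumed here)."""
    params = {}
    if tokenizer is not None:
        params["tokenizer"] = tokenizer
    params["text"] = text
    # urlencode escapes characters such as ':' and '/' in the input text
    return f"{host}/_analyze?{urlencode(params)}"


def fetch_tokens(url):
    """Call _analyze and return only the token strings from the response."""
    with urlopen(url) as resp:
        body = json.load(resp)
    return [t["token"] for t in body["tokens"]]


def compare(text):
    """Return (tokens with uax_url_email, tokens with no tokenizer given)."""
    with_tokenizer = fetch_tokens(analyze_url(text, tokenizer="uax_url_email"))
    default = fetch_tokens(analyze_url(text))
    return with_tokenizer, default
```

Calling compare("http://example.com/path") and compare("example.com/path") against a running node shows whether the two token lists differ for input with and without a scheme.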


Hi Alex,

Thank you for your answer. That helps us understand the tokenizer and find a solution, since our data/URLs do not have schemes.