I am wondering if there are plans to implement a more advanced version of the current URL Email Tokenizer, such as the UAX29 URL Email Tokenizer provided by Solr (https://lucene.apache.org/solr/guide/7_3/tokenizers.html#uax29-url-email-tokenizer).
The most useful features for us would be:
- Words are split at hyphens, unless there is a number in the word, in which case the token is not split and the numbers and hyphen(s) are preserved.
- Proper tokenization of IP addresses (in particular, IPv6 addresses are not tokenized properly by the Elasticsearch version of the URL Email Tokenizer).
If not, how could I go about implementing a custom tokenizer?
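
To make the two requested behaviours concrete, here is a rough Python sketch of the output we would expect. It is only an illustration of the rules listed above, not how Solr or Elasticsearch implement their tokenizers. The function names (`split_on_hyphens`, `is_ip_address`, `tokenize`) are made up for this example, and splitting the input on whitespace is a simplifying assumption.

```python
import ipaddress
from typing import List


def split_on_hyphens(word: str) -> List[str]:
    """Split a word at hyphens unless it contains a digit (hypothetical helper)."""
    if any(ch.isdigit() for ch in word):
        # A number is present: keep the word, digits and hyphens intact.
        return [word]
    # No digits: split at hyphens and drop empty pieces.
    return [part for part in word.split("-") if part]


def is_ip_address(token: str) -> bool:
    """Return True if the token parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(token)
        return True
    except ValueError:
        return False


def tokenize(text: str) -> List[str]:
    """Whitespace-split the text, then apply the two rules (assumption: no URL/email handling)."""
    tokens: List[str] = []
    for chunk in text.split():
        # Strip common surrounding punctuation; colons are kept so IPv6 survives.
        chunk = chunk.strip(".,;!?()")
        if not chunk:
            continue
        if is_ip_address(chunk):
            # IP addresses are emitted as a single token.
            tokens.append(chunk)
        else:
            tokens.extend(split_on_hyphens(chunk))
    return tokens
```

For example, `tokenize("wi-fi sd-500 2001:db8::1")` splits `wi-fi` into two tokens but keeps `sd-500` and the IPv6 address whole.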