I've figured out what is happening here.
The uax_url_email tokenizer treats an email address as a single
token, so what is implied in this forum post,
i.e. that uax_url_email splits the email into its parts, is not
correct. The explanation here is more accurate:
If you want to tokenize the email on "@" and "." then a pattern
tokenizer can be used.
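To make the difference concrete, here is a rough Python sketch of the
two behaviours. It only imitates the tokenizers with a regular
expression and does not call Elasticsearch; the function names and the
pattern "[@.]" are my own assumptions for illustration.

```python
import re


def single_token(text):
    # Imitates uax_url_email for a bare email address: the whole
    # address stays one token. (Assumes the input is only an email,
    # with no surrounding text.)
    return [text]


def split_on_at_and_dot(text):
    # Imitates a pattern tokenizer whose pattern is "[@.]": the
    # matches act as separators and empty pieces are dropped.
    return [piece for piece in re.split(r"[@.]", text) if piece]


email = "test.user@gmail.com"
print(single_token(email))         # ['test.user@gmail.com']
print(split_on_at_and_dot(email))  # ['test', 'user', 'gmail', 'com']
```

With the single token, a search for "gmail" alone has nothing to match
against unless wildcards are used; with the split tokens it does.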
This is an example mapping that has both:
sortable is used to enable case-independent sorting.
uax_url_email is used to analyse emails as a single token, with
filter: [standard, lowercase, stop].
email_tokenizer is designed to tokenize emails on "." and "@".
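The mapping itself did not survive in this copy of the post, so the
following is only a guess at the shape of the analysis settings,
written as a Python dict that can be dumped to JSON. The names
sortable, uax_url_email and email_tokenizer and the filter list come
from the description above; everything marked "assumed" (the keyword
tokenizer, the "[@.]" pattern, the email_parts analyzer name) is
hypothetical and should be checked against your Elasticsearch version.

```python
import json

# Hypothetical reconstruction of the analysis settings, not the
# original mapping from the post.
settings = {
    "analysis": {
        "analyzer": {
            "sortable": {
                "type": "custom",
                "tokenizer": "keyword",   # assumed: keep the value as one token
                "filter": ["lowercase"],  # assumed: lowercase so sorting ignores case
            },
            "uax_url_email": {
                "type": "custom",
                "tokenizer": "uax_url_email",
                # Filter list taken from the post as-is.
                "filter": ["standard", "lowercase", "stop"],
            },
            "email_parts": {              # hypothetical analyzer name
                "type": "custom",
                "tokenizer": "email_tokenizer",
                "filter": ["lowercase"],  # assumed
            },
        },
        "tokenizer": {
            "email_tokenizer": {
                "type": "pattern",
                "pattern": "[@.]",        # assumed: split on "@" and "."
            }
        },
    }
}

print(json.dumps(settings, indent=2))
```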
Interestingly enough, it's possible to get some quite nice partial
matches with the uax_url_email tokenizer by creating a query string
query that allows leading wildcards. For the email
"test.user@gmail.com", the following would all match when wrapped in
leading and trailing wildcards:
"user@gmail", "test.", "@gmail", "er@gm", etc.
(NB. leading wildcards are resource intensive and can affect
performance if used across more than one or two fields.)
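As a rough illustration of why those fragments match, the sketch below
uses Python's fnmatch to imitate a wildcard search against the single
token. It is not how Elasticsearch evaluates wildcards internally, and
the query body at the end is hypothetical: the field name "email" is
an assumption, while query_string and allow_leading_wildcard are the
real option names as far as I know.

```python
from fnmatch import fnmatchcase

# The single token that uax_url_email produces for the address.
TOKEN = "test.user@gmail.com"


def wildcard_hits(fragment, token=TOKEN):
    # Wrap the fragment in a leading and a trailing "*" and test it
    # against the whole token, case-sensitively.
    return fnmatchcase(token, "*" + fragment + "*")


for fragment in ["user@gmail", "test.", "@gmail", "er@gm"]:
    print(fragment, wildcard_hits(fragment))  # each prints True

# Hypothetical request body for the same kind of search.
query = {
    "query": {
        "query_string": {
            "default_field": "email",        # assumed field name
            "query": "*@gmail*",
            "allow_leading_wildcard": True,
        }
    }
}
```

A fragment that is not a substring of the token, such as "yahoo", does
not match, which is the behaviour you would expect from the real query.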
On Jan 10, 9:46 am, davrob2 davirobe...@gmail.com wrote:
Bump on this, I'm pretty sure that email tokenizing has been sorted
since 16.0 (http://elasticsearch-users.115913.n3.nabble.com/Search-Email-Part-tp2...
) but for some reason I can't get it working.
On Jan 9, 4:33 pm, davrob2 davirobe...@gmail.com wrote:
I've defined an analyser and used it in a Mapping as defined below:
But when I enter something like @gmail.com, it does not match the
wildcard search. It only matches when I enter "test.u...@gmail.com",
"test.user", "test." etc., i.e. there appears to be no tokenization
on the "@" or the ".".
This is the query I'm using: