Hi everyone,
We're indexing a user database for an admin interface and would like to
search on email addresses. It works fine except for email addresses with
'.'s in them, where our users expect to be able to search on either part
of the name ("some.user@domain" should match "some" or "user" or
"domain"). We can't get any of the built-in tokenizers to split on "." or,
for that matter, on "_". The standard tokenizer splits as expected on "-",
but most email addresses have dots.
Any tips? What am I doing wrong?
Ask
$ curl 'http://indexdev1.la.sol:9200/us-devel-rms-v1/_analyze?pretty=1&tokenizer=uax_url_email' \
    -d 'some.user@domain'
{
  "tokens" : [ {
    "token" : "some.user",
    "start_offset" : 0,
    "end_offset" : 9,
    "type" : "",
    "position" : 1
  }, {
    "token" : "domain",
    "start_offset" : 10,
    "end_offset" : 16,
    "type" : "",
    "position" : 2
  } ]
}
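
To make the desired behaviour concrete, here is a minimal Python sketch of
the split we're hoping for. This is not Elasticsearch code, just an
illustration: the function name expected_tokens is made up, and the
lowercasing and "split on anything that isn't a letter or digit" rule are
assumptions about what our users would want.

```python
import re


def expected_tokens(email):
    """Return the tokens we would like the analyzer to emit (illustrative only).

    Assumption: split on every run of characters that is not an ASCII
    letter or digit, so ".", "_", "-" and "@" all act as separators.
    """
    parts = re.split(r"[^A-Za-z0-9]+", email.lower())
    # re.split can yield empty strings at the edges; drop them.
    return [p for p in parts if p]


print(expected_tokens("some.user@domain"))  # ['some', 'user', 'domain']
```

Compared with the _analyze output above, the only difference is that
"some.user" would come out as two tokens, "some" and "user".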