Email Analyzer failing in 0.16.0

After updating to 0.16.0 from 0.15.2 we began getting some results for
queries when we weren't expecting results.

For example, while using 0.15.2, the following query returns no hits:

{
"query": {
"field" : {
"accountEmail.address": "me@example.org"
}
}
}

We are using our "email_regex" analyzer for the accountEmail.address field.
It is defined here:

index.analysis.analyzer.email_regex.type=pattern
index.analysis.analyzer.email_regex.lowercase=true
index.analysis.analyzer.email_regex.pattern=[@.]
index.analysis.analyzer.email_regex.stopwords=none

So, now with 0.16.0, the above query now returns a match on this record:

[ {
"_id" : "ppkc_admin",
"dataType" : "profiles",
"source" : "PPKC",
"username" : "admin",
"password" : "ae2b1fca515949e5d54fb22b8ed95575",
"languagePref" : "en",
"accountType" : "paid",
"profileType" : "admin",
"country" : "US",
"accountEmail" : {
"status" : "verified",
"address" : "someone@example.org"
},
"messagingType" : "open"
} ]

FWIW, if we change our analyzer to the following definition, it works:

index.analysis.analyzer.email_regex.type=custom
index.analysis.analyzer.email_regex.tokenizer=uax_url_email
index.analysis.analyzer.email_regex.lowercase=true

Perhaps our initial analyzer definition is wrong, or a new bug was
introduced. We are not sure.

Any help is appreciated.

The way email are tokenized by the standard analyzer (the default analyzer) was changed in Lucene 3.1. Can you post your mappings? Maybe the analyzer is not applied? For a faster turnaround, gist a curl sample of what you expect :slight_smile:
On Tuesday, May 3, 2011 at 10:13 PM, James Cook wrote:

After updating to 0.16.0 from 0.15.2 we began getting some results for queries when we weren't expecting results.

For example, while using 0.15.2, the following query returns no hits:

{
"query": {
"field" : {
"accountEmail.address": "me@example.org"
}
}
}

We are using our "email_regex" analyzer for the accountEmail.address field. It is defined here:

index.analysis.analyzer.email_regex.type=pattern
index.analysis.analyzer.email_regex.lowercase=true
index.analysis.analyzer.email_regex.pattern=[@.]
index.analysis.analyzer.email_regex.stopwords=none

So, now with 0.16.0, the above query now returns a match on this record:

[ {
"_id" : "ppkc_admin",
"dataType" : "profiles",
"source" : "PPKC",
"username" : "admin",
"password" : "ae2b1fca515949e5d54fb22b8ed95575",
"languagePref" : "en",
"accountType" : "paid",
"profileType" : "admin",
"country" : "US",
"accountEmail" : {
"status" : "verified",
"address" : "someone@example.org"
},
"messagingType" : "open"
} ]

FWIW, if we change our analyzer to the following definition, it works:

index.analysis.analyzer.email_regex.type=custom
index.analysis.analyzer.email_regex.tokenizer=uax_url_email
index.analysis.analyzer.email_regex.lowercase=true

Perhaps our initial analyzer definition is wrong, or a new bug was introduced. We are not sure.

Any help is appreciated.