Email Analyzer failing in 0.16.0


(James Cook) #1

After updating to 0.16.0 from 0.15.2 we began getting some results for
queries when we weren't expecting results.

For example, while using 0.15.2, the following query returns no hits:

{
"query": {
"field" : {
"accountEmail.address": "me@example.org"
}
}
}

We are using our "email_regex" analyzer for the accountEmail.address field.
It is defined here:

index.analysis.analyzer.email_regex.type=pattern
index.analysis.analyzer.email_regex.lowercase=true
index.analysis.analyzer.email_regex.pattern=[@.]
index.analysis.analyzer.email_regex.stopwords=none

So, now with 0.16.0, the above query now returns a match on this record:

[ {
"_id" : "ppkc_admin",
"dataType" : "profiles",
"source" : "PPKC",
"username" : "admin",
"password" : "ae2b1fca515949e5d54fb22b8ed95575",
"languagePref" : "en",
"accountType" : "paid",
"profileType" : "admin",
"country" : "US",
"accountEmail" : {
"status" : "verified",
"address" : "someone@example.org"
},
"messagingType" : "open"
} ]

FWIW, if we change our analyzer to the following definition, it works:

index.analysis.analyzer.email_regex.type=custom
index.analysis.analyzer.email_regex.tokenizer=uax_url_email
index.analysis.analyzer.email_regex.lowercase=true

Perhaps our initial analyzer definition is wrong, or a new bug was
introduced. We are not sure.

Any help is appreciated.


(Shay Banon) #2

The way email are tokenized by the standard analyzer (the default analyzer) was changed in Lucene 3.1. Can you post your mappings? Maybe the analyzer is not applied? For a faster turnaround, gist a curl sample of what you expect :slight_smile:
On Tuesday, May 3, 2011 at 10:13 PM, James Cook wrote:

After updating to 0.16.0 from 0.15.2 we began getting some results for queries when we weren't expecting results.

For example, while using 0.15.2, the following query returns no hits:

{
"query": {
"field" : {
"accountEmail.address": "me@example.org"
}
}
}

We are using our "email_regex" analyzer for the accountEmail.address field. It is defined here:

index.analysis.analyzer.email_regex.type=pattern
index.analysis.analyzer.email_regex.lowercase=true
index.analysis.analyzer.email_regex.pattern=[@.]
index.analysis.analyzer.email_regex.stopwords=none

So, now with 0.16.0, the above query now returns a match on this record:

[ {
"_id" : "ppkc_admin",
"dataType" : "profiles",
"source" : "PPKC",
"username" : "admin",
"password" : "ae2b1fca515949e5d54fb22b8ed95575",
"languagePref" : "en",
"accountType" : "paid",
"profileType" : "admin",
"country" : "US",
"accountEmail" : {
"status" : "verified",
"address" : "someone@example.org"
},
"messagingType" : "open"
} ]

FWIW, if we change our analyzer to the following definition, it works:

index.analysis.analyzer.email_regex.type=custom
index.analysis.analyzer.email_regex.tokenizer=uax_url_email
index.analysis.analyzer.email_regex.lowercase=true

Perhaps our initial analyzer definition is wrong, or a new bug was introduced. We are not sure.

Any help is appreciated.


(system) #3