When my documents are indexed, most of them actually come in with the email value being uppercased. I didn't think this should really be an issue because when I search I can lowercase the term(s).
The problem is that if I do a search, it will only match if the term is cased as it was in the document - i.e. if the document had an uppercased email, it will only match if I provide the uppercased email in the search:
The above would return a value because it's uppercased, but if I issue the same search as 'matches@gmail.com', nothing is returned.
Am I doing something wrong? The lowercase filter from the analyzer means the indexed terms should all be lowercased but tokenized as emails (per the uax tokenizer), but the searches seem to require the same case as the original document.
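To make the expectation above concrete, here is a toy model in Python. It is not Elasticsearch: the regex is only a rough stand-in for the uax_url_email tokenizer, and `analyze`, `build_index` and `search` are hypothetical helper names. It shows how an exact term lookup behaves when the lowercase step is applied at index time and when it is not.

```python
import re

# Rough stand-in for the uax_url_email tokenizer: keeps an email address as a
# single token. This is an assumption for illustration, not the real grammar.
TOKEN_RE = re.compile(r"[^\s@]+@[^\s@]+|\w+")


def analyze(text, lowercase):
    """Split text into tokens and optionally lowercase them."""
    tokens = TOKEN_RE.findall(text)
    return [t.lower() for t in tokens] if lowercase else tokens


def build_index(docs, lowercase):
    """Build a tiny inverted index: term -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text, lowercase):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, term):
    """Exact term lookup; the query term itself is NOT analyzed here."""
    return sorted(index.get(term, set()))


docs = {1: "MATCHES@GMAIL.COM"}
with_filter = build_index(docs, lowercase=True)      # lowercase step applied
without_filter = build_index(docs, lowercase=False)  # lowercase step missing

print(search(with_filter, "matches@gmail.com"))     # [1]
print(search(without_filter, "matches@gmail.com"))  # []
print(search(without_filter, "MATCHES@GMAIL.COM"))  # [1]
```

The second and third lookups reproduce the behaviour described above: if the lowercase step never runs at index time, only the original casing matches. That suggests checking whether the filter is actually being applied to the field.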
I have some good experience with Lucene but this is my first foray into ElasticSearch. I'm using 1.6.0.
This is a brand new index. I have deleted/recreated the index a few times, verifying the mappings & settings, and repopulating with my test records.
The sequence I'm running is right here:
DELETE my_index

PUT my_index

POST my_index/_close

PUT my_index/_settings
{
  "analysis" : {
    "analyzer" : {
      "email_analyzer" : {
        "filters" : [
          "lowercase"
        ],
        "type": "custom",
        "tokenizer" : "uax_url_email"
      }
    }
  }
}

POST my_index/_open

PUT my_index/main/_mapping
{
  "properties": {
    "email": {
      "type": "string",
      "analyzer": "email_analyzer"
    }
  }
}
It's pretty confusing
Edit: since posting here I removed the standard and stop filters. That didn't help, but it's why the settings are a bit different from my earlier post.
I have a Java application populating the index with some documents that include email addresses. The emails that come in are uppercase.
The analyzer is supposed to standardize the indexed terms, correct? My expectation was that, with the lowercase filter in my email_analyzer, the searchable terms would be lowercase.
So no matter what, I'd expect to search using lowercase values, possibly with an analyzer specified in the querybuilder object.
But a search only returns results if I search with uppercase. That is to say, the search on the email address seems to be case sensitive.
It did work the way I expected after that change. Elastic has been really good about letting me know about malformed requests, so I didn't think to look for a typo.
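The change itself is not quoted in this thread, so what follows is an assumption rather than a record of what was fixed. In the settings as posted, the analyzer definition uses the key "filters", whereas the custom analyzer setting is named "filter". An unrecognized key like that is apparently accepted without an error, which would match the symptom of the lowercase filter never being applied. A minimal Python sketch of a check that would flag it is below. The set of known keys is my own assumption based on the custom analyzer options, and `unknown_analyzer_keys` is a hypothetical helper, not part of any Elasticsearch client.

```python
# Keys assumed to be valid for a "custom" analyzer definition. This list is an
# assumption for illustration; check it against the docs for your version.
KNOWN_CUSTOM_ANALYZER_KEYS = {
    "type",
    "tokenizer",
    "filter",
    "char_filter",
    "position_offset_gap",
    "position_increment_gap",
}


def unknown_analyzer_keys(settings):
    """Return {analyzer_name: [unrecognized keys]} for custom analyzers."""
    problems = {}
    analyzers = settings.get("analysis", {}).get("analyzer", {})
    for name, definition in analyzers.items():
        if definition.get("type", "custom") != "custom":
            continue  # only custom analyzers are checked in this sketch
        extra = sorted(set(definition) - KNOWN_CUSTOM_ANALYZER_KEYS)
        if extra:
            problems[name] = extra
    return problems


# The settings body exactly as posted in the question.
posted_settings = {
    "analysis": {
        "analyzer": {
            "email_analyzer": {
                "filters": ["lowercase"],
                "type": "custom",
                "tokenizer": "uax_url_email",
            }
        }
    }
}

print(unknown_analyzer_keys(posted_settings))
# {'email_analyzer': ['filters']}
```

Running the same check against a body that uses "filter" instead returns an empty dict. Asking the index to analyze a sample uppercase email with email_analyzer and looking at the tokens it returns is another way to confirm whether the lowercase filter is active.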