This is a bit of a long answer, because you're hitting on some fundamental Elasticsearch concepts.
Elasticsearch has two types of string field: text and keyword. The main difference between the two is that text fields get analyzed, and as a result you can use those fields for full-text query features like case-insensitive search. keyword fields, on the other hand, are not analyzed. As a result, keyword fields are typically used for exact, case-sensitive searches.
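If you want to see the effect of analysis for yourself, the _analyze API shows which tokens an analyzer produces. The two requests below are only a sketch using the built-in standard and keyword analyzers (not your own mapping), and the sample text is made up:

```
POST _analyze
{
  "analyzer": "standard",
  "text": "12 TONS of smiles"
}

POST _analyze
{
  "analyzer": "keyword",
  "text": "12 TONS of smiles"
}
```

The first request returns the lower-cased tokens 12, tons, of and smiles. The second returns the whole string as one unchanged token, which is roughly how a keyword field without a normalizer stores its value.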
As is often the case with Elasticsearch, there are multiple ways to meet your requirement, but your options depend on whether the productDescription field is a text or keyword field in your index's mapping.
If productDescription is a text field (the default), you could use a custom analyzer to create a single lower-cased token, and use a prefix query on that:
PUT myindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "productDescription": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}

POST myindex/_doc
{
  "productDescription": "12 tons of happyness"
}

POST myindex/_doc
{
  "productDescription": "12 TONS of smiles"
}

POST myindex/_doc
{
  "productDescription": "12 Tons of kiss"
}

GET myindex/_search
{
  "query": {
    "prefix": {
      "productDescription": {
        "value": "12 tons"
      }
    }
  }
}
However, be aware that the prefix query is a term-level query, and term-level queries do not analyze search terms. So this solution only works if you know for certain that the search term will always be in lower case. For example, the following request fails to find any of the documents:
GET myindex/_search
{
  "query": {
    "prefix": {
      "productDescription": {
        "value": "12 Tons"
      }
    }
  }
}
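To see why the upper-case variant misses, you can ask the index what my_analyzer actually stores. This is just an illustrative check against the myindex example from this answer:

```
GET myindex/_analyze
{
  "analyzer": "my_analyzer",
  "text": "12 TONS of smiles"
}
```

The response contains a single token, 12 tons of smiles. The prefix query compares its value to that token as-is, so 12 Tons with a capital T is not a prefix of it.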
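For that reason, I'd say that mapping productDescription as a keyword field and applying a normalizer would be the better option. Note that this means recreating the index, because the type of an existing field cannot be changed. The normalizer is applied both at index time and to the search term of term-level queries, so now your query will be truly case-insensitive: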
PUT myindex
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "productDescription": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}

POST myindex/_doc
{
  "productDescription": "12 tons of happyness"
}

POST myindex/_doc
{
  "productDescription": "12 TONS of smiles"
}

POST myindex/_doc
{
  "productDescription": "12 Tons of kiss"
}

GET myindex/_search
{
  "query": {
    "prefix": {
      "productDescription": {
        "value": "12 tons"
      }
    }
  }
}

GET myindex/_search
{
  "query": {
    "prefix": {
      "productDescription": {
        "value": "12 Tons"
      }
    }
  }
}
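If you want to check what the normalizer does to a value, the _analyze API also accepts a normalizer. Again, this is only a sketch against the myindex example above:

```
GET myindex/_analyze
{
  "normalizer": "my_normalizer",
  "text": "12 Tons"
}
```

This returns the single token 12 tons. The same lower-casing is applied to the stored values and to the search term, which is why both searches above find all three documents.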
By the way, all of this will become much easier in the next version of Elasticsearch, 7.10. The prefix query will get a case_insensitive parameter. You will then be able to query keyword fields case-insensitively with a prefix query, without the need for a normalizer.
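Assuming you are on 7.10 or later and productDescription is a plain keyword field without a normalizer, such a query could look roughly like this (index and field names are the ones from the examples in this answer):

```
GET myindex/_search
{
  "query": {
    "prefix": {
      "productDescription": {
        "value": "12 Tons",
        "case_insensitive": true
      }
    }
  }
}
```

With case_insensitive set to true, the prefix should match the stored values regardless of their casing, so the original casing stays intact in the index.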