Wildcard search case insensitive


(Mysurf Mail) #1

Going over the wildcard query documentation, a few points are still unclear to me.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html

First, I have the following mapping and entities:

{
    "ms_search_v1": {
        "mappings": {
            "entries": {
                "properties": {
                    "index": {
                        "type": "long"
                    },
                    "metaSite": {
                        "properties": {
                            "name1": {
                                "type": "keyword"
                            },
                            "name2": {
                                "type": "text"
                            }
                        }
                    }
                }
            }
        }
    }
}

Entities:

[
  {
    "metaSite": {
      "name1": "Aaa Bbb",
      "name2": "Aaa Bbb"
    },
    "index": 0
  },
  {
    "metaSite": {
      "name1": "Bbb Ccc",
      "name2": "Bbb Ccc"
    },
    "index": 1
  },
  {
    "metaSite": {
      "name1": "Aaa Ccc",
      "name2": "Aaa Ccc"
    },
    "index": 2
  }
]
  1. On which data type does wildcard search work: keyword or text?
    According to the docs:

    Keyword fields are only searchable by their exact value.

    But it does seem to work when I run a case-sensitive wildcard query against the keyword field:

    {
        "query": {
            "wildcard" : { "metaSite.name1" : "Aaa*" }
        }
    }

  2. When I query the text field I get nothing:

    {
        "query": {
            "wildcard" : { "metaSite.name2" : "Aaa*" }
        }
    }

    How come?

  3. How do I run a case-insensitive wildcard query,
    and which data type should I use to store the data?

I tried adding the following

...
"name3": {
    "type": "keyword",
    "normalizer": "lowercase_normalizer"
}
....

where the normalizer is defined with the lowercase filter, and it worked.
But is this the right way to do it?
Thanks.
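
For readers following along, here is a minimal sketch of what such a setup could look like, written as Python dicts that mirror the request bodies. Only the name lowercase_normalizer and the name3 mapping come from the post; the surrounding settings structure is an assumption about how the normalizer was defined, and normalize() is a hypothetical local stand-in, not an Elasticsearch call.

```python
import json

# Assumed index settings: a custom normalizer that only lowercases.
# "lowercase_normalizer" is the name used in the post; the rest is illustrative.
settings = {
    "settings": {
        "analysis": {
            "normalizer": {
                "lowercase_normalizer": {
                    "type": "custom",
                    "filter": ["lowercase"],
                }
            }
        }
    }
}

# Field mapping from the post, referring to the normalizer by name.
name3_mapping = {
    "name3": {"type": "keyword", "normalizer": "lowercase_normalizer"}
}


def normalize(value: str) -> str:
    """Hypothetical stand-in for a lowercase-only normalizer.

    Unlike an analyzer, it does not tokenize: the whole value stays one term.
    """
    return value.lower()


if __name__ == "__main__":
    print(json.dumps(settings, indent=2))
    print(normalize("Aaa Bbb"))  # aaa bbb (a single lowercased term)
```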


(David Pilato) #2

In short: when you index a field with the text type and the default analyzer, the value is lowercased and broken into tokens.

Aaa Bbb becomes the tokens aaa and bbb.

When you search with a wildcard query (which is a bad idea), you must use the same case as was indexed. So aa* should work where Aa* shouldn't.

For keywords, they are indexed as Aaa Bbb, so searching for Aa* will match. Note that Bb* won't match.

With a lowercase normalizer, aa* will match. Aa*, Bb* or bb* won't match.
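
The text and keyword cases above can be sketched locally without a cluster. This is only a rough simulation under stated assumptions: text_terms() approximates the default analyzer with a lowercase-and-split regex, and fnmatchcase stands in for the wildcard query (it also treats [ ] specially, which the wildcard query does not). All function names are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def text_terms(value):
    """Approximate a text field: lowercase the value and split it into word tokens."""
    return re.findall(r"\w+", value.lower())


def keyword_terms(value):
    """Approximate a keyword field: the whole value is stored as one unchanged term."""
    return [value]


def wildcard_matches(pattern, terms):
    """Case-sensitive wildcard match of the pattern against each indexed term."""
    return any(fnmatchcase(term, pattern) for term in terms)


if __name__ == "__main__":
    value = "Aaa Bbb"
    print(text_terms(value))                              # ['aaa', 'bbb']
    print(wildcard_matches("aa*", text_terms(value)))     # True
    print(wildcard_matches("Aa*", text_terms(value)))     # False
    print(wildcard_matches("Aa*", keyword_terms(value)))  # True
    print(wildcard_matches("Bb*", keyword_terms(value)))  # False
```

The pattern is compared against whole indexed terms, which is why Bb* cannot match the single keyword term Aaa Bbb.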


(Mysurf Mail) #3

Thanks for your reply.
I ran a test on name3 with the lowercase_normalizer.

It did return values for Aa*:

"query": {
    "wildcard" : { "metaSite.name3" : "Aa*" }
}

So my actual questions are:

  1. Is this the best practice (keyword + lowercase_normalizer)?
  2. My field is now a keyword. But according to the ES docs, I think it should rather be text (since it is a name and is free text).

(David Pilato) #4

Hmmm. That's interesting. I was not expecting that.
I can indeed reproduce it. I'll check internally on that and will update.

Anyway, yes using keyword for fulltext search is not a good idea IMO.
And again using wildcard query is generally a bad idea.


(Mysurf Mail) #5

Thanks.
This is an old system.

  1. What is your recommendation?
  2. Should I change it to the text data type (it is a name field),
    given that the user requirement is prefix search and I want to keep the search case-insensitive?
  3. If the main usage is prefix search, I could use a prefix query instead of a wildcard query. Would that improve the query, given that my wildcard patterns are always a prefix followed by *, as in Aa*?

In general, best practices or guidelines for case-insensitive wildcard search in ES are hard to find.


(David Pilato) #6

Using an edge n-gram based analyzer is preferable.
You pay the price at index time but not at search time.

See https://www.elastic.co/guide/en/elasticsearch/reference/6.2/analysis-edgengram-tokenizer.html
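
To illustrate the index-time versus search-time trade-off, here is a minimal local sketch of the idea, not the tokenizer itself. The min_gram and max_gram names follow the tokenizer's parameters; the values, the helper functions, and the in-memory index are assumptions made for illustration, using the three sample names from the first post.

```python
import re
from collections import defaultdict


def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the leading substrings of token, from min_gram to max_gram characters."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


def build_index(docs, min_gram=1, max_gram=10):
    """Index time: lowercase, split into words, and store every edge n-gram as a term."""
    index = defaultdict(set)
    for doc_id, name in docs.items():
        for token in re.findall(r"\w+", name.lower()):
            for gram in edge_ngrams(token, min_gram, max_gram):
                index[gram].add(doc_id)
    return index


def search(index, prefix):
    """Search time: a single exact term lookup, with no wildcard expansion."""
    return sorted(index.get(prefix.lower(), set()))


if __name__ == "__main__":
    docs = {0: "Aaa Bbb", 1: "Bbb Ccc", 2: "Aaa Ccc"}
    index = build_index(docs)
    print(edge_ngrams("aaa"))   # ['a', 'aa', 'aaa']
    print(search(index, "Aa"))  # [0, 2]
    print(search(index, "bb"))  # [0, 1]
```

In this sketch a query longer than max_gram finds nothing, because no term that long was ever stored, so max_gram bounds the longest searchable prefix.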


(Mysurf Mail) #7

In this example I would need to specify the max_gram parameter.
I really don't know what it should be, since it is a name field filled in by the user.

I thought a better option would be to use multi-fields.
https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html
What do you think?


(David Pilato) #8

Multifield is generally a good solution when you want to have multiple ways of searching the same thing or have multiple use cases.
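
As a rough sketch of what that could look like for the name field: the mapping below, written as a Python dict mirroring the request body, indexes one value both as analyzed text and as a normalized keyword sub-field. The field name name, the sub-field name raw, and the searchable_paths() helper are hypothetical; lowercase_normalizer is assumed to be defined in the index settings as earlier in the thread.

```python
import json

# Hypothetical multi-field mapping: one source value, two ways of searching it.
name_mapping = {
    "type": "text",  # analyzed: lowercased and tokenized for full-text search
    "fields": {
        "raw": {  # sub-field holding the whole value as one lowercased term
            "type": "keyword",
            "normalizer": "lowercase_normalizer",
        }
    },
}


def searchable_paths(field_path, field_mapping):
    """List the field paths a query can target: the main field plus each sub-field."""
    paths = [field_path]
    paths += [f"{field_path}.{sub}" for sub in field_mapping.get("fields", {})]
    return paths


if __name__ == "__main__":
    print(json.dumps({"metaSite": {"properties": {"name": name_mapping}}}, indent=2))
    print(searchable_paths("metaSite.name", name_mapping))
    # ['metaSite.name', 'metaSite.name.raw']
```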

