Keyword subfield mapping causes unexpected querying results

Hryhorii · May 30, 2023, 9:06am

We have an index mapping schema with a lot of text fields. To be able to sort and filter on them, we added a keyword subfield with the lowercase normalizer to each. Here is a short part of our schema:

{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "createdTime": {
        "type": "date",
        "format": "strict_date_optional_time||epoch_millis||basic_date"
      },
      "field1": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256,
            "normalizer": "lowercase"
          }
        }
      },
      "fileSize": {
        "type": "integer"
      }
    }
  }
}

We index four documents with these values in "field1":

1. selection
2. electron.jpg
3. election
4. ele cti on

Then we do a full-text search with this query:

POST <indexname>/_search
{
  "query": {
     "bool": {
       "must": [
         {
           "query_string": {
             "query": "ele*on"
           }
         }
       ]
    }
  }
}

But it returns unexpected results (we expect documents 2 and 3):

With a text field and a keyword subfield, it returns documents 3 and 4.
With only a text field, it returns document 3.
With only a text field that uses the built-in simple analyzer, it returns the expected results (documents 2 and 3).
With a text field that uses the built-in simple analyzer plus a keyword subfield, it returns documents 2, 3 and 4.

What are we missing here? What options do we have?

Please note that we need to support sorting and filtering (which the keyword subfield provides) as well as full-text wildcard queries with an asterisk in the middle.

dadoonet · May 30, 2023, 10:03am

Welcome!

Please note that using wildcards can be bad practice (see Query DSL | Elasticsearch Guide [8.11] | Elastic).

And normally, users don't enter wildcards in a search engine. I never do this in the Google search bar, for example.

Instead, you should look at the wildcard field type if you really want to use wildcards.

But it returns incorrect results (expected are 2 and 3):

ele*on matches ele cti on IMO... But I understand what you mean. You want to compare full terms, right? So you want to compare ele*on with selection, electron.jpg, election, ele, cti and on, right?

So you need to find an analyzer which does exactly this. I'd use a custom analyzer and use the _analyze API to better understand how to build the right one for your use case. See Test an analyzer | Elasticsearch Guide [8.8] | Elastic.
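
To make the term comparison concrete, here is a minimal Python sketch that approximates which terms each mapping variant would index for the four sample values, and which documents ele*on would then match. It is a simulation under stated assumptions, not Elasticsearch itself: the standard analyzer is approximated by lowercasing and splitting on whitespace (adequate for these four values only, where electron.jpg stays a single token), the simple analyzer by splitting on non-letters, and the keyword subfield with the lowercase normalizer by the whole lowercased value. It also assumes that a query_string query with no default_field or fields is run against every eligible field, including field1.keyword. Nobody in this thread confirms that, but it would account for all four reported outcomes.

```python
import re
from fnmatch import fnmatchcase

# The four sample values from the original post, keyed by document number.
DOCS = {1: "selection", 2: "electron.jpg", 3: "election", 4: "ele cti on"}


def standard_terms(value):
    # Rough stand-in for the standard analyzer (valid for these samples only).
    return value.lower().split()


def simple_terms(value):
    # Rough stand-in for the simple analyzer: lowercase, split on non-letters.
    return [term for term in re.split(r"[^a-z]+", value.lower()) if term]


def keyword_terms(value):
    # Keyword subfield with lowercase normalizer: one term, the whole value.
    return [value.lower()]


def matching_docs(pattern, *term_functions):
    # A document matches if any term from any simulated field fits the pattern.
    hits = []
    for doc_id, value in DOCS.items():
        terms = [term for fn in term_functions for term in fn(value)]
        if any(fnmatchcase(term, pattern) for term in terms):
            hits.append(doc_id)
    return hits


print(matching_docs("ele*on", standard_terms, keyword_terms))  # [3, 4]
print(matching_docs("ele*on", standard_terms))                 # [3]
print(matching_docs("ele*on", simple_terms))                   # [2, 3]
print(matching_docs("ele*on", simple_terms, keyword_terms))    # [2, 3, 4]
```

In this reading, document 4 appears whenever the keyword subfield exists because the untokenized value "ele cti on" itself starts with "ele" and ends with "on".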

I'd recommend looking at ngrams instead of using wildcards.
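
As a rough illustration of what an n-gram approach indexes, the sketch below generates character n-grams for a value in plain Python. The function name char_ngrams and the gram sizes (2 to 3) are assumptions chosen for the example, not settings taken from this thread; in a real index the n-grams would come from an ngram tokenizer or token filter configured in the analysis settings.

```python
def char_ngrams(text, min_gram=2, max_gram=3):
    """Return every substring of text with a length from min_gram to max_gram."""
    text = text.lower()
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams


grams = char_ngrams("election")
print("ele" in grams, "on" in grams)  # True True
```

Note that "selection" also contains both "ele" and "on", so n-gram matching is looser than an anchored ele*on pattern and relies more on scoring to rank the closest values first.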

Hryhorii · May 30, 2023, 10:29am

Thanks for the quick response!

We'd read about that, but for now we decided to start this way since we are migrating from Azure Search, where we use a similar approach (Azure Search is also built on top of the Lucene engine). For other scenarios (including trailing and leading wildcard queries), everything works fine.

Regarding the wildcard field type: as far as I understand, we can't do a full-text search with it, and we have to name a specific field in the wildcard query?

GET /_search
{
  "query": {
    "wildcard": {
      "user.id": {
        "value": "ki*y",
        "boost": 1.0,
        "rewrite": "constant_score"
      }
    }
  }
}

You got it right! As I mentioned, we tried a simple analyzer for a text field only. This works as expected (search response contains electron.jpg and election):

{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "createdTime": {
        "type": "date",
        "format": "strict_date_optional_time||epoch_millis||basic_date"
      },
      "field1": {
        "type": "text",
        "analyzer": "simple"
      },
      "fileSize": {
        "type": "integer"
      }
    }
  }
}

But as soon as we add a keyword subfield, the search response starts returning electron.jpg, election, and also ele cti on. We found it weird, since we thought the keyword subfield would be handled separately from the main text field.
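
If the extra match does come from the query also running against field1.keyword (an assumption, since the thread does not confirm it), one option is to list the searched fields explicitly with the fields parameter of query_string. The helper below is a hypothetical sketch that only builds the request body as a Python dict; the function name and the field list are illustrative, and sending the body to a cluster is left out.

```python
import json


def build_query_string_body(pattern, fields):
    """Build a query_string search body limited to the given fields."""
    return {
        "query": {
            "query_string": {
                "query": pattern,
                # Only the listed fields are searched, so field1.keyword is
                # left out unless it is named here.
                "fields": list(fields),
            }
        }
    }


body = build_query_string_body("ele*on", ["field1"])
print(json.dumps(body, indent=2))
```

Sorting and filtering could still target field1.keyword directly, because the subfield stays in the mapping.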

Thanks for suggesting ngrams! Will we be able to support our scenarios with them (both full-text and search within a specific field)?

dadoonet · May 30, 2023, 1:03pm

I think (from what I recall) that Azure Search was actually built on top of Elasticsearch. But that's another story.

we can't do a full-text search with this field, we have to add a specific field in wildcard query?

Indeed. So normally I recommend doing multiple searches at the same time. Combining scores between partial match and exact match is normally super helpful for the end users. See the following script as an idea:

https://gist.github.com/dadoonet/5179ee72ecbf08f12f53d4bda1b76bab

search_kibana_console.txt

### REINIT
DELETE user
PUT user
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "comments": {

(The gist preview is truncated here.)

Will we be able to support our scenarios with them (both full-text and search within a specific field)?

Yes I believe so with the above strategy ^^^

Hryhorii · May 30, 2023, 5:03pm

As far as I understand, your recommendation is to do multiple wildcard searches for each field in our index document if we want to achieve a full-text search with a wildcard query. Like:

GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "wildcard": { "field1": { "value": "ele*on" } } },
        { "wildcard": { "field2": { "value": "ele*on" } } },
        { "wildcard": { "field3": { "value": "ele*on" } } }
      ]
    }
  }
}

Did I get it right?

dadoonet · May 30, 2023, 5:31pm

Yeah. But I was thinking more of something like:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":    "ele on", 
      "fields": [ "field1.keyword^3.0", "field1^2.0", "field1.ngram", "field1.phonetic" ] 
    }
  }
}

But as you (really) want to use wildcards, I guess you have it right...

Hryhorii · May 30, 2023, 6:38pm

Thank you for your quick and detailed responses! I think it will help us.

system · June 27, 2023, 6:39pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
