query_string search does not work correctly when the field value contains dots

Hi,
My index contains a text property named description.
The value of description is "2:1.9.4".

When I search using the query below

GET finding-index/_search
{
  "query": {
    "bool" : {
      "must" : [
        {
          "query_string" : {
            "query" : "*2\\:1.9.4*",
            "fields" : [
              "description"
            ],
            "type" : "best_fields",
            "default_operator" : "and",
            "max_determinized_states" : 10000,
            "enable_position_increments" : true,
            "fuzziness" : "AUTO",
            "fuzzy_prefix_length" : 0,
            "fuzzy_max_expansions" : 50,
            "phrase_slop" : 0,
            "analyze_wildcard" : true,
            "escape" : true,
            "auto_generate_synonyms_phrase_query" : true,
            "fuzzy_transpositions" : true,
            "boost" : 1.0
          }
        }
      ],
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  }
}

the result is returned as expected. But if I use the query below

GET finding-index/_search
{
  "query": {
    "bool" : {
      "must" : [
        {
          "query_string" : {
            "query" : "*2\\:1.9*",
            "fields" : [
              "description"
            ],
            "type" : "best_fields",
            "default_operator" : "and",
            "max_determinized_states" : 10000,
            "enable_position_increments" : true,
            "fuzziness" : "AUTO",
            "fuzzy_prefix_length" : 0,
            "fuzzy_max_expansions" : 50,
            "phrase_slop" : 0,
            "analyze_wildcard" : true,
            "escape" : true,
            "auto_generate_synonyms_phrase_query" : true,
            "fuzzy_transpositions" : true,
            "boost" : 1.0
          }
        }
      ],
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  }
}

no result is returned.
The only difference between the two queries is the query value: the first uses "*2\\:1.9.4*" and the second uses "*2\\:1.9*".

Can someone please explain how to perform a partial search on a value that contains dots?

This is most likely an analysis issue with the "description" field. Can you post the mapping for that field? How the query is parsed depends on the analyzer used, and the default analyzer may not be suitable for matching colons (":") and dots, because they might be removed from the field value during indexing.

Here is what the index mapping looks like.

{
  "finding-index": {
    "mappings": {
      "dynamic": "strict",
      "properties": {
        "_class": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "description": {
          "type": "text"
        }
      }
    }
  }
}

I have removed all non-relevant properties.

I see, that's part of the problem. When you index a value like "2:1.9.4" into a "text" field without a specific analyzer, Elasticsearch uses the "standard" analyzer, which breaks the value into tokens following the Unicode word-boundary rules (roughly: on whitespace and most punctuation). You can check what it does with the "_analyze" API endpoint. Your value gets stored as two tokens, "2" and "1.9.4".
The query text is then analyzed with the same analyzer, which drops the "*" characters. Searching for "2\:1.9" therefore translates to searching for the two tokens "2" and "1.9", and "1.9" does not match the stored token "1.9.4". Check this example:

DELETE test

PUT test
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "description": {
        "type": "text"
      }
    }
  }
}

POST /test/_doc/1
{
  "description" : "2:1.9.4"
}

POST /test/_analyze
{
  "field": "description",
  "text" : "2:1.9.4"
}

POST /test/_analyze
{
  "field": "description",
  "text" : "*2\\:1.9.4*"
}

POST /test/_analyze
{
  "field": "description",
  "text" : "*2\\:1.9*"
}
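To see why the tokens differ without a running cluster, here is a rough Python approximation of what the three _analyze calls above should return. It is a simplified, hypothetical stand-in for the standard analyzer (the real one follows the Unicode word-boundary rules of UAX #29): letters and digits form tokens, a "." stays inside a token only when it sits between letters or digits, and every other character, including ":", "\" and "*", is a separator. It is only meant for inputs like these; _analyze remains the authoritative check.

```python
import re

# Hypothetical approximation of the "standard" analyzer, good enough for
# values such as "2:1.9.4". A token is a run of letters/digits, optionally
# joined by single dots that sit between letters/digits ("1.9.4").
# Every other character (":", "\", "*", whitespace, ...) separates tokens.
TOKEN_RE = re.compile(r"[^\W_]+(?:\.[^\W_]+)*")


def approx_standard_analyze(text: str) -> list[str]:
    """Return the lowercased tokens this approximation produces for text."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


if __name__ == "__main__":
    # Same inputs as the three _analyze requests above
    # (Python "\\" is a single backslash, just like in the JSON bodies).
    for text in ["2:1.9.4", "*2\\:1.9.4*", "*2\\:1.9*"]:
        print(text, "->", approx_standard_analyze(text))
```

The first two inputs both come out as ['2', '1.9.4'], while the last one becomes ['2', '1.9']; "1.9" is a different term from the stored "1.9.4", which is why that query finds nothing.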

Ideally, if you don't need full-text search and scoring on this field, it would be mapped as a "keyword" or "wildcard" field instead of a "text" field. Here's a blog post that describes part of the problem.
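A minimal sketch of that idea, assuming Elasticsearch 7.9 or later (where the "wildcard" field type exists). It keeps "description" as "text" and adds a sub-field, here given the hypothetical name "raw", that stores the whole value unanalyzed; a "keyword" sub-field would behave the same way for this purpose. The request bodies are written as Python dicts and printed as JSON so they can be pasted into the console, and the index name "test-v2" is also hypothetical. The last function only simulates, with fnmatch, how a pattern is compared against the whole stored value.

```python
import json
from fnmatch import fnmatchcase

# Body for a hypothetical index "test-v2": "description" stays a text field,
# and the hypothetical sub-field "description.raw" keeps the whole value
# as one unit (type "wildcard", Elasticsearch 7.9+; "keyword" also works).
INDEX_BODY = {
    "mappings": {
        "dynamic": "strict",
        "properties": {
            "description": {
                "type": "text",
                "fields": {"raw": {"type": "wildcard"}},
            }
        },
    }
}

# Partial match on the sub-field with a wildcard query: there is no
# query_string parsing here, so ":" and "." need no escaping.
SEARCH_BODY = {"query": {"wildcard": {"description.raw": {"value": "*2:1.9*"}}}}


def whole_value_matches(value: str, pattern: str) -> bool:
    """Simulate matching a pattern against one unanalyzed value.

    '*' and '?' behave as in Elasticsearch; note that fnmatch also treats
    '[...]' as a character class, which Elasticsearch does not.
    """
    return fnmatchcase(value, pattern)


if __name__ == "__main__":
    print("PUT test-v2")
    print(json.dumps(INDEX_BODY, indent=2))
    print("GET test-v2/_search")
    print(json.dumps(SEARCH_BODY, indent=2))
    print(whole_value_matches("2:1.9.4", "*2:1.9*"))
```

Because the full string "2:1.9.4" is kept intact, "*2:1.9*" matches it, while full-text queries can still target "description" itself.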

Going by the logic you described, the query below should not return any result, but it does:

GET test/_search
{
  "query": {
    "bool" : {
      "must" : [
        {
          "query_string" : {
            "query" : "*1.9*",
            "fields" : [
              "description^1.0"
            ],
            "type" : "best_fields",
            "default_operator" : "and",
            "max_determinized_states" : 10000,
            "enable_position_increments" : true,
            "fuzziness" : "AUTO",
            "fuzzy_prefix_length" : 0,
            "fuzzy_max_expansions" : 50,
            "phrase_slop" : 0,
            "analyze_wildcard" : true,
            "escape" : false,
            "auto_generate_synonyms_phrase_query" : true,
            "fuzzy_transpositions" : true,
            "boost" : 1.0
          }
        }
      ],
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  }
}

The asterisks in the query are wildcards ("analyze_wildcard" is set to true). I use * because I am not supplying the complete value. If you remove the * from the query above, no result is returned.

Also, we need full text search support on this field.

If you need full-text search as well, you will have to preserve the dots and colons during analysis somehow, so that you can search for values like "2:1.2.3", and then use some sort of phrase query.
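One possible reading of "preserve the dots and colons", sketched under assumptions: a custom analyzer built from the built-in "whitespace" tokenizer and "lowercase" filter, so "2:1.9.4" survives as a single token while the other words in the field stay individually searchable. The analyzer name "keep_punct" is hypothetical, and the Python functions only simulate what that analyzer and a wildcard term would do; they do not call Elasticsearch.

```python
import json
from fnmatch import fnmatchcase

# Hypothetical index body: a custom analyzer that splits on whitespace only,
# so ":" and "." stay inside tokens.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "analyzer": {
                "keep_punct": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "dynamic": "strict",
        "properties": {
            "description": {"type": "text", "analyzer": "keep_punct"}
        },
    },
}


def simulate_keep_punct(text: str) -> list[str]:
    """Approximate the analyzer above: split on whitespace, then lowercase."""
    return [token.lower() for token in text.split()]


def any_token_matches(text: str, pattern: str) -> bool:
    """Rough model of a single wildcard term tested against each token."""
    return any(fnmatchcase(token, pattern) for token in simulate_keep_punct(text))


if __name__ == "__main__":
    print(json.dumps(INDEX_BODY, indent=2))
    doc = "Fixed in version 2:1.9.4 of the package"
    print(simulate_keep_punct(doc))
    print(any_token_matches(doc, "*2:1.9*"))
```

The trade-off is that punctuation next to a word stays attached ("2:1.9.4," would keep its comma). With query_string, the ":" in the query text still has to be escaped, because the parser reads it as a field separator.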

Can you explain why this query

GET test/_search
{
  "query": {
    "bool" : {
      "must" : [
        {
          "query_string" : {
            "query" : "*1.9*",
            "fields" : [
              "description^1.0"
            ],
            "type" : "best_fields",
            "default_operator" : "and",
            "max_determinized_states" : 10000,
            "enable_position_increments" : true,
            "fuzziness" : "AUTO",
            "fuzzy_prefix_length" : 0,
            "fuzzy_max_expansions" : 50,
            "phrase_slop" : 0,
            "analyze_wildcard" : true,
            "escape" : false,
            "auto_generate_synonyms_phrase_query" : true,
            "fuzzy_transpositions" : true,
            "boost" : 1.0
          }
        }
      ],
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  }
}

returns a result, but this one does not:

GET test/_search
{
  "query": {
    "bool" : {
      "must" : [
        {
          "query_string" : {
            "query" : "*2\\:1.9*",
            "fields" : [
              "description"
            ],
            "type" : "best_fields",
            "default_operator" : "and",
            "max_determinized_states" : 10000,
            "enable_position_increments" : true,
            "fuzziness" : "AUTO",
            "fuzzy_prefix_length" : 0,
            "fuzzy_max_expansions" : 50,
            "phrase_slop" : 0,
            "analyze_wildcard" : true,
            "escape" : true,
            "auto_generate_synonyms_phrase_query" : true,
            "fuzzy_transpositions" : true,
            "boost" : 1.0
          }
        }
      ],
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  }
}

You can run any query through the "_validate/query" endpoint with "explain=true", e.g.

GET test/_validate/query?explain=true
{ your query}

and look at the Lucene query it translates to.
In the first case we get:

"explanation": "+description:*1.9*"

In the second:

"explanation": "+(+description:2 +description:1.9)"

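Since your original value is analyzed and stored as "2" and "1.9.4", the first query returns the document: the wildcard "*1.9*" matches the term "1.9.4". Notice that the "escape" parameter differs between the two queries. With "escape": true the query string is escaped before parsing, so the "*" characters become literal text that the analyzer then drops; setting it to "false" in the second case preserves the wildcards. Escaping is a complex topic with the query_string query, which is one more reason not to overuse that query type for wildcard matching.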
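The two explanations can be checked by hand against the stored terms. This small Python model is an illustration only, not how Lucene executes queries: it treats "+description:*1.9*" as one wildcard pattern tested against each indexed term, and "+(+description:2 +description:1.9)" as two exact terms that must both be present. The original, fully spelled-out query "*2\\:1.9.4*" presumably reduces the same way to the exact terms "2" and "1.9.4", which is why it matched.

```python
from fnmatch import fnmatchcase

# Terms stored for the document, as reported by _analyze in this thread.
INDEXED_TERMS = {"2", "1.9.4"}


def matches_wildcard(pattern: str, terms: set[str]) -> bool:
    """'+description:*1.9*': one wildcard pattern tested against each term."""
    return any(fnmatchcase(term, pattern) for term in terms)


def matches_all_terms(required: list[str], terms: set[str]) -> bool:
    """'+(+description:2 +description:1.9)': every term must exist exactly."""
    return all(term in terms for term in required)


if __name__ == "__main__":
    print(matches_wildcard("*1.9*", INDEXED_TERMS))         # True
    print(matches_all_terms(["2", "1.9"], INDEXED_TERMS))    # False
    print(matches_all_terms(["2", "1.9.4"], INDEXED_TERMS))  # True
```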

Got it. Now it makes sense.

  1. Is there any way I can solve this problem with the query alone?
  2. Is mapping the field as keyword the only solution?
  3. Is there any problem with mapping a field whose values can be several KB in size as a keyword?
