"query_string" Wildcard search with special characters issue

Anas1 · November 2, 2020, 6:20pm

When searching with wildcards, I get unexpected behavior.
I'm working on ES 5.6.8.

To reproduce the issue:

(Test with Kibana)

- Create the index:

PUT my-index-00001
{
  "mappings": {
      "test": {
        "properties": {
          "name1": {
            "type": "keyword",
            "fields": {
              "analyzed": {
                "type": "text",
                "analyzer": "french_analyzer"
              }
            }
          }
        }
      }
    },
  "settings": {
    "analysis": {
      "analyzer": {
        "path_analyzer": {
          "tokenizer": "path_tokenizer"
        },
        "french_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        }
      },
      "tokenizer": {
        "path_tokenizer": {
          "type": "path_hierarchy",
          "delimiter": "/"
        }
      },
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase"]
        }
      }
    }
  }
}

- Insert test data:

POST my-index-00001/test
{
  "name1" : "WT1"
}

POST my-index-00001/test
{
  "name1" : "testWT1"
}

POST my-index-00001/test
{
  "name1" : "WT1test"
}

- Make the search:

GET my-index-00001/test/_search
{
  "query": {
    "query_string": {
      "query": "WT\\:*",
      "fields": ["name1.analyzed"],
      "default_operator": "AND",
      "analyze_wildcard": true
    }
  }
}

As expected, this search returns both results ["name1": "WT:1test", "name1": "WT:1"]

But the issue is with a prefix wildcard, as follows:

    GET my-index-00001/test/_search
    {
      "query": {
        "query_string": {
          "query": "*WT\\:", 
          "fields": ["name1.analyzed"],
          "default_operator": "AND",
          "analyze_wildcard": true
        }
      }
    }

This search does not return any results.
The same happens with the query "WT\:".

Expected result: documents with ["name1": "WT:1test", "name1": "WT:1", "name1": "testWT:1"]

dadoonet · November 3, 2020, 3:51pm

I updated your demo script to 7.9.3, since 5.x is not supported anymore.
I also edited the example to use the right values and simplified the mapping a bit by removing unneeded fields.

DELETE my-index-00001
PUT my-index-00001
{
  "mappings": {
    "properties": {
      "name1": {
        "type": "text",
        "analyzer": "french_analyzer"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "french_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  }
}

POST my-index-00001/_doc
{
  "name1" : "WT:1"
}

POST my-index-00001/_doc
{
  "name1" : "testWT:1"
}

POST my-index-00001/_doc
{
  "name1" : "WT:1test"
}

GET my-index-00001/_search
{
  "query": {
    "query_string": {
      "query": "WT\\:*",
      "fields": ["name1"],
      "default_operator": "AND",
      "analyze_wildcard": true
    }
  }
}

GET my-index-00001/_search
{
  "query": {
    "query_string": {
      "query": "*WT\\:",
      "fields": ["name1"],
      "default_operator": "AND",
      "analyze_wildcard": true
    }
  }
}

Now, back to your question. Here is how your documents are indexed behind the scenes:

POST my-index-00001/_analyze
{
  "field": "name1",
  "text": ["WT:1", "testWT:1", "WT:1test"]
}

It gives:

{
  "tokens" : [
    {
      "token" : "wt",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "1",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<NUM>",
      "position" : 1
    },
    {
      "token" : "testwt",
      "start_offset" : 5,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 102
    },
    {
      "token" : "1",
      "start_offset" : 12,
      "end_offset" : 13,
      "type" : "<NUM>",
      "position" : 103
    },
    {
      "token" : "wt",
      "start_offset" : 14,
      "end_offset" : 16,
      "type" : "<ALPHANUM>",
      "position" : 204
    },
    {
      "token" : "1test",
      "start_offset" : 17,
      "end_offset" : 22,
      "type" : "<ALPHANUM>",
      "position" : 205
    }
  ]
}

That's probably not what you want. Instead, you should use a keyword data type when searching with wildcards, which I don't recommend anyway, as per the documentation:

Avoid beginning patterns with * or ?. This can increase the iterations needed to find matching terms and slow search performance.
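To give some intuition for that warning, here is a minimal Python sketch. It assumes a deliberately simplified model in which the indexed terms are kept as a sorted list; Lucene's real term dictionary is more sophisticated, so this is not how Elasticsearch is implemented. The term list and the helper names `prefix_scan` and `suffix_scan` are hypothetical. The point is only that a trailing wildcard can jump to the matching range and stop early, while a leading wildcard gives no starting point and has to look at every term.

```python
from bisect import bisect_left

# Hypothetical, simplified term dictionary: a sorted list of indexed terms.
TERMS = sorted(["1", "1test", "alpha", "testwt", "wt", "wtx", "zeta"])


def prefix_scan(terms, prefix):
    """Model of `prefix*`: return (matching terms, number of terms examined)."""
    i = bisect_left(terms, prefix)  # jump straight to the first candidate
    matches, examined = [], 0
    while i < len(terms):
        examined += 1
        if not terms[i].startswith(prefix):
            break  # sorted order: no later term can start with the prefix
        matches.append(terms[i])
        i += 1
    return matches, examined


def suffix_scan(terms, suffix):
    """Model of `*suffix`: sorted order does not help, so scan every term."""
    matches = [t for t in terms if t.endswith(suffix)]
    return matches, len(terms)


print(prefix_scan(TERMS, "wt"))
print(suffix_scan(TERMS, "wt"))
```

With only seven terms the difference is trivial, but in this model the cost of the leading-wildcard scan grows with the total number of distinct terms in the field.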

But here the problem is something else. You should use a query like:

GET my-index-00001/_search
{
  "query": {
    "match": {
      "name1": "WT\\:*"
    }
  }
}

GET my-index-00001/_search
{
  "query": {
    "match": {
      "name1": "*WT\\:"
    }
  }
}

Hope this helps.

Anas1 · November 4, 2020, 9:38am

Thank you for your response, but I think the issue I'm encountering is about a prefix wildcard, as in:

GET my-index-00001/_search
{
  "query": {
    "match": {
      "name1": "*WT*"
    }
  }
}

I expected this search term to return these results: ["WT:1", "testWT:1", "WT:1test"],
but only two of them are actually returned: ["WT:1", "WT:1test"], and not "testWT:1".
Shouldn't the prefix wildcard return all three results?

Thank you in advance

dadoonet · November 4, 2020, 10:15am

Try this one:

GET my-index-00001/_search
{
  "query": {
    "wildcard": {
      "name1": "*WT*"
    }
  }
}

But again, keep in mind that this is not efficient if you are looking for fast response times.
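As for why the match query with "*WT*" returned only two of the three documents: a match query analyzes its input instead of interpreting wildcards, so the asterisks are dropped and the query becomes a lookup for the single term "wt". The Python sketch below is a rough approximation of that behavior, not Elasticsearch itself. It assumes the analyzer can be imitated by lowercasing and splitting on anything that is not an ASCII letter or digit (asciifolding is left out), which happens to reproduce the tokens shown in the _analyze output above. The function names are hypothetical.

```python
import re
from fnmatch import fnmatchcase

DOCS = ["WT:1", "testWT:1", "WT:1test"]


def analyze(text):
    """Rough stand-in for the analyzer: lowercase, keep runs of letters/digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Indexed terms per document, e.g. "testWT:1" -> ["testwt", "1"].
INDEX = {doc: analyze(doc) for doc in DOCS}


def match_query(text):
    """Model of `match`: analyze the input, then require an exact term hit."""
    wanted = analyze(text)  # "*WT*" -> ["wt"]; the asterisks disappear
    return [d for d, terms in INDEX.items() if any(t in terms for t in wanted)]


def wildcard_query(pattern):
    """Model of `wildcard`: compare the raw pattern against each indexed term."""
    return [d for d, terms in INDEX.items()
            if any(fnmatchcase(t, pattern) for t in terms)]


print(match_query("*WT*"))     # ['WT:1', 'WT:1test']
print(wildcard_query("*wt*"))  # ['WT:1', 'testWT:1', 'WT:1test']
print(wildcard_query("*WT*"))  # [] in this case-sensitive model
```

In this model "testWT:1" is missed by the match query because its indexed term is "testwt", which is not equal to "wt". The last line shows that the indexed terms are lowercase, so a pattern compared as-is would need to be lowercase too. Whether Elasticsearch normalizes the pattern for a text field may depend on the version, so if the uppercase pattern returns nothing, the lowercase one is worth trying.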

system · December 2, 2020, 10:15am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
