"query_string" Wildcard search with special characters issue

When searching with wildcard terms, I get unexpected behavior.
I'm working on ES 5.6.8.

To reproduce the issue:

(Tested with Kibana)

- Create the index:

PUT my-index-00001
{
  "mappings": {
    "test": {
      "properties": {
        "name1": {
          "type": "keyword",
          "fields": {
            "analyzed": {
              "type": "text",
              "analyzer": "french_analyzer"
            }
          }
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "path_analyzer": {
          "tokenizer": "path_tokenizer"
        },
        "french_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        }
      },
      "tokenizer": {
        "path_tokenizer": {
          "type": "path_hierarchy",
          "delimiter": "/"
        }
      },
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase"]
        }
      }
    }
  }
}

- Insert test data:

POST my-index-00001/test
{
  "name1" : "WT1"
}

POST my-index-00001/test
{
  "name1" : "testWT1"
}

POST my-index-00001/test
{
  "name1" : "WT1test"
}

- Run the search:

GET my-index-00001/test/_search
{
  "query": {
    "query_string": {
      "query": "WT\\:*",
      "fields": ["name1.analyzed"],
      "default_operator": "AND",
      "analyze_wildcard": true
    }
  }
}

As expected, this search returns two results: ["name1": "WT:1test", "name1": "WT:1"].

But the issue occurs with a leading (prefix) wildcard, as follows:

    GET my-index-00001/test/_search
    {
      "query": {
        "query_string": {
          "query": "*WT\\:", 
          "fields": ["name1.analyzed"],
          "default_operator": "AND",
          "analyze_wildcard": true
        }
      }
    }

This search does not return any results; the same happens with the query "WT\:".

Expected result: documents with ["name1": "WT:1test", "name1": "WT:1", "name1": "testWT:1"]

I updated your demo script to 7.9.3, as 5.x is no longer supported.
I also corrected the example values and simplified the mapping a bit by removing fields that are not needed.

DELETE my-index-00001
PUT my-index-00001
{
  "mappings": {
    "properties": {
      "name1": {
        "type": "text",
        "analyzer": "french_analyzer"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "french_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  }
}

POST my-index-00001/_doc
{
  "name1" : "WT:1"
}

POST my-index-00001/_doc
{
  "name1" : "testWT:1"
}

POST my-index-00001/_doc
{
  "name1" : "WT:1test"
}

GET my-index-00001/_search
{
  "query": {
    "query_string": {
      "query": "WT\\:*",
      "fields": ["name1"],
      "default_operator": "AND",
      "analyze_wildcard": true
    }
  }
}

GET my-index-00001/_search
{
  "query": {
    "query_string": {
      "query": "*WT\\:",
      "fields": ["name1"],
      "default_operator": "AND",
      "analyze_wildcard": true
    }
  }
}

Now, back to your question. Here is how your documents are indexed behind the scenes:

POST my-index-00001/_analyze
{
  "field": "name1",
  "text": ["WT:1", "testWT:1", "WT:1test"]
}

It gives:

{
  "tokens" : [
    {
      "token" : "wt",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "1",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<NUM>",
      "position" : 1
    },
    {
      "token" : "testwt",
      "start_offset" : 5,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 102
    },
    {
      "token" : "1",
      "start_offset" : 12,
      "end_offset" : 13,
      "type" : "<NUM>",
      "position" : 103
    },
    {
      "token" : "wt",
      "start_offset" : 14,
      "end_offset" : 16,
      "type" : "<ALPHANUM>",
      "position" : 204
    },
    {
      "token" : "1test",
      "start_offset" : 17,
      "end_offset" : 22,
      "type" : "<ALPHANUM>",
      "position" : 205
    }
  ]
}
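
To see why the tokens come out this way without a cluster at hand, here is a rough Python approximation of `french_analyzer` for these three values. It is only a sketch: the regex stands in for the standard tokenizer on plain ASCII input, `asciifolding` is skipped because the sample values have no accents, and the name `approx_french_analyzer` is made up for illustration. It is not Elasticsearch code.

```python
import re


def approx_french_analyzer(text):
    """Rough stand-in for the analyzer: split on non-alphanumerics, then lowercase.

    Assumption: this only mimics the standard tokenizer for simple ASCII input
    such as the sample values in this thread.
    """
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


docs = ["WT:1", "testWT:1", "WT:1test"]
tokens_per_doc = {doc: approx_french_analyzer(doc) for doc in docs}

for doc, tokens in tokens_per_doc.items():
    print(doc, "->", tokens)
# WT:1 -> ['wt', '1']
# testWT:1 -> ['testwt', '1']
# WT:1test -> ['wt', '1test']
```

Note that the colon never reaches the index, so no indexed term contains `wt:`.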

That's probably not what you want. Instead, you should use a keyword data type when searching with wildcards, although I don't recommend that anyway, as the documentation says:

Avoid beginning patterns with * or ?. This can increase the iterations needed to find matching terms and slow search performance.
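
As a toy illustration of that warning (not how Lucene is actually implemented), assume the indexed terms are kept in a sorted list. A trailing wildcard such as `wt*` can jump straight to the first candidate and stop early, while a leading wildcard such as `*wt` has to look at every term. The term list and the helper names `prefix_scan` and `full_scan` below are hypothetical.

```python
from bisect import bisect_left
from fnmatch import fnmatchcase

# Hypothetical sorted term dictionary for one field.
terms = sorted(["1", "1test", "testwt", "wt", "wta", "xyz"])


def prefix_scan(prefix):
    """Trailing wildcard (prefix*): seek to the first candidate, stop at the first miss."""
    visited, hits = 0, []
    i = bisect_left(terms, prefix)
    while i < len(terms):
        visited += 1
        if not terms[i].startswith(prefix):
            break
        hits.append(terms[i])
        i += 1
    return hits, visited


def full_scan(pattern):
    """Leading wildcard (*suffix): no starting point, so every term is inspected."""
    visited, hits = 0, []
    for term in terms:
        visited += 1
        if fnmatchcase(term, pattern):
            hits.append(term)
    return hits, visited


print(prefix_scan("wt"))  # (['wt', 'wta'], 3)
print(full_scan("*wt"))   # (['testwt', 'wt'], 6)
```

With six terms the difference is trivial, but the leading-wildcard cost grows with the number of distinct terms in the field.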

But here the problem is something else. You should use a query like:

GET my-index-00001/_search
{
  "query": {
    "match": {
      "name1": "WT\\:*"
    }
  }
}

GET my-index-00001/_search
{
  "query": {
    "match": {
      "name1": "*WT\\:"
    }
  }
}

Hope this helps.

Thank you for your response, but I think the issue I'm encountering is with a leading (prefix) wildcard, as in:

GET my-index-00001/_search
{
  "query": {
    "match": {
      "name1": "*WT*"
    }
  }
}

With this search term, I expected to get these results: ["WT:1", "testWT:1", "WT:1test"],
but only two of them are actually returned: ["WT:1", "WT:1test"], and not "testWT:1".
Shouldn't the prefix wildcard return all three results?

Thank you in advance

Try this one:

GET my-index-00001/_search
{
  "query": {
    "wildcard": {
      "name1": "*WT*"
    }
  }
}

But again, keep in mind that this is not efficient if you are looking for fast response times.
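
For anyone wondering why `match` with "*WT*" returned only two documents while `wildcard` can return all three, here is a small Python sketch of the difference. It is a simplified model under stated assumptions: `approx_analyze` is a rough stand-in for the analyzer (regex split plus lowercase), `match_like` and `wildcard_like` are made-up helper names, and lowercasing the wildcard pattern is an assumption, since whether Elasticsearch normalizes the pattern depends on the version and field type.

```python
import re
from fnmatch import fnmatchcase


def approx_analyze(text):
    """Rough stand-in for the analyzer: split on non-alphanumerics, then lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


docs = ["WT:1", "testWT:1", "WT:1test"]
index = {doc: approx_analyze(doc) for doc in docs}


def match_like(query_text):
    """Model of `match`: the query text is analyzed, so '*' is dropped,
    and each resulting term must equal an indexed term exactly."""
    query_terms = approx_analyze(query_text)
    return [doc for doc, terms in index.items()
            if any(q in terms for q in query_terms)]


def wildcard_like(pattern):
    """Model of `wildcard`: the pattern is compared with each indexed term.
    Assumption: the pattern is lowercased before matching."""
    pattern = pattern.lower()
    return [doc for doc, terms in index.items()
            if any(fnmatchcase(term, pattern) for term in terms)]


print(match_like("*WT*"))     # ['WT:1', 'WT:1test']
print(wildcard_like("*WT*"))  # ['WT:1', 'testWT:1', 'WT:1test']
```

In this model, "*WT*" is analyzed down to the single term `wt`, which is not a term of "testWT:1" (its terms are `testwt` and `1`), so only the pattern-based comparison finds it.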
