Wildcard search with _ (underscore) returns no results


(R01K) #1

Hi Team,

I am facing an issue when running a wildcard search whose query contains an underscore (_), on Elasticsearch version 6.0.1.
I'm using the word_delimiter filter on the field my_field, which is the field searched in the query below.

I have a document with my_field = document_02.txt, and I want to find it by searching for _02* as shown in the query below, but this returns zero results.

GET my_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "my_field:_02*"
          }
        }
      ]
    }
  }
}

The query below returns the expected result when the wildcard is not used:

GET my_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "my_field:_02"
          }
        }
      ]
    }
  }
}

Kindly help!


(Val Crettaz) #2

Can you show the mapping of your field and the definition of the analyzer you're using for that field?


(R01K) #3

Here are the steps to replicate the issue:

PUT my_index
{
    "settings": {
      "index": {
        "analysis": {
          "analyzer": {
            "custom_analyzer": {
              "filter": [
                "word_delimiter",
                "lowercase"
              ],
              "type": "custom",
              "tokenizer": "standard"
            }
          }
        }
      }
    },
    "mappings": {
      "doc": {
        "properties": {
          "my_field": {
            "type": "text",
            "analyzer": "custom_analyzer",
            "fields": {
              "keyword": {
                "ignore_above": 256,
                "type": "keyword"
              }
            }
          }
        }
      }
    }
}

PUT my_index/doc/1 
{
  "my_field":"document_02.txt"
}

GET my_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "default_field": "my_field",
            "query": "_02*"
          }
        }
      ]
    }
  }
}

(Val Crettaz) #4

I'm not sure why you're using a word_delimiter token filter. When analyzing document_02.txt, it's going to produce the following tokens (obtained from the _analyze endpoint):

{
  "tokens": [
    {
      "token": "document",
      "start_offset": 0,
      "end_offset": 8,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "02",
      "start_offset": 9,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "txt",
      "start_offset": 12,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
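
For reference, a request along the following lines should reproduce that output. It is a sketch that assumes the my_index index and the custom_analyzer definition from post #3 already exist:

```
# Run the sample value through the analyzer defined on my_index
GET my_index/_analyze
{
  "analyzer": "custom_analyzer",
  "text": "document_02.txt"
}
```

None of the returned tokens contains the underscore, which is why no indexed term starts with _02.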

If you want to search inside words, what you need to do is leverage the ngram token filter.


(R01K) #5

I'm using the word_delimiter filter because I want to create tokens based only on the words and numbers in the value, not on any special characters.

Is there a fix for this issue that doesn't require changing the filter to ngram, or any workaround for it?

Kindly help.


(Val Crettaz) #6

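OK, I understand. document_02.txt gets tokenized and indexed as the three tokens document, 02 and txt, so the underscore is discarded during the analysis process. A wildcard term such as _02* is not run through the analyzer, so it looks for an indexed token starting with _02, and none exists; the non-wildcard query _02 is analyzed down to 02, which is why it matches. You can therefore search for 02* instead of _02*.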
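
As a sketch of that suggestion, the search from post #3 could be rewritten as below. The second request is not from this thread: it is an untested assumption that the my_field.keyword sub-field from the mapping in post #3, which keeps the whole value document_02.txt as a single term, can be matched with a leading wildcard when the underscore has to be part of the pattern.

```
# Wildcard against the analyzed tokens (document, 02, txt)
GET my_index/_search
{
  "query": {
    "query_string": {
      "default_field": "my_field",
      "query": "02*"
    }
  }
}

# Assumption: match the unanalyzed value through the keyword sub-field
GET my_index/_search
{
  "query": {
    "query_string": {
      "default_field": "my_field.keyword",
      "query": "*_02*"
    }
  }
}
```

Note that 02* also matches any other token starting with 02, and that leading wildcards on a keyword field are case-sensitive and can be slow on large indices.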


(R01K) #7

Sure, thanks @val!!

