How to search for terms containing a hyphen (-) on the _all field?

Hello,

I have an issue searching for a term containing hyphens on the _all field. I have also tried a custom _all field, but I get the same results.

I have configured my index like this, using a filter with "type": "word_delimiter" and "preserve_original": "true".

PUT my_index
POST my_index/_close
PUT my_index/_settings
{
   "index": {
      "analysis": {
         "filter": {
            "my_word_delimiter": {
               "type": "word_delimiter",
               "preserve_original": "true"
            }
         },
         "analyzer": {
            "my_analyzer": {
               "type": "custom",
               "tokenizer": "standard",
               "filter": [
                  "standard",
                  "lowercase",
                  "stop",
                  "my_word_delimiter"
               ]
            }
         }
      }
   }
}

POST my_index/_open

PUT my_index/my_item/_mapping
{
   "my_item": {
      "_all": {
         "analyzer": "my_analyzer"
      },
      "properties": {
         "name": {
            "type": "keyword"
         },
         "title": {
            "type": "keyword"
         },
         "identifier": {
            "type": "keyword"
         }
      }
   }
}

Now, if I index an item like this:

PUT my_index/my_item/001
{
    "name" : "doc_num_01234",
    "title" : "doc-num-01234"
}

Then, when I search for doc_num_01234 in _all, it works fine.
But when I search for doc-num-01234, I get no results.

This query:

POST my_index/_search
{
   "query": {
      "term": {
         "_all": "doc-num-01234"
      }
   }
}

returns nothing.

What can I do to get the expected results?

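Your analyzer isn't doing what you think it is: it's removing the -. The standard tokenizer splits on the hyphen and drops it before my_word_delimiter ever runs, so there is no original token left to preserve. See: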

DELETE my_index

PUT my_index
{
  "index": {
    "analysis": {
      "filter" : {
        "my_word_delimiter" : {
          "type" : "word_delimiter",
          "preserve_original": "true"
        }   
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase",
            "stop",
            "my_word_delimiter"
          ]
        }
      }
    }
  }
}

POST /my_index/_analyze
{
  "field": "_all",
  "text": "doc-num-01234"
}

POST /my_index/_analyze
{
  "tokenizer": "standard",
  "text": "doc-num-01234"
}
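
As a further check, and assuming the index was created with the settings above, the whole custom analyzer can also be run by name rather than through a field or a bare tokenizer. This is only a sketch of the request. With the standard tokenizer in the chain, the output should contain the tokens doc, num and 01234, but no doc-num-01234 token for a term query to match:

```console
# Run the full my_analyzer chain (tokenizer plus all filters) on the sample text
POST /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "doc-num-01234"
}
```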

Thank you for your answer.

Changing my tokenizer to whitespace solved my issue.

PUT my_index
POST my_index/_close
PUT my_index/_settings
{
   "index": {
      "analysis": {
         "filter": {
            "my_word_delimiter": {
               "type": "word_delimiter",
               "preserve_original": "true"
            }
         },
         "analyzer": {
            "my_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "standard",
                  "lowercase",
                  "stop",
                  "my_word_delimiter"
               ]
            }
         }
      }
   }
}

POST my_index/_open
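
One caveat, assuming some documents were indexed before the analyzer was changed: their tokens were produced by the old analyzer, so they need to be indexed again before the term query can match them. A minimal sketch of the verification, reusing the sample document from the question, could look like this. With the whitespace tokenizer, the _analyze output should now include doc-num-01234 alongside the split parts:

```console
# Confirm that the original hyphenated token is now kept
POST /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "doc-num-01234"
}

# Index the sample document again so _all is rebuilt with the new analyzer
PUT my_index/my_item/001
{
    "name" : "doc_num_01234",
    "title" : "doc-num-01234"
}

# The term query is not analyzed, so it needs the exact indexed token
POST my_index/_search
{
   "query": {
      "term": {
         "_all": "doc-num-01234"
      }
   }
}
```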
