Hi,
I have a requirement to include special characters in search. So for both index time and search time, I am trying to create a custom analyzer that retains the original token as-is. I have set the tokenizer to whitespace and generate_number_parts to false, but when I check the analyzer output, it's not working as expected.
PUT /testing
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "word_delimiter_v3_filter": {
            "type": "word_delimiter",
            "generate_number_parts ": false,
            "split_on_numerics": false,
            "preserve_original": true
          }
        },
        "analyzer": {
          "searchword_v3_analyzer": {
            "filter": [
              "lowercase",
              "word_delimiter_v3_filter"
            ],
            "type": "custom",
            "tokenizer": "whitespace"
          }
        }
      }
    }
  },
  "mappings": {
    "testmap": {
      "properties": {
        "fullname": {
          "type": "text",
          "analyzer": "searchword_v3_analyzer",
          "search_analyzer": "searchword_v3_analyzer"
        }
      }
    }
  }
}
Using the above analyzer, if I enter "2-10" as the search text, I want it to be searched as-is.
GET testing/_analyze
{
  "analyzer": "searchword_v3_analyzer", 
  "text":     "2-10"
}
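In other words, since preserve_original should keep the hyphenated value intact and generate_number_parts: false should suppress the numeric subwords, I am expecting a single token back, something like:
{
  "tokens": [
    {
      "token": "2-10",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    }
  ]
}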
But when I check the analyzer output, it's still splitting on the hyphen and emitting 2 and 10 as separate tokens, even though I am using the whitespace tokenizer and have set generate_number_parts to false and split_on_numerics to false.
I am running Elasticsearch 5.5.1 on Windows Server 2012. Here is the analyzer output:
{
  "tokens": [
    {
      "token": "2-10",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    },
    {
      "token": "2",
      "start_offset": 0,
      "end_offset": 1,
      "type": "word",
      "position": 0
    },
    {
      "token": "10",
      "start_offset": 2,
      "end_offset": 4,
      "type": "word",
      "position": 1
    }
  ]
}
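As a sanity check, I believe the same filter chain can also be tested inline through _analyze without going through the index settings at all (a sketch, assuming 5.x supports inline filter definitions in _analyze):
GET _analyze
{
  "tokenizer": "whitespace",
  "filter": [
    "lowercase",
    {
      "type": "word_delimiter",
      "generate_number_parts": false,
      "split_on_numerics": false,
      "preserve_original": true
    }
  ],
  "text": "2-10"
}
If this inline version returns only "2-10", that would point to the index-level filter definition not being picked up as written.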
Thanks
askids