Elasticsearch 5.3: pattern_replace char_filter not working

I have a requirement where I need to query docs by phone number. Users can enter characters such as parentheses and dashes in the search query string, and these should be ignored. So I created a custom analyzer with a pattern_replace char_filter that uses a regex to remove everything but digits. However, it does not seem like Elasticsearch is filtering out the non-digits. Here is a sample of what I am trying to do:

Index Creation

PUT my_test_index
{
    "settings": {
        "index": {
            "analysis": {
                "char_filter": {
                    "non_digit": {
                        "pattern": "\\D",
                        "type": "pattern_replace",
                        "replacement": ""
                    }
                },
                "analyzer": {
                    "no_digits_analyzer": {
                        "type": "custom",
                        "cahr_filter": [
                            "non_digit"
                        ],
                        "tokenizer": "keyword"
                    }
                }
            }
        }
    },
    "mappings": {
        "doc_with_phone_prop": {
            "properties": {
                "phone": {
                    "type": "text",
                    "analyzer": "no_digits_analyzer",
                    "search_analyzer": "no_digits_analyzer"
                }
            }
        }
    }
}
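To check what the analyzer above is meant to do, its chain can be modeled outside Elasticsearch. The Python below is a simplified sketch, not Elasticsearch's implementation: it assumes the analyzer key is spelled "char_filter", applies the same \D to "" replacement as the non_digit pattern_replace char filter, and then emits the whole result as one token, as the keyword tokenizer does. The function name analyze_phone is hypothetical. Python's \D and Java's \D both match any non-digit for ASCII input like these phone numbers. On a live cluster, the Analyze API (POST my_test_index/_analyze with "analyzer": "no_digits_analyzer" and a "text" value) shows the real tokens.

```python
import re
from typing import List

# Same pattern as the "non_digit" char filter in the index settings.
NON_DIGIT = re.compile(r"\D")


def analyze_phone(text: str) -> List[str]:
    """Hypothetical model of no_digits_analyzer, assuming "char_filter" is spelled correctly."""
    filtered = NON_DIGIT.sub("", text)  # char filter: drop every non-digit character
    return [filtered]                   # keyword tokenizer: the whole input becomes one token


for raw in ["3035555555", "(303)5555555", "(303) 555-5555"]:
    print(raw, "->", analyze_phone(raw))
```

All three inputs reduce to the same single token, which is why the indexed value and a formatted query should compare equal once the char filter is actually applied.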

Inserting one doc

PUT my_test_index/doc_with_phone_prop/1
{
    "phone": "3035555555"
}

Querying without any parentheses or dashes in the phone number

POST my_test_index/doc_with_phone_prop/_search
{
    "query": {
        "bool": {
            "must": [
            {
                "query_string": {
                    "query": "3035555555",
                    "fields": ["phone"]
                }
            }]
        }
    }
}

This returns one document correctly:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.2876821,
      "hits": [
         {
            "_index": "my_test_index",
            "_type": "doc_with_phone_prop",
            "_id": "1",
            "_score": 0.2876821,
            "_source": {
               "phone": "3035555555"
            }
         }
      ]
   }
}

Querying with parentheses does not return anything, but I assumed that my no_digits_analyzer would strip everything except digits from the search terms.

POST my_test_index/doc_with_phone_prop/_search
{
    "query": {
        "bool": {
            "must": [
            {
                "query_string": {
                    "query": "\\(303\\)5555555",
                    "fields": ["phone"]
                }
            }]
        }
    }
}

What am I doing wrong here?

I am using Elasticsearch 5.3.

Thanks.

Hey, there is a small typo: cahr_filter should be char_filter.
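Why the typo matters: judging by this thread, the index settings were accepted even though cahr_filter is not a recognized key, so no_digits_analyzer effectively consisted of the keyword tokenizer alone. The rough Python sketch below (a simplified model, not Elasticsearch code; the function names are hypothetical) compares the two cases. It uses the unescaped text (303)5555555, i.e. what remains after query_string removes the backslash escapes.

```python
import re
from typing import Callable, List

Analyzer = Callable[[str], List[str]]


def keyword_only(text: str) -> List[str]:
    """Analyzer as actually built with the typo: no char filter, keyword tokenizer only."""
    return [text]


def with_char_filter(text: str) -> List[str]:
    """Analyzer as intended: strip non-digits, then keyword tokenizer (one token)."""
    return [re.sub(r"\D", "", text)]


def matches(analyzer: Analyzer, indexed: str, query: str) -> bool:
    """Simplified check: does any query token equal any indexed token?"""
    return bool(set(analyzer(indexed)) & set(analyzer(query)))


print(matches(keyword_only, "3035555555", "(303)5555555"))      # False
print(matches(with_char_filter, "3035555555", "(303)5555555"))  # True
```

With the char filter missing, the stored token 3035555555 and the query token (303)5555555 are different strings, so nothing is returned.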

Yep! Fixed it. Thanks, that was a silly mistake.

But now, for whatever reason, querying \(303\)5555555 or 3035555555 returns the correct document, but if I put a dash inside, as in \(303\)555-5555, it returns no results. The regex seems to be correct.

If you already filter out those characters in the analysis chain, you can query the field directly; no regex is needed in the query. Just use a simple query string query.

I was using query_string in the must clause, and that was the problem: query_string parses its input as query syntax and does not escape special characters for you.

I needed to use multi_match instead:

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html

POST my_test_index/doc_with_phone_prop/_search
{
    "query": {
        "bool": {
            "must": [
            {
                "multi_match": {
                    "query": "(303) 555- 5555",
                    "fields": ["phone"]
                }
            }]
        }
    }
}
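A simplified model of why multi_match works here: a match/multi_match query does not parse query syntax and passes the whole input string to the field's search analyzer, so "(303) 555- 5555" is reduced to the single token 3035555555, which equals the indexed token. query_string, by contrast, parses its input first, and in 5.x it splits on whitespace by default before analysis, so each fragment is analyzed separately. The Python below is an illustrative sketch of those two paths, not Elasticsearch code; the function names are hypothetical, and the whitespace split ignores the rest of the query_string syntax.

```python
import re
from typing import List


def analyze_phone(text: str) -> List[str]:
    # no_digits_analyzer as intended: strip non-digits, keyword tokenizer emits one token
    return [re.sub(r"\D", "", text)]


def match_query_tokens(query: str) -> List[str]:
    # match / multi_match: the whole query string goes through the search analyzer
    return analyze_phone(query)


def whitespace_split_tokens(query: str) -> List[str]:
    # simplified query_string model: split on whitespace first, then analyze each piece
    tokens: List[str] = []
    for piece in query.split():
        tokens.extend(analyze_phone(piece))
    return tokens


indexed = analyze_phone("3035555555")
query = "(303) 555- 5555"
print(match_query_tokens(query))       # ['3035555555']
print(whitespace_split_tokens(query))  # ['303', '555', '5555']
print(match_query_tokens(query) == indexed)                           # True
print(any(t in indexed for t in whitespace_split_tokens(query)))      # False
```

Because the keyword tokenizer only ever emits one token per analyzed string, splitting the query before analysis can never produce the full ten-digit token, while analyzing the whole string can.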
