Elasticsearch 5.3 char_filter: pattern_replace not working


(Galim Kaudinov) #1

I have a requirement where I need to query docs by phone number. Users can enter characters such as parentheses and dashes in the search query string, and these should be ignored. So I created a custom analyzer with a char_filter of type pattern_replace, which uses a regex to remove everything but digits. But Elasticsearch does not seem to be filtering out the non-digits. Here is a sample of what I am trying to do:

Index Creation

put my_test_index 
{
     "settings" : {
         "index": {
            "analysis": {
               "char_filter": {
                  "non_digit": {
                     "pattern": "\\D",
                     "type": "pattern_replace",
                     "replacement": ""
                  }
               },
               "analyzer": {
                  "no_digits_analyzer": {
                     "type": "custom",
                     "cahr_filter": [
                        "non_digit"
                     ],
                     "tokenizer": "keyword"
                  }
               }
            }
         }
     },
   "mappings" : {
       "doc_with_phone_prop" : {
           "properties": {
               "phone": {
                   "type": "text",
                   "analyzer": "no_digits_analyzer",
                   "search_analyzer": "no_digits_analyzer"
               }
           }
       }
   }
}
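
For reference, the effect the non_digit char filter is meant to have can be sketched outside Elasticsearch. The Python snippet below is only an illustration: it assumes Python's `re` module treats `\D` the same way for these ASCII inputs, and the function name `strip_non_digits` is hypothetical, not part of any Elasticsearch API.

```python
import re

# Same pattern as the "non_digit" char filter: \D matches any non-digit character.
NON_DIGIT = re.compile(r"\D")


def strip_non_digits(text: str) -> str:
    """Return text with every non-digit character removed."""
    return NON_DIGIT.sub("", text)


if __name__ == "__main__":
    for raw in ["3035555555", "(303)5555555", "(303) 555-5555"]:
        print(raw, "->", strip_non_digits(raw))
```

With a keyword tokenizer after this filter, each of these inputs would be expected to end up as the same single term.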

Inserting one doc

put my_test_index/doc_with_phone_prop/1
{
    "phone": "3035555555"
}

Querying without any parentheses or dashes in the phone number

post my_test_index/doc_with_phone_prop/_search
{
    "query": {
        "bool": {
            "must": [
            {
                "query_string": {
                    "query": "3035555555",
                    "fields": ["phone"]
                }
            }]
        }
    }
}

This returns one document correctly:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.2876821,
      "hits": [
         {
            "_index": "my_test_index",
            "_type": "doc_with_phone_prop",
            "_id": "1",
            "_score": 0.2876821,
            "_source": {
               "phone": "3035555555"
            }
         }
      ]
   }
}

Querying with parentheses does not return anything, but I was under the assumption that my no_digits_analyzer would remove everything but digits from the search terms.

post my_test_index/doc_with_phone_prop/_search
{
    "query": {
        "bool": {
            "must": [
            {
                "query_string": {
                    "query": "\\(303\\)5555555",
                    "fields": ["phone"]
                }
            }]
        }
    }
}

What am I doing wrong here?

I am using ElasticSearch 5.3.

Thanks.


(Medcl) #2

Hey, there is a small typo: cahr_filter should be char_filter.
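
A misspelled key like this is easy to miss by eye. The sketch below is one possible way to catch it before sending the request, assuming the index body is available as a Python dict. The helper name `unknown_analyzer_keys` and the `KNOWN_KEYS` list are assumptions made for this illustration, not part of Elasticsearch.

```python
# Keys a "custom" analyzer definition is expected to use (assumed list for this sketch).
KNOWN_KEYS = {"type", "tokenizer", "char_filter", "filter", "position_increment_gap"}


def unknown_analyzer_keys(index_body: dict) -> dict:
    """Return {analyzer_name: [unrecognized keys]} for custom analyzers."""
    analyzers = (
        index_body.get("settings", {})
        .get("index", {})
        .get("analysis", {})
        .get("analyzer", {})
    )
    problems = {}
    for name, definition in analyzers.items():
        if definition.get("type") != "custom":
            continue  # only custom analyzers are checked in this sketch
        unknown = sorted(set(definition) - KNOWN_KEYS)
        if unknown:
            problems[name] = unknown
    return problems


# Analyzer section from the index creation request in post #1, typo included.
index_body = {
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "no_digits_analyzer": {
                        "type": "custom",
                        "cahr_filter": ["non_digit"],
                        "tokenizer": "keyword",
                    }
                }
            }
        }
    }
}

print(unknown_analyzer_keys(index_body))
```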


(Galim Kaudinov) #3

Yep! Fixed it. Thanks, this was pretty stupid.

But now, for whatever reason, querying \(303\)5555555 or 3035555555 returns the correct document, but if I put a dash inside, as in \(303\)555-5555, it returns no results. The regex seems to be correct.
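
One way to check what the analyzer actually emits for a given input is the _analyze API, which accepts an analyzer name and a sample text. The sketch below only builds the request path and JSON body in Python and sends nothing over the network. The helper name `analyze_request` is hypothetical.

```python
import json


def analyze_request(index: str, analyzer: str, text: str):
    """Build (path, JSON body) for an index-scoped _analyze call."""
    path = f"/{index}/_analyze"
    body = json.dumps({"analyzer": analyzer, "text": text})
    return path, body


path, body = analyze_request("my_test_index", "no_digits_analyzer", "(303)555-5555")
print("POST", path)
print(body)
```

If the char filter is wired up correctly, the response to such a request should contain a single token made up of digits only. That would suggest the problem lies in how the query is parsed, not in the analyzer.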


(Medcl) #4

If you have already filtered the characters in the analysis chain, you can query the field directly; the regex is not needed. Just use a simple query string query.


(Galim Kaudinov) #5

I was using query_string in the must clause, and that was incorrect because it does not escape special characters.

I needed to use multi_match:

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html

post my_test_index/doc_with_phone_prop/_search
{
    "query": {
        "bool": {
            "must": [
            {
                "multi_match": {
                    "query": "(303) 555- 5555",
                    "fields": ["phone"]
                }
            }]
        }
    }
}
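
For anyone building this request from application code, the same body can be assembled as a plain dict. This is a minimal sketch, assuming the raw user input is passed through unchanged, since multi_match analyzes the text and does not interpret query_string operators. The helper name `phone_query` is hypothetical.

```python
import json


def phone_query(raw_phone: str, field: str = "phone") -> dict:
    """Build a bool/must multi_match search body for a raw phone string."""
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": raw_phone,  # passed as typed, no manual escaping
                            "fields": [field],
                        }
                    }
                ]
            }
        }
    }


body = phone_query("(303) 555- 5555")
print(json.dumps(body, indent=4))
```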
