Match with type phrase_prefix doesn't work when words mix numbers and letters

I'm running the following query:

{
   "match": {
      "query": "100",
      "fields": "description",
      "type": "phrase_prefix"
   }
}

However, if I search for "100" hoping to find, for example, a document whose "description" is "pipe 100mm", that document is not returned. The query only matches "100" as a prefix of a longer number (e.g. 10056) or as a full word.

I am using the JavaScript client. The description field is mapped as text.

What are the limitations of phrase_prefix in this case, and what could I change in the query to get the results I want?

My field mapping is

{
   "description": {
      "type": "text",
      "fields": {
         "keyword": {
            "type": "keyword"
         }
      }
   },
   "analyzer": "ac_filter"
}

The analyzer indicated is built as

"analysis": {
	"analyzer": {
		"ac_filter": {
			"filter": [
				"lowercase",
				"asciifolding"
			],
			"char_filter": [
				"alphabets_char_filter"
			],
			"type": "custom",
			"tokenizer": "standard"
		}
	},
	"char_filter": {
		"alphabets_char_filter": {
			"pattern": "[^\\p{L}\\p{Nd}]",
			"type": "pattern_replace",
			"replacement": " "
		}
	}
}

I haven't tried to use ngrams or search_as_you_type yet as I wanted to understand why the above does not work. If there isn't a workaround, let me know.

Thanks.

Hi @Raphael_Fidelis

I believe there is no problem. Look at the tokens generated for the text "pipe 100mm":

GET idx_test/_analyze
{
  "analyzer": "ac_filter",
  "text": ["pipe 100mm"]
}
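
The same check can also be sketched without the index by listing the tokenizer and token filters inline. The custom char filter is left out here on the assumption that it makes no difference for this input, which contains only letters, digits and a space.

```console
POST _analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase", "asciifolding"],
  "text": "pipe 100mm"
}
```

This should return two tokens, pipe and 100mm, because the standard tokenizer does not break between digits and letters.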

A plain match query for 100 finds nothing because the standard tokenizer does not split between digits and letters, so the indexed token is "100mm", not "100".

I believe that match_phrase_prefix or a wildcard query will return the documents that contain a term starting with 100.
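
As a rough sketch of those two options against the description field from the question: the first request is a multi_match with type phrase_prefix, which is the query that takes "fields" and "type" together, and the second is a wildcard query. The index name idx_test is an assumption carried over from the _analyze request above.

```console
POST idx_test/_search
{
  "query": {
    "multi_match": {
      "query": "100",
      "fields": ["description"],
      "type": "phrase_prefix"
    }
  }
}

POST idx_test/_search
{
  "query": {
    "wildcard": {
      "description": {
        "value": "100*"
      }
    }
  }
}
```

The wildcard pattern is matched against the indexed terms, which this analyzer lowercases, so a lowercase pattern is the safe choice.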


I tried match_phrase_prefix, but with no luck. Maybe some detail changes its behaviour, since I have a pretty big index and mapping. I was wondering why phrase_prefix works fine for queries like "smart" (returning documents such as "smartphone") or "1520" (returning documents such as "15203289"), while "50" isn't recognized as a prefix of "50cm".

I used the settings you provided to simulate the searches. In all tests I got a result (tests with smart, 100, 1520 and 50).
Below is the test code. You say that match_phrase_prefix did not work, so maybe some information is missing from your question.

PUT /idx_test
{
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "ac_filter"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "ac_filter": {
          "filter": [
            "lowercase",
            "asciifolding"
          ],
          "char_filter": [
            "alphabets_char_filter"
          ],
          "type": "custom",
          "tokenizer": "standard"
        }
      },
      "char_filter": {
        "alphabets_char_filter": {
          "pattern": """[^\p{L}\p{Nd}]""",
          "type": "pattern_replace",
          "replacement": " "
        }
      }
    }
  }
}

POST idx_test/_bulk
{"index":{}}
{"description": "10056"}
{"index":{}}
{"description": "pipe 100mm"}
{"index":{}}
{"description": "smartphone"}
{"index":{}}
{"description": "15203289"}
{"index":{}}
{"description": "50cm"}

POST idx_test/_search
{
  "query": {
    "match_phrase_prefix": {
      "description": "50"
    }
  }
}

Output:

 "hits" : [
      {
        "_index" : "idx_test",
        "_type" : "_doc",
        "_id" : "OYZjKocBev_z0rRh2Tkm",
        "_score" : 0.2876821,
        "_source" : {
          "description" : "50cm"
        }
      }
    ]
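
One possible explanation for the different behaviour on a large index, offered as a guess rather than a diagnosis: match_phrase_prefix expands the last term to at most max_expansions matching terms (50 by default). On a big index many terms may start with "50", and if "50cm" is not among the terms that get picked, the document is not returned. A sketch of the same search with a higher limit follows; the value 1000 is arbitrary and chosen only for illustration.

```console
POST idx_test/_search
{
  "query": {
    "match_phrase_prefix": {
      "description": {
        "query": "50",
        "max_expansions": 1000
      }
    }
  }
}
```

If the document shows up with the higher limit, the expansion cap is the likely cause. Raising it makes the query more expensive, so it is worth testing on the real index before relying on it.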
