Handling Punctuation in multi_match query

I have a query that searches a product name field. Product names may contain punctuation, and queries may or may not include it. Given the product title "CenterG 5.3 Drive Belt for model number 4421" and the query "Centerg 5.3 Drive Belt", I get the results I expect.

However, if the query omits the punctuation (so "5.3" is typed as "53"), the "CenterG 5.3 Drive Belt for model number 4421" product does not show up in the results. Instead, other less relevant products that simply have "53" in the title are ranked first.

I have tried both the english and standard analyzers. I believe I need to create my own analyzer and tokenizer, but I am unsure what configuration settings to use. The english analyzer works best so far with my data set; the only issue is the punctuation.

Here is my index:

{:properties=>{:name=>{:type=>"text", :analyzer=>"english"}}}

And my query:

{
  query: {
    bool: {
      should: {
        multi_match: {
          fields: ["name"],
          query: "#{query}"
        }
      }
    }
  }
}

You could create a custom analyzer that uses the mapping character filter to strip the characters you want to ignore, such as ".", before the text is tokenized.

For example, if you created your index like this:

PUT my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": [
            ". =>"
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "char_filter": [
            "my_char_filter"
          ],
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
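
To check what the analyzer actually produces, you can run the title through the _analyze API. This is only a quick sanity check against the my_index and my_analyzer names from the example above. With the "." removed by the char filter, I would expect the token 53 rather than 5.3 in the output:

```
GET my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "CenterG 5.3 Drive Belt for model number 4421"
}
```

Keep in mind that this mapping removes every "." in the field, not only the ones inside numbers, and that this analyzer does not stem the way the english analyzer does.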

You could then query for 5.3 or 53 and get the exact same document containing "CenterG 5.3 Drive Belt for model number 4421":

PUT my_index/_doc/1
{
  "name": "CenterG 5.3 Drive Belt for model number 4421"
}

GET my_index/_search
{
  "query": {
    "match": {
      "name": "5.3"
    }
  }
}


GET my_index/_search
{
  "query": {
    "match": {
      "name": "53"
    }
  }
}
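
The same should apply to the multi_match query from the question, since the field's analyzer is also applied to the query text at search time. A sketch, assuming the my_index example above and a query typed without the punctuation:

```
GET my_index/_search
{
  "query": {
    "multi_match": {
      "fields": ["name"],
      "query": "Centerg 53 Drive Belt"
    }
  }
}
```

Note that multi_match ORs the terms by default, so other products containing only some of the words can still match. The difference is that the 5.3 title now also matches on the 53 token.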
