Indexing and searching strings with periods (".")


#1

How can I set up indexing and searching to disregard periods when matching strings?

For example, if I search for USA I want both USA and U.S.A. to be returned. Likewise, searching for U.S.A. should return both U.S.A. and USA.

I know I can use synonyms for this but I'd prefer not to have to specify the synonyms ahead of time.

Thanks!


(Moises Garcia Marquez) #2

You could use a char filter that maps the period to an empty string, and use that char filter in both your index and search analyzers.
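
As a rough sketch of that idea, here is an index using a mapping char filter. The index name (my_mapping_index), the filter and analyzer names, and the field name (name) are placeholders made up for illustration, and the request shape assumes Elasticsearch 7.x or later. Setting "analyzer" on a text field applies it at both index and search time unless a separate search_analyzer is configured.

```
PUT my_mapping_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "strip_periods": {
          "type": "mapping",
          "mappings": [". => "]
        }
      },
      "analyzer": {
        "no_periods": {
          "tokenizer": "standard",
          "char_filter": ["strip_periods"],
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "no_periods"
      }
    }
  }
}
```

The lowercase filter is an extra, added so that case differences do not get in the way; drop it if you need case-sensitive matching.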


(Moises Garcia Marquez) #3

Another solution would be to use the word delimiter token filter with catenate_all (or only catenate_words) enabled, so that U.S.A. is also indexed as the catenated token USA.
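
A minimal sketch of such a setup, assuming Elasticsearch 7.x or later. The index, filter, and analyzer names are invented for the example, and the whitespace tokenizer is my choice here so that the periods reach the token filter untouched. Note that newer versions of the documentation recommend word_delimiter_graph over word_delimiter; the older filter is used here only because it is the one named above.

```
PUT my_wd_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_word_delimiter": {
          "type": "word_delimiter",
          "catenate_all": true,
          "preserve_original": true
        }
      },
      "analyzer": {
        "my_wd_analyzer": {
          "tokenizer": "whitespace",
          "filter": ["my_word_delimiter", "lowercase"]
        }
      }
    }
  }
}

POST my_wd_index/_analyze
{
  "analyzer": "my_wd_analyzer",
  "text": "U.S.A."
}

POST my_wd_index/_analyze
{
  "analyzer": "my_wd_analyzer",
  "text": "USA"
}
```

Comparing the two token lists is a quick way to confirm that both spellings end up sharing a token before you reindex anything.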


(Imma) #4

You could try defining an analyzer with a mapping char filter (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-mapping-charfilter.html) or with a pattern replace char filter (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-replace-charfilter.html). For example, with pattern replace:

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "\\.",
          "replacement": ""
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "u.s.a."
}
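
To check the behaviour end to end, something along these lines should work. It assumes the my_index index from above already exists and a 7.x or later cluster; the field name (name) and the sample document are made up for the test.

```
PUT my_index/_mapping
{
  "properties": {
    "name": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}

PUT my_index/_doc/1?refresh=true
{
  "name": "U.S.A."
}

GET my_index/_search
{
  "query": {
    "match": {
      "name": "USA"
    }
  }
}
```

Keep in mind that my_analyzer as defined has no lowercase filter, so usa and USA would still be treated as different terms unless you add one.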

#5

Thank you. The word delimiter token filter does the job!

