Match with or without thousands separator

How can we match numbers in text fields both with and without thousands separators?

Use case:

  • The user searches for titles with one of "15000", "15,000" or "15.000"
  • They expect to find items with titles like these:
    • "Historien om danskernes mad i 15.000 år" (a Danish title)
    • "History for the last 15,000 years".

Is there a change we can make to our mapping that makes this use case possible?

      "DisplayTitle": {
        "fields": {
          "da": {
            "analyzer": "danish",
            "type": "text"
          },
          "de": {
            "analyzer": "german",
            "type": "text"
          },
          "en": {
            "analyzer": "english",
            "type": "text"
          },
          "fr": {
            "analyzer": "french",
            "type": "text"
          },
          "keyword": {
            "ignore_above": 256,
            "type": "keyword"
          }
        },
        "type": "text"
      },

We use Elasticsearch 7.15.

Hi Morten,
Adapted from the example in the pattern replace character filter docs:

PUT my-index-00001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "(\\d+),(?=\\d)",
          "replacement": "$1"
        }
      }
    }
  }
}

POST my-index-00001/_analyze
{
  "analyzer": "my_analyzer",
  "text": "My score is 12,000, yours is 11000 "
}
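
To see what the character filter does to the text before the tokenizer runs, the same substitution can be tried outside Elasticsearch. This is only a rough sketch in Python: it assumes Python's `re` module treats this particular pattern the same way as the Java regex engine Elasticsearch uses, and the function name is made up for illustration.

```python
import re

# Python translation of the char filter's pattern; Java's "$1" becomes r"\1".
# The lookahead (?=\d) requires a digit after the comma without consuming it.
COMMA_SEPARATOR = re.compile(r"(\d+),(?=\d)")


def strip_comma_separators(text: str) -> str:
    """Remove a comma that sits directly between two digits (hypothetical helper)."""
    return COMMA_SEPARATOR.sub(r"\1", text)


if __name__ == "__main__":
    print(strip_comma_separators("My score is 12,000, yours is 11000 "))
```

The comma inside "12,000" is removed, while the comma that follows it stays because a space comes next rather than a digit. The standard tokenizer should then see "12000" and "11000" as plain numbers.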

Thank you very much.

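I made a small change to the pattern so that it strips a period used as a thousands separator, and only when at least three digits follow:
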
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "(\\d+)\\.(?=\\d{3})",
          "replacement": "$1"
        }
      }
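
The pattern above only handles the period, so "15,000" would still be left alone. A possible variant, not taken from the thread, is to put both separators in a character class, which in the JSON settings would be written as "(\\d+)[.,](?=\\d{3})". The Python sketch below compares the two on the titles from the question; it assumes Python's `re` behaves like the Java regex engine for these patterns, and all names in it are hypothetical.

```python
import re

# The period-only pattern from the post above, translated to Python.
PERIOD_ONLY = re.compile(r"(\d+)\.(?=\d{3})")
# Hypothetical variant: accept either "." or "," as the separator.
EITHER_SEPARATOR = re.compile(r"(\d+)[.,](?=\d{3})")


def normalize(pattern: re.Pattern, text: str) -> str:
    """Drop the separator matched by `pattern`, keeping the digits before it."""
    return pattern.sub(r"\1", text)


if __name__ == "__main__":
    titles = [
        "Historien om danskernes mad i 15.000 år",
        "History for the last 15,000 years",
    ]
    for title in titles:
        print(normalize(PERIOD_ONLY, title), "|", normalize(EITHER_SEPARATOR, title))
```

One thing to keep in mind with either pattern: a decimal number with three or more fractional digits, such as "3.1415", also matches and would be collapsed to "31415", so this probably suits title search better than fields that hold decimal values.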
