Elasticsearch Substring on a Field

Hi,

I have a field containing MSISDNs. I need to run aggregations on a substring of it: the first 4 digits of the MSISDN.

What is the most efficient way to implement this? My data volume is large, about 100GB/hr.

Can this be done with a scripted field or inline scripting? I have not been able to get it working.

My Elasticsearch version is 5. We have recently moved our live instances from Elasticsearch 2.4 to 5.

Sorry if this has been asked before. Thanks in advance.

Hey,

Even though you could do this using a script in the terms aggregation, it would make more sense to index the first four digits of your field into a separate field. This matters especially when you want fast results, as the scripting solution is doing things at search time which should be done at index time.
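
For reference, the search-time variant might look roughly like the sketch below. It assumes the field is called msisdn (a hypothetical name), that it is mapped as a keyword field, and that every document has a value of at least four characters; a missing or shorter value would make the script fail. On 5.x the script body goes under "inline".

```console
# Sketch only: "msisdn" and "by_prefix" are assumed names, not from the original post
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "by_prefix": {
      "terms": {
        "script": {
          "lang": "painless",
          "inline": "doc['msisdn'].value.substring(0, 4)"
        }
      }
    }
  }
}
```

The script runs for every matching document on every request, which is why it is likely to be slow at this data volume.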

There are a few different ways to do this during indexing by using different token filters. This example might not be the best, but it works in this test:

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "keyword",
          "filter": [
            "length_filter"
          ]
        }
      },
      "filter": {
        "length_filter": { 
          "type": "truncate",
          "length" : 4
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "1234567890"
}
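
The _analyze call above should return a single token, 1234. To actually aggregate on it, the analyzer still has to be attached to a field. One possible way, as a sketch: add a sub-field that uses my_analyzer and run a terms aggregation on it. The type name my_type, the field name msisdn, the sub-field name prefix and the aggregation name by_prefix are all assumptions for illustration. On 5.x an analyzed text field needs fielddata enabled before it can be used in an aggregation.

```console
# Sketch only: "my_type", "msisdn", "prefix" and "by_prefix" are assumed names
PUT my_index/_mapping/my_type
{
  "properties": {
    "msisdn": {
      "type": "keyword",
      "fields": {
        "prefix": {
          "type": "text",
          "analyzer": "my_analyzer",
          "fielddata": true
        }
      }
    }
  }
}

# Count documents per four-digit prefix
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "by_prefix": {
      "terms": {
        "field": "msisdn.prefix"
      }
    }
  }
}
```

Fielddata is held on the heap, so keep an eye on memory with this much data. If that becomes a problem, another option is to compute the prefix before indexing and send it as its own keyword field, which uses doc values instead.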

--Alex
