Elasticsearch Sub string on a Field

jaliph · December 29, 2016, 3:12pm

Hi,

I have a field containing MSISDNs. I need to aggregate on a substring of it: the first 4 digits of each MSISDN.

How can this be implemented in the most optimised way? My data volume is huge, around 100GB/hr.

Can this be done with a scripted field or inline scripting? I have not been able to get it working.

My Elasticsearch version is 5. We have recently moved our live instances from Elasticsearch 2.4 to 5.

Sorry if this has been asked before. Thanks in advance.

spinscale · December 30, 2016, 8:49am

Hey,

Even though you could do this using a script in the terms aggregation, it would make more sense to index the first four digits of your field into a separate field. This matters especially when you want fast results, as the scripting solution is doing things at search time which should be done at index time.
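
For comparison, the search-time variant might look roughly like the sketch below. It assumes the MSISDN lives in a keyword field named msisdn (a hypothetical name, not from the original post) and that every value has at least four characters, since substring would fail on shorter ones. The "inline" key is the 5.x script syntax; later releases call it "source".

```
# Hypothetical field name: msisdn (keyword, doc values enabled)
# The script runs for every matching document at search time.
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "msisdn_prefix": {
      "terms": {
        "script": {
          "lang": "painless",
          "inline": "doc['msisdn'].value.substring(0, 4)"
        }
      }
    }
  }
}
```

At 100GB/hr the script has to be evaluated per document on every request, which is why the index-time approach is likely to be the better fit.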

There are a few different ways to do this during indexing by using different token filters. This example might not be the best, but it works in this test:

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "keyword",
          "filter": [
            "length_filter"
          ]
        }
      },
      "filter": {
        "length_filter": { 
          "type": "truncate",
          "length" : 4
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "1234567890"
}
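
The _analyze call above should return a single token, 1234. To aggregate on that value, the analyzer still has to be attached to a field. A minimal sketch, assuming a mapping type my_type and a keyword field msisdn with a sub-field prefix (all hypothetical names): in 5.x a terms aggregation on an analyzed text field needs fielddata enabled, which uses heap memory, so it is worth testing against your data volume first.

```
# Hypothetical names: my_type, msisdn, msisdn.prefix
# msisdn keeps the full value; msisdn.prefix holds the truncated token.
PUT my_index/_mapping/my_type
{
  "properties": {
    "msisdn": {
      "type": "keyword",
      "fields": {
        "prefix": {
          "type": "text",
          "analyzer": "my_analyzer",
          "fielddata": true
        }
      }
    }
  }
}

# Aggregate on the four-digit prefix produced at index time.
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "by_prefix": {
      "terms": { "field": "msisdn.prefix" }
    }
  }
}
```

Another option would be to extract the prefix before indexing and store it in its own keyword field, which avoids fielddata entirely.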

--Alex
