Distinct collect first letters from field

Alexey_Melnikov · October 27, 2017, 6:33am

Good day.

I want to create navigation in index by first letters like in vocabulary.

Ideally I need aggregation which will provide to me a set of all first letters from one field with first document id related to each letter.
Otherwise will be enough just a set of first letters collected from one field.

But I cannot find aggregation which can do this for me.

Any suggestions will be helpful. Thanks.

Igor_Motov · October 28, 2017, 1:32am

You can extract the first letter using script in terms aggregation, but I think the most effective way would be to actually index this first letter as a separate field, using a normalizer. Here is a couple of examples to get you started.

DELETE test

PUT test
{
  "settings": {
    "analysis": {
      "char_filter": {
        "first_letter_filter": {
          "type": "pattern_replace",
          "pattern": "(.).*",
          "replacement": "$1"
        }
      },
      "normalizer": {
        "first_letter_normilizer": {
          "type": "custom",
          "char_filter": ["first_letter_filter"],
          "filter": [
            "uppercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "berry": {
          "type": "keyword",
          "fields": {
            "first": {
              "type": "keyword",
              "normalizer": "first_letter_normilizer"
            }
          }
        }
      }
    }
  }
}

PUT test/doc/_bulk?refresh=true
{"index": {"_id": "1"}}
{"berry": "Blackcurrant"}
{"index": {"_id": "2"}}
{"berry": "Chili pepper"}
{"index": {"_id": "3"}}
{"berry": "Cranberry"}
{"index": {"_id": "4"}}
{"berry": "Eggplant"}
{"index": {"_id": "5"}}
{"berry": "Gooseberry"}
{"index": {"_id": "6"}}
{"berry": "Grape"}
{"index": {"_id": "8"}}
{"berry": "Guava"}
{"index": {"_id": "9"}}
{"berry": "Kiwifruit"}
{"index": {"_id": "10"}}
{"berry": "Lucuma"}
{"index": {"_id": "11"}}
{"berry": "Pomegranate"}
{"index": {"_id": "12"}}
{"berry": "Redcurrant"}
{"index": {"_id": "13"}}
{"berry": "Tomato"}


GET test/doc/_search
{
  "size": 0,
  "aggs": {
    "alphabet": {
      "terms": {
        "field": "berry.first",
        "size": 26,
        "order": {
          "_key": "asc"
        }
      }
    }
  }
}

GET test/doc/_search
{
  "size": 0,
  "aggs": {
    "alphabet": {
      "terms": {
        "field": "berry",
        "size": 26,
        "script" : {
            "source" : "_value.substring(0,1).toUpperCase()",
            "lang" : "painless"
        },
        "order": {
          "_key": "asc"
        }
      }
    }
  }
}

Alexey_Melnikov · October 29, 2017, 9:08am

Thanks Igor.

I think it's doable for me.

But I have another question: may be I can store that values in another type which will hold for example only 26 records? And on each index update I can intercept new values and update only one record in that type. In that case I don't need every time to search through all index to find 26 letters.

Of course it can be done outside of ES, but it will significantly increase the time of processing data, because I need first insert to index new value, wait for index update, execute search and find new first record for that letter (I need elasticsearch sorting because I'm trying to move all collation logic to ES from application, and sorting depends on settings of icu_collation_keyword).

Igor_Motov · October 29, 2017, 1:04pm

Without support for transactions, implementing something like this in a reliable fashion would be quite difficult.

system · November 26, 2017, 1:05pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
How to make an aggregation for 1st character at the right way Elasticsearch	5	2889	May 23, 2019
Facetting by first letter Elasticsearch	3	983	July 6, 2017
Aggregation by the first character of field Elasticsearch	6	1514	August 6, 2020
How to get Alphabetic Facet (A-Z) on name field in elasticsearch Elasticsearch	5	1759	July 6, 2017
Retrieve Search Results Based on the First Letter in a Specific Field Elasticsearch	30	8997	July 29, 2019

Distinct collect first letters from field

Related topics