Terms aggs and then sort all data

In my case:

  1. I use a terms aggregation to group my data, and I have to set its size far above the field's cardinality, because every group is needed for the sort
  2. I use top_hits with size = 1 to get the first document in every bucket
  3. I use bucket_sort to sort all the resulting buckets

The problem is that the data set is about 10 million documents, so the terms aggregation is very slow.
I then tried a composite aggregation, but it cannot sort across all the buckets.

I want to group first, then take the first document in every bucket, and finally get the buckets back sorted.

Can you help me with this case? Thanks a lot.

Hi,

Can you please provide some additional information to help us answer:

  1. Are you aggregating from multiple fields or from a single field?
  2. Are you working on keyword or on a tokenised field?
  3. Are you looking to sort the buckets by the number of records containing the term or by alphabetic order based on the first record in the bucket?
  4. How many buckets do you expect in the result set (order of magnitude)?

It may also help if you can provide a data sample and the query you use.

Thanks,

Gilad

Thanks for the reply.

  1. Only a single field; that is the one I run the terms aggregation on
  2. It's a keyword type

After the terms aggregation I have about 1 to 2 million buckets to sort.

e.g.

POST test_search_index/doc/_bulk
{"index":{"_id":"1"}}
{"name":"Michell","class":"A","age":26,"scoreA":99,"scoreB":100,"scoreC" : 99}
{"index":{"_id":"2"}}
{"name":"Job","class":"B","age":23,"scoreA":98,"scoreB":23,"scoreC" : 23}
{"index":{"_id":"3"}}
{"name":"mata","class":"C","age":21,"scoreA":97,"scoreB":44,"scoreC" : 54}
{"index":{"_id":"4"}}
{"name":"Bob","class":"C","age":20,"scoreA":97,"scoreB":55,"scoreC" : 65}

PUT test_search_index/_mapping/doc
{
  "doc": {
    "properties": {
      "class": {
        "type": "text",
        "fielddata": true
      }
    }
  }
}

GET test_search_index/_search

GET _cat/indices

A. terms aggregation -> group by class, size = all (10000000)
B. top_hits -> order by scoreA, scoreB, scoreC, descending
C. bucket_sort -> take the top hits and order them by scoreA, scoreB, descending
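
To make the intended result concrete, here is a minimal in-memory sketch of steps A to C applied to the four sample documents above. It is plain Python, not an Elasticsearch query, and the function name `top_per_class` is only illustrative. It ranks each class by the scores of its winning document, which is how step C reads.

```python
from itertools import groupby
from operator import itemgetter

# The four sample documents from the bulk request above.
docs = [
    {"name": "Michell", "class": "A", "age": 26, "scoreA": 99, "scoreB": 100, "scoreC": 99},
    {"name": "Job", "class": "B", "age": 23, "scoreA": 98, "scoreB": 23, "scoreC": 23},
    {"name": "mata", "class": "C", "age": 21, "scoreA": 97, "scoreB": 44, "scoreC": 54},
    {"name": "Bob", "class": "C", "age": 20, "scoreA": 97, "scoreB": 55, "scoreC": 65},
]


def top_per_class(docs, page_size=60):
    """Hypothetical helper: group, keep the best document per group, sort the groups."""
    # A. Group by class (groupby needs its input sorted by the grouping key).
    by_class = sorted(docs, key=itemgetter("class"))
    winners = []
    for _, group in groupby(by_class, key=itemgetter("class")):
        # B. Keep the single best document, ranked by scoreA, scoreB, scoreC.
        winners.append(max(group, key=itemgetter("scoreA", "scoreB", "scoreC")))
    # C. Order the winners by scoreA, then scoreB, descending, and take one page.
    winners.sort(key=itemgetter("scoreA", "scoreB"), reverse=True)
    return winners[:page_size]


result = top_per_class(docs)
print([(d["class"], d["name"]) for d in result])
```

For the sample data this yields Michell (class A), then Job (class B), then Bob (class C).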

And here is my query:

GET test_search_index/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "class": {
      "terms": {
        "field": "class",
        "size": 5000000
      },
      "aggs": {
        "SortGroup": {
          "bucket_sort": {
            "sort": [
              {
                "maxScoreA": "desc"
              },
              {
                "maxScoreB": "desc"
              }
            ],
            "from": 0,
            "size": 60
          }
        },
        "maxScoreA": {
          "max": {
            "field": "scoreA"
          }
        },
        "maxScoreB": {
          "max": {
            "field": "scoreB"
          }
        },
        "SortWithinGroup": {
          "top_hits": {
            "sort": [
              {
                "scoreA": "desc"
              },
              {
                "scoreB": "desc"
              },
              {
                "scoreC": "desc"
              }
            ],
            "_source": {
              "includes": [
                "class",
                "name",
                "scoreA",
                "scoreB",
                "scoreC"
              ]
            },
            "size": 1
          }
        }
      }
    },
    "count": {
      "cardinality": {
        "field": "goodsSn",
        "precision_threshold": "40000"
      }
    }
  }
}

Hi,

Thank you for providing these clarifications. Why not sort using the Order within Composite Aggregation? See: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html#_order . Please clarify if I'm misunderstanding the need.
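
One caveat: the order option of a composite aggregation sorts buckets by the source value (here, the class key), not by a metric such as maxScoreA. If the ranking has to follow the scores, one possible workaround is to page through the composite buckets with after_key and keep a running top N on the client. The sketch below is only an illustration under these assumptions: `class` is mapped as keyword, every bucket has numeric scoreA and scoreB values, and `search` is a hypothetical callable you supply that sends the request body to test_search_index/_search and returns the parsed JSON response. The names `composite_body` and `top_buckets` are made up for this example.

```python
import heapq


def composite_body(after=None, page_size=1000):
    """Build one page of a composite request with the same max sub-aggregations as the query above."""
    composite = {
        "size": page_size,
        "sources": [{"class": {"terms": {"field": "class"}}}],
    }
    if after is not None:
        # Resume after the last bucket key of the previous page.
        composite["after"] = after
    return {
        "size": 0,
        "aggs": {
            "class": {
                "composite": composite,
                "aggs": {
                    "maxScoreA": {"max": {"field": "scoreA"}},
                    "maxScoreB": {"max": {"field": "scoreB"}},
                },
            }
        },
    }


def top_buckets(search, top_n=60, page_size=1000):
    """Page through all composite buckets and keep the top_n by (maxScoreA, maxScoreB).

    `search` is a caller-supplied function (an assumption of this sketch):
    it takes a request body dict and returns the response as a dict.
    """
    best = []  # (maxScoreA, maxScoreB, class) tuples, at most top_n of them
    after = None
    while True:
        agg = search(composite_body(after, page_size))["aggregations"]["class"]
        buckets = agg["buckets"]
        if not buckets:
            break
        page = [
            (b["maxScoreA"]["value"], b["maxScoreB"]["value"], b["key"]["class"])
            for b in buckets
        ]
        # Merge this page into the running top_n, highest scores first.
        best = heapq.nlargest(top_n, best + page, key=lambda t: (t[0], t[1]))
        after = agg.get("after_key")
        if after is None:
            break
    return [(cls, a, b) for a, b, cls in best]
```

Memory stays bounded by top_n plus one page, at the cost of one request per page. A top_hits sub-aggregation could be added next to the two max aggregations to carry the winning document along with each bucket.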

Best regards,

Gilad
