Hi, I'm stuck on a problem with Elasticsearch and would appreciate it if someone could help me.
I have 2 indices: comment_index and like_index.
- comment_index: user comment data
- like_index: keeps track of which user likes which comment, e.g.
{'user_id': '1', 'comment_id': 1}, {'user_id': '1', 'comment_id': 2}, {'user_id': '2', 'comment_id': 1}, ...
mappings (like_index)
"mappings": {
  "properties": {
    "comment_id": {
      "type": "keyword"
    },
    "freq_num": {
      "type": "integer"  --> copied from comment_index, since the result needs to be ordered by this field
    },
    "user_id": {
      "type": "keyword"
    }
    ....
  }
}
And below is what I need to do: query like_index, and the result needs to be
- grouped by comment_id (to get how many likes each comment has)
- ordered by the comment_id count (i.e., the number of likes per comment) and then by an additional field (freq_num); if that's not possible, ordering only by doc_count is also fine
- paginated via scroll or pagination (size=10)
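To make the intent concrete, here is the grouping and ordering I'm after expressed in plain Python over the sample like_index docs above (the freq_num values are made up for illustration):

```python
from collections import Counter

# sample like_index docs
likes = [
    {"user_id": "1", "comment_id": "c1"},
    {"user_id": "1", "comment_id": "c2"},
    {"user_id": "2", "comment_id": "c1"},
]
# freq_num per comment, copied from comment_index (made-up values)
freq_num = {"c1": 136, "c2": 446}

# group by comment_id -> like count per comment
counts = Counter(doc["comment_id"] for doc in likes)

# order by like count desc, then by freq_num desc
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], -freq_num[kv[0]]))

page = ranked[:10]  # the size=10 pagination would slice this
```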
I've tried several ideas but couldn't find the right way to solve this. (The actual queries I wrote are more complicated; I've simplified them here for anyone searching for a similar issue.) Anyway, the 3 approaches I've tried are below.
#1. composite aggregation
It supports group-by and pagination (via the after key), but can't order by doc_count.
{
  "query": { "match_all": {} },
  "aggs": {
    "like_counts": {
      "composite": {
        "sources": [
          { "freq_num": { "terms": { "field": "freq_num", "order": "desc" } } },
          { "comment_id": { "terms": { "field": "comment_id" } } }
        ],
        "size": 10
      }
    }
  }
}
what I expected:
"buckets": [
{
"key": {
"freq_num": 136,
"comment_id": "JcAYqngBhI5vW3tODWnX"
},
"doc_count": 2
},
{
"key": {
"freq_num": 446,
"comment_id": "IsAYqngBhI5vW3tODWm7"
},
"doc_count": 1
},
{
"key": {
"freq_num": 136,
"comment_id": "I8AYqngBhI5vW3tODWnE"
},
"doc_count": 1
},
....
what I've got:
"buckets": [
{
"key": {
"freq_num": 446,
"comment_id": "IsAYqngBhI5vW3tODWm7"
},
"doc_count": 1
},
{
"key": {
"freq_num": 136,
"comment_id": "I8AYqngBhI5vW3tODWnE"
},
"doc_count": 1
},
{
"key": {
"freq_num": 136,
"comment_id": "JcAYqngBhI5vW3tODWnX"
},
"doc_count": 2
},
....
#2. terms aggregation
At first this gives the result in the expected sort order, but once I add the partition option, it no longer returns the right result.
{
  "query": { "match_all": {} },
  "size": 0,
  "aggs": {
    "like_counts": {
      "terms": {
        "field": "comment_id",
        "include": {
          "partition": 0,
          "num_partitions": 10
        },
        "size": 3,
        "order": { "freq_num": "desc" }
      },
      "aggs": {
        "freq_num": { "max": { "field": "freq_num" } }
      }
    }
  }
}
what I expected:
- partition 1:
"buckets": [ { "key": "CMAXqngBhI5vW3tOq18z", "doc_count": 1, "freq_num": { "value": 8000 } }, { "key": "r8AXqngBhI5vW3tOxGH7", "doc_count": 1, "freq_num": { "value": 5000 } }, { "key": "tcAXqngBhI5vW3tOxWFA", "doc_count": 1, "freq_num": { "value": 4000 } } ]
- partition 2:
"buckets": [ { "key": "DsAXqngBhI5vW3tOq19p", "doc_count": 1, "freq_num": { "value": 2000 } }, { "key": "uMAXqngBhI5vW3tOxWFg", "doc_count": 1, "freq_num": { "value": 3000 } }, { "key": "DMAXqngBhI5vW3tOq19X", "doc_count": 1, "freq_num": { "value": 2000 } } ]
what I've got:
- partition 1:
"buckets": [ { "key": "CMAXqngBhI5vW3tOq18z", "doc_count": 1, "freq_num": { "value": 8000 } }, { "key": "tcAXqngBhI5vW3tOxWFA", "doc_count": 1, "freq_num": { "value": 4000 } }, { "key": "DsAXqngBhI5vW3tOq19p", "doc_count": 1, "freq_num": { "value": 2000 } } ]
- partition 2:
"buckets": [ { "key": "r8AXqngBhI5vW3tOxGH7", "doc_count": 1, "freq_num": { "value": 5000 } }, { "key": "uMAXqngBhI5vW3tOxWFg", "doc_count": 1, "freq_num": { "value": 3000 } }, { "key": "DMAXqngBhI5vW3tOq19X", "doc_count": 1, "freq_num": { "value": 2000 } } ]
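A variant of #2 I've considered but not fully verified: the terms aggregation accepts a list of order criteria (_count first, then the freq_num sub-aggregation), and a bucket_sort pipeline aggregation can paginate the returned buckets with from/size. The downside would be that all buckets still have to be computed on every request, so this is only a sketch:

```json
{
  "size": 0,
  "aggs": {
    "like_counts": {
      "terms": {
        "field": "comment_id",
        "size": 65536,
        "order": [
          { "_count": "desc" },
          { "freq_num": "desc" }
        ]
      },
      "aggs": {
        "freq_num": { "max": { "field": "freq_num" } },
        "page": {
          "bucket_sort": {
            "from": 0,
            "size": 10
          }
        }
      }
    }
  }
}
```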
#3. collapse
I've just noticed there is also the 'collapse' option, but I don't know how to use it here.
Running the query below (collapsing by comment_id), I get a similar result but without the count of each group.
{
"query": { "match_all": {} },
"size": 10,
"collapse": {
"field": "comment_id"
}
}
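For #3, my understanding (not verified) is that adding inner_hits to the collapse would make each hit carry the total number of documents in its group, which would be the like count:

```json
{
  "query": { "match_all": {} },
  "size": 10,
  "collapse": {
    "field": "comment_id",
    "inner_hits": {
      "name": "likes",
      "size": 0
    }
  }
}
```

Each hit's inner_hits.likes.hits.total would then be the number of likes for that comment_id, though I'm not sure how this could be ordered by that count.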
Right now I'm leaning toward #2: fetching all the data without the partition option and splitting it myself, which is probably a really bad idea.
Could anyone point me to the proper way to achieve this?
Thank you.