Is there any possible solution for group by, order by, and pagination?

Hi, I got stuck on the problem with elasticsearch, and appreciated it if someone could help me.

I have 2 indices, comment_index, and like_index.

  • comment_index: user comment data
  • like_index: keeps track of which user likes which comment
    {'user_id': '1', 'commend_id': 1},{'user_id': '1', 'commend_id': 2},{'user_id': '2', 'commend_id': 1}, ...

mappings (like_index)

"mappings": {
"properties": {
	"comment_id": {
	  "type": "keyword"
	},
	"freq_num": {
	  "type": "integer" --> copied from comment_index, since the result needs to be ordered by this field. 
	},
	"user_id": {
	  "type": "keyword"
	}
	....
}

}

And below is what I need to do.
query like_index, and the result needs to be

  1. group by comment_id (to get how many likes each comment has)
  2. order by comment_id count (count of each comment_id, that is, the number of like counts of each comments) and order by an additional field(freq_num)
    if it cannot be possible, then order only with 'doc_count' is also fine.
  3. scroll or pagination (size=10)

I've tried several ideas but couldn't find the right way to solve this problem.

  • The actual scripts I've made are more complicated, but for those who searching for similar issues, I simplified the scripts.

Anyways, I've tried 3 ways to solve the problem as below.

#1. aggregation composite
It's possible group by and pagination(after key) but can't order by doc_count.

 {
"query": { "match_all": {} },
"aggs": {
	"like_counts": {
		"composite": {
			"sources": [
				{"freq_num": {"terms": {"field": "freq_num", "order": "desc"}}},
				{"comment_id": {"terms": {"field": "comment_id"}}}
			],
			"size": 10
		}
	}
}

what I expected:

 "buckets": [
    {
      "key": {
        "freq_num": 136,
        "comment_id": "JcAYqngBhI5vW3tODWnX"
      },
      "doc_count": 2
    },
    {
      "key": {
        "freq_num": 446,
        "comment_id": "IsAYqngBhI5vW3tODWm7"
      },
      "doc_count": 1
    },
    {
      "key": {
        "freq_num": 136,
        "comment_id": "I8AYqngBhI5vW3tODWnE"
      },
      "doc_count": 1
    },
	....

what I've got:

 "buckets": [
    {
      "key": {
        "freq_num": 446,
        "comment_id": "IsAYqngBhI5vW3tODWm7"
      },
      "doc_count": 1
    },
    {
      "key": {
        "freq_num": 136,
        "comment_id": "I8AYqngBhI5vW3tODWnE"
      },
      "doc_count": 1
    },
    {
      "key": {
        "freq_num": 136,
        "comment_id": "JcAYqngBhI5vW3tODWnX"
      },
      "doc_count": 2
    },
	....

#2. aggregation terms
At first, this gives the result in the sort order of the number of docs as I expected.
But If I use the partition option here, this doesn't provide the right result.

{
        "query": { "match_all": {}
        },
        "size": 0,
           "aggs": {
                    "like_counts ": {
                        "terms": {
                            "field": "comment_id",
                            "include": {
                               "partition": 0,
                               "num_partitions": 10
                            },
                            "size": 3,
                            "order": {"freq_num": "desc"}
                        },
                        "aggs": {
                            "freq_num": { "max": { "field": "freq_num" }}
                        }
                    }
                }
            
}

what I expected:

  • partition: 1

     "buckets": [
      {
        "key": "CMAXqngBhI5vW3tOq18z",
        "doc_count": 1,
        "freq_num": {
          "value": 8000
        }
      },
      
     {
        "key": "r8AXqngBhI5vW3tOxGH7",
        "doc_count": 1,
        "freq_num": {
          "value": 5000
        }
      },
      {
        "key": "tcAXqngBhI5vW3tOxWFA",
        "doc_count": 1,
        "freq_num": {
          "value": 4000
        }
      },
    
  • partition: 2

     "buckets": [
    
      {
        "key": "DsAXqngBhI5vW3tOq19p",
        "doc_count": 1,
        "freq_num": {
          "value": 2000
        }
      }
      {
        "key": "uMAXqngBhI5vW3tOxWFg",
        "doc_count": 1,
        "freq_num": {
          "value": 3000
        }
      },
      {
        "key": "DMAXqngBhI5vW3tOq19X",
        "doc_count": 1,
        "freq_num": {
          "value": 2000
        }
      }
      ]
    

what I've got:

  • partition: 1

     "buckets": [
     {
       "key": "CMAXqngBhI5vW3tOq18z",
       "doc_count": 1,
       "freq_num": {
         "value": 8000
       }
     },
     {
       "key": "tcAXqngBhI5vW3tOxWFA",
       "doc_count": 1,
       "freq_num": {
         "value": 4000
       }
     },
     {
       "key": "DsAXqngBhI5vW3tOq19p",
       "doc_count": 1,
       "freq_num": {
         "value": 2000
       }
     }
    
  • partition: 2

      "buckets": [
     {
        "key": "r8AXqngBhI5vW3tOxGH7",
        "doc_count": 1,
        "freq_num": {
          "value": 5000
        }
      },
      {
        "key": "uMAXqngBhI5vW3tOxWFg",
        "doc_count": 1,
        "freq_num": {
          "value": 3000
        }
      },
      {
        "key": "DMAXqngBhI5vW3tOq19X",
        "doc_count": 1,
        "freq_num": {
          "value": 2000
        }
      }
      ]
    

#3. collapse
I've just noticed that I also have the 'collapse' option, but I don't know how to use it here.
As I run the script below, I get a similar result but didn't get the count of each group. (collapse by comment_id)

{
   "query": { "match_all": {} },
   "size": 10,
   "collapse": {
   	"field": "comment_id"
   }
}

I'm considering rather choose #2 and fetch all data without using the partition option and split the data which might be a really bad idea.
Could anyone give me the proper way to achieve this?

Thank you.