Is there any possible solution for group by, order by, and pagination?

Hi, I got stuck on the problem with elasticsearch, and appreciated it if someone could help me.

I have 2 indices, comment_index, and like_index.

  • comment_index: user comment data
  • like_index: keeps track of which user likes which comment
    {'user_id': '1', 'commend_id': 1},{'user_id': '1', 'commend_id': 2},{'user_id': '2', 'commend_id': 1}, ...

mappings (like_index)

"mappings": {
"properties": {
	"comment_id": {
	  "type": "keyword"
	},
	"freq_num": {
	  "type": "integer" --> copied from comment_index, since the result needs to be ordered by this field. 
	},
	"user_id": {
	  "type": "keyword"
	}
	....
}

}

And below is what I need to do.
query like_index, and the result needs to be

  1. group by comment_id (to get how many likes each comment has)
  2. order by comment_id count (count of each comment_id, that is, the number of like counts of each comments) and order by an additional field(freq_num)
    if it cannot be possible, then order only with 'doc_count' is also fine.
  3. scroll or pagination (size=10)

I've tried several ideas but couldn't find the right way to solve this problem.

  • The actual scripts I've made are more complicated, but for those who searching for similar issues, I simplified the scripts.

Anyways, I've tried 3 ways to solve the problem as below.

#1. aggregation composite
It's possible group by and pagination(after key) but can't order by doc_count.

 {
"query": { "match_all": {} },
"aggs": {
	"like_counts": {
		"composite": {
			"sources": [
				{"freq_num": {"terms": {"field": "freq_num", "order": "desc"}}},
				{"comment_id": {"terms": {"field": "comment_id"}}}
			],
			"size": 10
		}
	}
}

what I expected:

 "buckets": [
    {
      "key": {
        "freq_num": 136,
        "comment_id": "JcAYqngBhI5vW3tODWnX"
      },
      "doc_count": 2
    },
    {
      "key": {
        "freq_num": 446,
        "comment_id": "IsAYqngBhI5vW3tODWm7"
      },
      "doc_count": 1
    },
    {
      "key": {
        "freq_num": 136,
        "comment_id": "I8AYqngBhI5vW3tODWnE"
      },
      "doc_count": 1
    },
	....

what I've got:

 "buckets": [
    {
      "key": {
        "freq_num": 446,
        "comment_id": "IsAYqngBhI5vW3tODWm7"
      },
      "doc_count": 1
    },
    {
      "key": {
        "freq_num": 136,
        "comment_id": "I8AYqngBhI5vW3tODWnE"
      },
      "doc_count": 1
    },
    {
      "key": {
        "freq_num": 136,
        "comment_id": "JcAYqngBhI5vW3tODWnX"
      },
      "doc_count": 2
    },
	....

#2. aggregation terms
At first, this gives the result in the sort order of the number of docs as I expected.
But If I use the partition option here, this doesn't provide the right result.

{
        "query": { "match_all": {}
        },
        "size": 0,
           "aggs": {
                    "like_counts ": {
                        "terms": {
                            "field": "comment_id",
                            "include": {
                               "partition": 0,
                               "num_partitions": 10
                            },
                            "size": 3,
                            "order": {"freq_num": "desc"}
                        },
                        "aggs": {
                            "freq_num": { "max": { "field": "freq_num" }}
                        }
                    }
                }
            
}

what I expected:

  • partition: 1

     "buckets": [
      {
        "key": "CMAXqngBhI5vW3tOq18z",
        "doc_count": 1,
        "freq_num": {
          "value": 8000
        }
      },
      
     {
        "key": "r8AXqngBhI5vW3tOxGH7",
        "doc_count": 1,
        "freq_num": {
          "value": 5000
        }
      },
      {
        "key": "tcAXqngBhI5vW3tOxWFA",
        "doc_count": 1,
        "freq_num": {
          "value": 4000
        }
      },
    
  • partition: 2

     "buckets": [
    
      {
        "key": "DsAXqngBhI5vW3tOq19p",
        "doc_count": 1,
        "freq_num": {
          "value": 2000
        }
      }
      {
        "key": "uMAXqngBhI5vW3tOxWFg",
        "doc_count": 1,
        "freq_num": {
          "value": 3000
        }
      },
      {
        "key": "DMAXqngBhI5vW3tOq19X",
        "doc_count": 1,
        "freq_num": {
          "value": 2000
        }
      }
      ]
    

what I've got:

  • partition: 1

     "buckets": [
     {
       "key": "CMAXqngBhI5vW3tOq18z",
       "doc_count": 1,
       "freq_num": {
         "value": 8000
       }
     },
     {
       "key": "tcAXqngBhI5vW3tOxWFA",
       "doc_count": 1,
       "freq_num": {
         "value": 4000
       }
     },
     {
       "key": "DsAXqngBhI5vW3tOq19p",
       "doc_count": 1,
       "freq_num": {
         "value": 2000
       }
     }
    
  • partition: 2

      "buckets": [
     {
        "key": "r8AXqngBhI5vW3tOxGH7",
        "doc_count": 1,
        "freq_num": {
          "value": 5000
        }
      },
      {
        "key": "uMAXqngBhI5vW3tOxWFg",
        "doc_count": 1,
        "freq_num": {
          "value": 3000
        }
      },
      {
        "key": "DMAXqngBhI5vW3tOq19X",
        "doc_count": 1,
        "freq_num": {
          "value": 2000
        }
      }
      ]
    

#3. collapse
I've just noticed that I also have the 'collapse' option, but I don't know how to use it here.
As I run the script below, I get a similar result but didn't get the count of each group. (collapse by comment_id)

{
   "query": { "match_all": {} },
   "size": 10,
   "collapse": {
   	"field": "comment_id"
   }
}

I'm considering rather choose #2 and fetch all data without using the partition option and split the data which might be a really bad idea.
Could anyone give me the proper way to achieve this?

Thank you.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.