Hi, I'm stuck on a problem with Elasticsearch and would appreciate it if someone could help me.
I have 2 indices: comment_index and like_index.
- comment_index: user comment data
- like_index: keeps track of which user likes which comment, e.g.
{'user_id': '1', 'comment_id': 1}, {'user_id': '1', 'comment_id': 2}, {'user_id': '2', 'comment_id': 1}, ...
mappings (like_index)
"mappings": {
  "properties": {
    "comment_id": {
      "type": "keyword"
    },
    "freq_num": {
      "type": "integer"  --> copied from comment_index, since the result needs to be ordered by this field
    },
    "user_id": {
      "type": "keyword"
    }
    ....
  }
}
And below is what I need to do: query like_index, and the result needs to be
- grouped by comment_id (to get how many likes each comment has)
- ordered by the comment_id count (i.e., the number of likes per comment) and then by an additional field (freq_num); if that's not possible, ordering only by doc_count is also fine
- paginated via scroll or pagination (size=10)
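To make the intent concrete, here is the grouping and ordering I'm after expressed in plain Python over the sample like_index docs above (the freq_num values are made up for illustration):

```python
from collections import Counter

# sample like_index docs
likes = [
    {"user_id": "1", "comment_id": "c1"},
    {"user_id": "1", "comment_id": "c2"},
    {"user_id": "2", "comment_id": "c1"},
]
# freq_num per comment, copied from comment_index (made-up values)
freq_num = {"c1": 136, "c2": 446}

# group by comment_id -> like count per comment
counts = Counter(doc["comment_id"] for doc in likes)

# order by like count desc, then by freq_num desc
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], -freq_num[kv[0]]))

page = ranked[:10]  # the size=10 pagination would slice this
```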
I've tried several ideas but couldn't find the right way to solve this. (The actual queries I wrote are more complicated; I've simplified them here for anyone searching for a similar issue.) Anyway, the 3 approaches I've tried are below.
#1. composite aggregation
It supports group-by and pagination (via the after key), but can't order by doc_count.
{
  "query": { "match_all": {} },
  "aggs": {
    "like_counts": {
      "composite": {
        "sources": [
          { "freq_num": { "terms": { "field": "freq_num", "order": "desc" } } },
          { "comment_id": { "terms": { "field": "comment_id" } } }
        ],
        "size": 10
      }
    }
  }
}
what I expected:
"buckets": [
{
"key": {
"freq_num": 136,
"comment_id": "JcAYqngBhI5vW3tODWnX"
},
"doc_count": 2
},
{
"key": {
"freq_num": 446,
"comment_id": "IsAYqngBhI5vW3tODWm7"
},
"doc_count": 1
},
{
"key": {
"freq_num": 136,
"comment_id": "I8AYqngBhI5vW3tODWnE"
},
"doc_count": 1
},
....
what I've got:
"buckets": [
{
"key": {
"freq_num": 446,
"comment_id": "IsAYqngBhI5vW3tODWm7"
},
"doc_count": 1
},
{
"key": {
"freq_num": 136,
"comment_id": "I8AYqngBhI5vW3tODWnE"
},
"doc_count": 1
},
{
"key": {
"freq_num": 136,
"comment_id": "JcAYqngBhI5vW3tODWnX"
},
"doc_count": 2
},
....
#2. terms aggregation
At first this gives the result in the expected sort order, but once I add the partition option, it no longer returns the right result.
{
  "query": { "match_all": {} },
  "size": 0,
  "aggs": {
    "like_counts": {
      "terms": {
        "field": "comment_id",
        "include": {
          "partition": 0,
          "num_partitions": 10
        },
        "size": 3,
        "order": { "freq_num": "desc" }
      },
      "aggs": {
        "freq_num": { "max": { "field": "freq_num" } }
      }
    }
  }
}
what I expected:
- partition 1:
"buckets": [ { "key": "CMAXqngBhI5vW3tOq18z", "doc_count": 1, "freq_num": { "value": 8000 } }, { "key": "r8AXqngBhI5vW3tOxGH7", "doc_count": 1, "freq_num": { "value": 5000 } }, { "key": "tcAXqngBhI5vW3tOxWFA", "doc_count": 1, "freq_num": { "value": 4000 } } ]
- partition 2:
"buckets": [ { "key": "DsAXqngBhI5vW3tOq19p", "doc_count": 1, "freq_num": { "value": 2000 } }, { "key": "uMAXqngBhI5vW3tOxWFg", "doc_count": 1, "freq_num": { "value": 3000 } }, { "key": "DMAXqngBhI5vW3tOq19X", "doc_count": 1, "freq_num": { "value": 2000 } } ]
what I've got:
- partition 1:
"buckets": [ { "key": "CMAXqngBhI5vW3tOq18z", "doc_count": 1, "freq_num": { "value": 8000 } }, { "key": "tcAXqngBhI5vW3tOxWFA", "doc_count": 1, "freq_num": { "value": 4000 } }, { "key": "DsAXqngBhI5vW3tOq19p", "doc_count": 1, "freq_num": { "value": 2000 } } ]
- partition 2:
"buckets": [ { "key": "r8AXqngBhI5vW3tOxGH7", "doc_count": 1, "freq_num": { "value": 5000 } }, { "key": "uMAXqngBhI5vW3tOxWFg", "doc_count": 1, "freq_num": { "value": 3000 } }, { "key": "DMAXqngBhI5vW3tOq19X", "doc_count": 1, "freq_num": { "value": 2000 } } ]
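A variant of #2 I've considered but not fully verified: the terms aggregation accepts a list of order criteria (_count first, then the freq_num sub-aggregation), and a bucket_sort pipeline aggregation can paginate the returned buckets with from/size. The downside would be that all buckets still have to be computed on every request, so this is only a sketch:

```json
{
  "size": 0,
  "aggs": {
    "like_counts": {
      "terms": {
        "field": "comment_id",
        "size": 65536,
        "order": [
          { "_count": "desc" },
          { "freq_num": "desc" }
        ]
      },
      "aggs": {
        "freq_num": { "max": { "field": "freq_num" } },
        "page": {
          "bucket_sort": {
            "from": 0,
            "size": 10
          }
        }
      }
    }
  }
}
```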
#3. collapse
I've just noticed there is also the 'collapse' option, but I don't know how to use it here.
Running the query below (collapsing by comment_id), I get a similar result but without the count of each group.
{
"query": { "match_all": {} },
"size": 10,
"collapse": {
"field": "comment_id"
}
}
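For #3, my understanding (not verified) is that adding inner_hits to the collapse would make each hit carry the total number of documents in its group, which would be the like count:

```json
{
  "query": { "match_all": {} },
  "size": 10,
  "collapse": {
    "field": "comment_id",
    "inner_hits": {
      "name": "likes",
      "size": 0
    }
  }
}
```

Each hit's inner_hits.likes.hits.total would then be the number of likes for that comment_id, though I'm not sure how this could be ordered by that count.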
Right now I'm leaning toward #2: fetching all the data without the partition option and splitting it myself, which is probably a really bad idea.
Could anyone point me to the proper way to achieve this?
Thank you.