My documents look like this:
[
  {
    "type": 1,
    "document_id": [4, 5]
  },
  {
    "type": 1,
    "document_id": [4]
  },
  {
    "type": 2,
    "document_id": [5]
  },
  {
    "type": 2,
    "document_id": [4, 5]
  }
]
Now I am trying to get the count of unique document IDs for type 1 and for type 2. The tricky part is that I don't want to count a document ID again in type 2 if it has already been counted in type 1.
For example, using the following cardinality aggregation:
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "type": 1
          }
        }
      ]
    }
  },
  "aggs": {
    "document_count": {
      "cardinality": {
        "field": "document_id"
      }
    }
  }
}
I get 2 unique document IDs for type 1. If I run the same query for type 2, I also get 2.
However, the result I am after is 2 for type 1 and 0 for type 2: document IDs 4 and 5 have already been counted in type 1, so they should be excluded from type 2.
Does anyone know whether this is doable?
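Put as sets, the request looks like this. Let $S_t$ be the union of the document_id values over all documents of type $t$ (this notation is introduced here only for illustration). Type 1 is then counted as $|S_1|$ and type 2 as $|S_2 \setminus S_1|$; with the four example documents:

```latex
\[
\begin{aligned}
S_t &= \bigcup_{d \,:\, \mathrm{type}(d) = t} \mathrm{ids}(d) \\
S_1 &= \{4, 5\} \cup \{4\} = \{4, 5\}, \qquad S_2 = \{5\} \cup \{4, 5\} = \{4, 5\} \\
\mathrm{count}_1 &= |S_1| = 2, \qquad \mathrm{count}_2 = |S_2 \setminus S_1| = |\varnothing| = 0
\end{aligned}
\]
```

A cardinality query filtered to type 2 only ever sees $S_2$, which is why it reports $|S_2| = 2$ rather than 0.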
Thanks!
I've tried solving this with a scripted_metric aggregation.
Demo data
PUT testd/_doc/1
{
  "type": 1,
  "document_id": [
    4,
    5
  ]
}
PUT testd/_doc/2
{
  "type": 1,
  "document_id": [
    4
  ]
}
PUT testd/_doc/3
{
  "type": 2,
  "document_id": [
    5
  ]
}
PUT testd/_doc/4
{
  "type": 2,
  "document_id": [
    4,
    5
  ]
}
Aggregation
This only works on Elasticsearch 7.7 and later.
GET testd/_search
{
  "aggs": {
    "NAME": {
      "scripted_metric": {
        "init_script": "state.types = new HashMap();",
        "map_script": "def t = doc['type'].value.toString(); if (!state.types.containsKey(t)) { state.types[t] = new HashSet(); }\nstate.types[t].addAll(doc['document_id']);",
        "combine_script": "return state;",
        "reduce_script": "def type1 = new HashSet(); def type2 = new HashSet(); for (s in states) { if (s.types.containsKey('1')) { type1.addAll(s.types['1']); } if (s.types.containsKey('2')) { type2.addAll(s.types['2']); } } type2.removeAll(type1); return ['1': type1.size(), '2': type2.size()]"
      }
    }
  }
}
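To see why the reduce step yields 2 and 0, here is a minimal Python simulation of the four scripted_metric phases: init_script runs once per shard, map_script once per document, combine_script once per shard, and reduce_script once over all shard states. This is not Elasticsearch code; the function names only mirror the script names, and the split of the demo documents across two shards is a hypothetical layout. Because each shard in this layout sees only one type, the sketch's reduce step uses `.get(..., set())` so that a shard with no entry for a type does not break the merge.

```python
# Hypothetical layout: the four demo documents spread over two shards.
SHARDS = [
    [{"type": 1, "document_id": [4, 5]}, {"type": 1, "document_id": [4]}],
    [{"type": 2, "document_id": [5]}, {"type": 2, "document_id": [4, 5]}],
]


def init_script():
    # One empty state per shard: type (as a string) -> set of document IDs.
    return {"types": {}}


def map_script(state, doc):
    # Add every document_id of this document to the set for its type.
    t = str(doc["type"])
    state["types"].setdefault(t, set()).update(doc["document_id"])


def combine_script(state):
    # Ship the whole per-shard state to the reduce phase unchanged.
    return state


def reduce_script(states):
    # Merge all shard states, then drop type-2 IDs already seen in type 1.
    type1, type2 = set(), set()
    for s in states:
        type1 |= s["types"].get("1", set())  # shard may have no type-1 docs
        type2 |= s["types"].get("2", set())  # shard may have no type-2 docs
    type2 -= type1
    return {"1": len(type1), "2": len(type2)}


def run(shards):
    states = []
    for shard in shards:
        state = init_script()
        for doc in shard:
            map_script(state, doc)
        states.append(combine_script(state))
    return reduce_script(states)


print(run(SHARDS))  # {'1': 2, '2': 0}
```

The subtraction happens only in the reduce phase because no single shard is guaranteed to see every type-1 ID; that is also why a per-type cardinality query cannot produce this number.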
Alternative for versions before 7.7 (slightly less efficient):
GET testd/_search
{
  "aggs": {
    "NAME": {
      "scripted_metric": {
        "init_script": "state.types = new HashMap();",
        "map_script": "def t = doc['type'].value.toString(); if (!state.types.containsKey(t)) { state.types[t] = new HashMap(); }\nfor (d in doc['document_id']) { state.types[t][d] = true; }",
        "combine_script": "return state;",
        "reduce_script": "def type1 = new HashSet(); def type2 = new HashSet(); for (s in states) { if (s.types.containsKey('1')) { type1.addAll(s.types['1'].keySet()); } if (s.types.containsKey('2')) { type2.addAll(s.types['2'].keySet()); } } type2.removeAll(type1); return ['1': type1.size(), '2': type2.size()]"
      }
    }
  }
}
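The pre-7.7 variant differs only in how each shard stores the IDs: as keys of a map whose values are a dummy `true`, rather than in a HashSet; the reduce step then reads the key sets. A Python sketch of that representation, again a simulation with a hypothetical two-shard layout rather than Elasticsearch code:

```python
# Hypothetical layout: the four demo documents spread over two shards.
SHARDS = [
    [{"type": 1, "document_id": [4, 5]}, {"type": 1, "document_id": [4]}],
    [{"type": 2, "document_id": [5]}, {"type": 2, "document_id": [4, 5]}],
]


def map_script(state, doc):
    # Map keys act as the set of IDs; the value is only a placeholder flag.
    t = str(doc["type"])
    ids = state["types"].setdefault(t, {})
    for d in doc["document_id"]:
        ids[d] = True


def reduce_script(states):
    # Collect the key sets from every shard, then subtract type 1 from type 2.
    type1, type2 = set(), set()
    for s in states:
        type1.update(s["types"].get("1", {}).keys())
        type2.update(s["types"].get("2", {}).keys())
    type2 -= type1
    return {"1": len(type1), "2": len(type2)}


def run(shards):
    states = []
    for shard in shards:
        state = {"types": {}}  # init_script
        for doc in shard:
            map_script(state, doc)
        states.append(state)  # combine_script returns the state unchanged
    return reduce_script(states)


print(run(SHARDS))  # {'1': 2, '2': 0}
```

Since map keys are unique, the map behaves like a set; it just carries an extra value per ID.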
Result:
...
  "aggregations" : {
    "NAME" : {
      "value" : {
        "1" : 2,
        "2" : 0
      }
    }
  }
}
Thanks a lot @Luca_Belluccini! That is exactly what I need.