Combination of two top terms aggregations

Hi,

When I run a terms aggregation on FIELD1 and a terms aggregation on FIELD2, I get the following results:

FIELD1 : {
TOPTERM1FIELD1 : 10000,
TOPTERM2FIELD1 : 1000
}
and
FIELD2 : {
TOPTERM1FIELD2 : 121212,
TOPTERM2FIELD2 : 1212
}

If I combine the two aggregations using a sub-aggregation, I get:

FIELD1 : {
TOPTERM1FIELD1 : {
FIELD2 : {
LOCALTOPTERM1FIELD2: 12
}
}
}

etc.
I would like the document counts for all combinations of the top n terms of FIELD1 with the top n terms of FIELD2.
For example:

TOPTERM1FIELD1 AND TOPTERM1FIELD2 : 12
TOPTERM1FIELD1 AND TOPTERM2FIELD2 : 4

I tried the composite aggregation to achieve this, but it seems to return buckets sorted by key rather than the most frequent terms first.
Is there a way to achieve this?

Thank you for your help,

Aurélien

I tried to use the "order by count desc" option available in the classic terms aggregation:

{
    "size": 0,
    "aggs": {
        "test_composite": {
            "composite": {
                "sources": [
                    { "IPH": { "terms": { "field": "FIELD", "order": { "_count": "desc" } } } }
                ]
            }
        }
    }
}

but it didn't work.
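For reference, a composite aggregation over both fields might look like the sketch below. As far as I know, the `order` option of a composite source only accepts "asc" or "desc" and applies to the key values, not to `_count`, which would explain why the request above did not sort by frequency. The aggregation name `field1_field2` and the page size of 100 are illustrative assumptions.

```json
{
  "size": 0,
  "aggs": {
    "field1_field2": {
      "composite": {
        "size": 100,
        "sources": [
          { "FIELD1": { "terms": { "field": "FIELD1", "order": "asc" } } },
          { "FIELD2": { "terms": { "field": "FIELD2", "order": "asc" } } }
        ]
      }
    }
  }
}
```

Each returned bucket carries a key such as `{ "FIELD1": ..., "FIELD2": ... }` plus a `doc_count`, but the buckets are paged in key order, so the most frequent pairs are not guaranteed to come first.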

I realized that my explanation wasn't very clear, so here is a concrete example.

When I run the two aggregations separately:

{
    "aggs" : {
    	"IPH": { "terms" : { "field": "IPH", "size" : 2} } ,
    	"PID": { "terms" : { "field": "PID" , "size" : 2} } 
     }
}

I get the following result:

{
	"aggregations": {
		"IPH": {
			"buckets": [{
					"key": "035/00",
					"doc_count": 722
				},
				{
					"key": "043/00",
					"doc_count": 559
				}
			]
		},
		"PID": {
			"buckets": [{
					"key": "960897",
					"doc_count": 170
				},
				{
					"key": "71628332",
					"doc_count": 147
				}
			]
		}
	}
}

When I try to combine them with a sub-aggregation:

{
	"aggs": {
		"IPH": {
			"terms": {
				"field": "IPH",
				"size": 2
			},
			"aggs": {
				"PID": {
					"terms": {
						"field": "PID",
						"size": 2
					}
				}
			}
		}
	}
}

I get the following result:

 {
 	"aggregations": {
 		"IPH": {
 			"buckets": [{
 					"key": "035/00",
 					"doc_count": 722,
 					"PID": {
 						"buckets": [{
 								"key": "906768",
 								"doc_count": 51
 							},
 							{
 								"key": "3975774",
 								"doc_count": 49
 							}
 						]
 					}
 				},
 				{
 					"key": "043/00",
 					"doc_count": 559,
 					"PID": {
 						"buckets": [{
 								"key": "3975774",
 								"doc_count": 49
 							},
 							{
 								"key": "640822",
 								"doc_count": 33
 							}
 						]
 					}
 				}
 			]
 		}
 	}
 }

I am trying to find a way to get:

[{
	"IPH": "035/00",
	"PID": "960897",
	"count": 12
}, {
	"IPH": "035/00",
	"PID": "71628332",
	"count": 12
}, {
	"IPH": "043/00",
	"PID": "960897",
	"count": 12
}, {
	"IPH": "043/00",
	"PID": "71628332",
	"count": 12
}]

Does anybody have an idea how to achieve this?

Thank you,

So you essentially want to flatten the results (I don't know where the count of 12 comes from)?
You can use a script in a single terms aggregation to combine the PID and IPH values into a single string that acts as the key you are looking for.
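A minimal sketch of that suggestion, assuming IPH and PID are single-valued keyword fields and that the cluster version still accepts a `script` inside a terms aggregation. The aggregation name `IPH_PID`, the `_` separator and the size of 10 are illustrative choices, not something from the original posts.

```json
{
  "size": 0,
  "aggs": {
    "IPH_PID": {
      "terms": {
        "script": {
          "lang": "painless",
          "source": "doc['IPH'].value + '_' + doc['PID'].value"
        },
        "size": 10
      }
    }
  }
}
```

Each bucket key is then a combined string such as "035/00_960897", ordered by document count. Documents missing either field would make `.value` fail, so a guard or an `exists` filter may be needed.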

Hi,

Thank you for your answer. The count of 12 is probably wrong, but basically, for

{
	"IPH": "035/00",
	"PID": "960897",
	"count": XX
}

XX should be the number of documents matching the query (IPH:"035/00" AND PID:"960897").
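To make that definition concrete, the count for a single pair could be obtained with a request along these lines. This is only a sketch: it assumes both fields are indexed as exact-match keyword fields and that the version in use supports `track_total_hits`.

```json
{
  "size": 0,
  "track_total_hits": true,
  "query": {
    "bool": {
      "filter": [
        { "term": { "IPH": "035/00" } },
        { "term": { "PID": "960897" } }
      ]
    }
  }
}
```

The total hit count in the response is then the value XX for that pair.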

Let's assume that the count 12 is correct.

Maybe I misunderstood your solution, but I think it has a small problem for my use case. If I am right, with the script it is possible to get something like:

{
 "key" : "032/00_898989",
  "doc_count" : 231
} 

because the intersection of IPH:"032/00" and PID:"898989" contains more than 12 documents.
However, this is not what I want because, taken separately, "032/00" is not in the top 2 terms for IPH and "898989" is not in the top 2 terms for PID, even if "032/00_898989" is in the top 2 combined keys for my query.

Thank you for your feedback,

Regards,

Aurélien
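One possible way to respect that constraint is a two-step approach, sketched below under stated assumptions. First run the two separate terms aggregations to get the top terms of each field, then send a second request that restricts both levels of the nested aggregation to those terms with `include`. The values listed here are the top-2 keys from the example response above. `min_doc_count: 0` is meant to keep pairs that have no matching documents.

```json
{
  "size": 0,
  "aggs": {
    "IPH": {
      "terms": {
        "field": "IPH",
        "size": 2,
        "include": ["035/00", "043/00"]
      },
      "aggs": {
        "PID": {
          "terms": {
            "field": "PID",
            "size": 2,
            "include": ["960897", "71628332"],
            "min_doc_count": 0
          }
        }
      }
    }
  }
}
```

Each IPH bucket should then contain one PID bucket per listed value, and its `doc_count` is the count for that (IPH, PID) pair. Flattening the nested buckets into the list of triples would still have to happen on the client side.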
