I am using Elasticsearch 1.5.
Here are my mappings:
config = {
    "mappings": {
        my_doc_type: {
            "dynamic": False,
            "properties": {
                "timestamp_start": {
                    "type": "date"
                },
                "timestamp_end": {
                    "type": "date"
                },
                "entity_id": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "version": {
                    "type": "string",
                    "index": "not_analyzed"
                },
            }
        },
        ...
    }
}
I am trying to perform a query like this:
{
    "aggregations": {
        "by_version": {
            "aggregations": {
                "by_entity_id": {
                    "terms": {
                        "field": "entity_id"
                    }
                }
            },
            "terms": {
                "field": "version"
            }
        }
    },
    "from": 0,
    "size": 0
}
Here is what I get:
{
    "aggregations": {
        "by_version": {
            "buckets": [
                {
                    "by_entity_id": {
                        "buckets": [
                            {
                                "doc_count": 480,
                                "key": "4bcf"
                            },
                            {
                                "doc_count": 480,
                                "key": "60965392"
                            },
                            {
                                "doc_count": 480,
                                "key": "73ba"
                            },
                            {
                                "doc_count": 480,
                                "key": "bb1f"
                            },
                            {
                                "doc_count": 480,
                                "key": "ff0cf25f5480"
                            }
                        ],
                        "doc_count_error_upper_bound": 0,
                        "sum_other_doc_count": 0
                    },
                    "doc_count": 480,
                    "key": "2.2.0"
                }
            ],
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0
        }
    },
    "items": [],
    "total": 480
}
What I am going for is:
- group my documents by entity_id (I don't care how many there are, a bucket size of 1 is all I need)
- group these entity_id buckets into buckets based on the version field (the number of documents in this bucket would be equal to 1 document per entity_id bucket)
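To make the goal concrete, here is a minimal sketch in plain Python of the result I'm after. The sample documents and the function name unique_entities_per_version are made up for illustration; they are not from my index.

```python
from collections import defaultdict


def unique_entities_per_version(docs):
    """Count the distinct entity_id values seen for each version."""
    seen = defaultdict(set)
    for doc in docs:
        # A set keeps each entity_id once, no matter how many documents it has.
        seen[doc["version"]].add(doc["entity_id"])
    return {version: len(ids) for version, ids in seen.items()}


# Hypothetical sample documents, for illustration only.
sample_docs = [
    {"version": "2.2.0", "entity_id": "a"},
    {"version": "2.2.0", "entity_id": "a"},
    {"version": "2.2.0", "entity_id": "b"},
    {"version": "2.3.0", "entity_id": "c"},
]

print(unique_entities_per_version(sample_docs))
```

In other words, I only need one number per version, not the entity_id buckets themselves.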
I don't really need the buckets for my by_entity_id terms aggregation - there will be tens of thousands. I'm using a terms aggregation here because I don't know a better way - I just want to know the number of unique entity_id values for each version.
I feel like I'm abusing terms aggregations or that there's a much better way to do this. Could I use a cardinality aggregation somehow? The Sum and Value Count aggregations don't really seem helpful here.
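What I have in mind is something like the sketch below: keep the terms aggregation on version, but replace the inner terms aggregation with a cardinality aggregation on entity_id. The aggregation name unique_entities and the helper build_unique_entity_query are placeholders of my own, and as far as I understand cardinality returns an approximate count, so I'm not sure this is the right approach.

```python
import json


def build_unique_entity_query(version_field="version", entity_field="entity_id"):
    """Build a search body with one terms bucket per version,
    each carrying a cardinality metric over the entity field."""
    return {
        "size": 0,  # only the aggregation results are wanted, not the hits
        "aggregations": {
            "by_version": {
                "terms": {"field": version_field},
                "aggregations": {
                    # Placeholder name; cardinality gives an approximate
                    # distinct count rather than one bucket per entity_id.
                    "unique_entities": {
                        "cardinality": {"field": entity_field}
                    }
                },
            }
        },
    }


query = build_unique_entity_query()
print(json.dumps(query, indent=2))
```

If that works, each by_version bucket would carry a single unique_entities value instead of tens of thousands of nested buckets.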
Thanks for any help you can provide!