Just wanted to know if there is an elegant solution to my problem:
We have an index that stores orders, and each order carries an array of zip codes that the customer selected:
{
  "identifier": "1574417821835",
  "selectedZipCodes": [ "96106", "96126", "96145", "96148", "96151", "96161", "96166", "96176", "96179", "96182", "96184", "96190", "96250", "96274", "97500" ],
  "sortTimestamp": 1575673199999
}
Now, for visualisation on a map, we try to cluster those zip codes depending on the map's zoom level.
I tried a couple of things with different aggregations and filters for the buckets. cardinality doesn't work, because it always counts all elements of the matched documents, not just the ones I filtered for (terms with include: regex).
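What I'd actually need is a cardinality that only counts values matching a prefix. The closest thing I could come up with is a scripted cardinality, roughly like the sketch below, but it's untested and I'm not even sure cardinality accepts a multi-valued return from a Painless script (the agg name distinct_96 is just illustrative):

```json
"aggs": {
  "distinct_96": {
    "cardinality": {
      "script": {
        "lang": "painless",
        "source": "def out = new ArrayList(); for (v in doc['selectedZipCodes']) { if (v.startsWith(params.prefix)) { out.add(v); } } return out;",
        "params": { "prefix": "96" }
      }
    }
  }
}
```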
I could of course handle it on the client side, but that would mean returning all the zip codes in buckets, up to 100k in edge cases (using partitioned terms at that point), which results in a huge JSON payload.
This is what I've got so far:
{
  "size": 0,
  "query": {
    "bool": {
      "filter": {
        "exists": { "field": "selectedZipCodes" }
      }
    }
  },
  "aggs": {
    "zips": {
      "filter": {
        "prefix": { "selectedZipCodes": "9" }
      },
      "aggs": {
        "9x_count": {
          "filter": {
            "prefix": { "selectedZipCodes": "9" }
          },
          "aggs": {
            "counts": {
              "terms": {
                "field": "selectedZipCodes",
                "include": "9.*"
              }
            }
          }
        },
        "96x": {
          "filter": {
            "prefix": { "selectedZipCodes": "96" }
          },
          "aggs": {
            "counts": {
              "terms": {
                "field": "selectedZipCodes",
                "include": "96.*"
              }
            }
          }
        },
        "97x": {
          "filter": {
            "prefix": { "selectedZipCodes": "97" }
          },
          "aggs": {
            "counts": {
              "terms": {
                "field": "selectedZipCodes",
                "include": "97.*"
              }
            }
          }
        }
      }
    }
  }
}
which just results in overcrowded buckets (the filter aggregations only narrow the documents, so each matching order still contributes all of its zip codes to the counts):
"aggregations": {
  "zips": {
    "doc_count": 211,
    "97x": {
      "doc_count": 198,
      "counts": {
        "doc_count_error_upper_bound": 1,
        "sum_other_doc_count": 211,
        "buckets": [
          { "key": "97500", "doc_count": 198 },
          { "key": "97486", "doc_count": 25 },
          { "key": "97496", "doc_count": 25 },
          { "key": "97070", "doc_count": 1 },
          { "key": "97072", "doc_count": 1 },
          { "key": "97074", "doc_count": 1 },
          { "key": "97076", "doc_count": 1 },
          { "key": "97078", "doc_count": 1 },
          { "key": "97080", "doc_count": 1 },
          { "key": "97082", "doc_count": 1 }
        ]
      }
    },
    "96x": {
      "doc_count": 206,
      "counts": {
        "doc_count_error_upper_bound": 1,
        "sum_other_doc_count": 488,
        "buckets": [
          { "key": "96106", "doc_count": 200 },
          { "key": "96190", "doc_count": 200 },
          { "key": "96176", "doc_count": 199 },
          { "key": "96161", "doc_count": 198 },
          { "key": "96179", "doc_count": 198 },
          { "key": "96182", "doc_count": 198 },
          { "key": "96184", "doc_count": 198 },
          { "key": "96274", "doc_count": 198 },
          { "key": "96151", "doc_count": 197 },
          { "key": "96166", "doc_count": 197 }
        ]
      }
    },
    "9x_count": {
      "doc_count": 211,
      "counts": {
        "doc_count_error_upper_bound": 4,
        "sum_other_doc_count": 1866,
        "buckets": [
          { "key": "96106", "doc_count": 200 },
          { "key": "96190", "doc_count": 200 },
          { "key": "96176", "doc_count": 199 },
          { "key": "96161", "doc_count": 198 },
          { "key": "96179", "doc_count": 198 },
          { "key": "96182", "doc_count": 198 },
          { "key": "96184", "doc_count": 198 },
          { "key": "96274", "doc_count": 198 },
          { "key": "97500", "doc_count": 198 },
          { "key": "96151", "doc_count": 197 }
        ]
      }
    }
  }
}
Maybe someone has an idea or something to kickstart from, or did I miss the bucket_count_ aggregation that counts the terms without actually returning them? ^^
Oh, the expected result would be something like this (details omitted):
{
  "9x": {
    "count": 42,
    "96": {
      "count": 13,
      "960": {...}
    },
    "97": {
      "count": 9,
      "976": {...}
    }
    ...
  },
  "6x": {...}
}
Or a different structure entirely; it's just an idea.
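For completeness, the client-side fallback I mentioned could look roughly like this. A minimal Python sketch that assumes the flat list of zip codes has already been extracted from the terms buckets; prefix_tree is just an illustrative name:

```python
def prefix_tree(zipcodes, max_depth=3):
    """Build a nested {prefix: {"count": n, <child prefixes>...}} structure,
    counting distinct zip codes per prefix, one digit per level."""
    tree = {}
    for z in set(zipcodes):  # distinct zip codes only
        node = tree
        for depth in range(1, max_depth + 1):
            child = node.setdefault(z[:depth], {"count": 0})
            child["count"] += 1
            node = child
    return tree

counts = prefix_tree(["96106", "96126", "96145", "97500", "97486"])
# counts["9"]["count"] is 5, counts["9"]["96"]["count"] is 3, ...
```

The downside stays the same, of course: all the raw zip codes still have to travel over the wire first.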