Problem
I have a few different doc types that I would like to perform some analysis on.
The problem is that the values I want to correlate across are only (currently) accessible via a "through"
object. In our case We have a request_id
that is shared across all three indices; api, supporters, payments.
Each of these indices has a single doc type; request
, signup
, `payment.
The api
index has a campaign
attribute. Ultimately what I want is the payments, requests, and signups bucketed
by `campaign.
Current Data
Request
GET _search
{
"query": {
"term": {"request_id": 0}
}
}
Response
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 15,
"successful": 15,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 2.5040774,
"hits": [
{
"_index": "api",
"_type": "request",
"_id": "AVQRqbIhgTPMD4Z8ONUB",
"_score": 2.5040774,
"_source": {
"organization_id": "0",
"campaign": "campaign1",
"request_id": 0
}
},
{
"_index": "supporters",
"_type": "signup",
"_id": "AVQRqbPDgTPMD4Z8ONVE",
"_score": 1.9162908,
"_source": {
"organization_id": "0",
"supporter_id": 0,
"request_id": 0
}
},
{
"_index": "payments",
"_type": "payment",
"_id": "AVQRqbMYgTPMD4Z8ONUz",
"_score": 1.6931472,
"_source": {
"organization_id": "0",
"donation_id": 0,
"amount": 25,
"request_id": 0
}
}
]
}
}
What I've gotten so far
So far I've been able to get an aggregate of requests by campaign
Request
GET _search/
{
"size": 0,
"aggs": {
"by_campaign": {
"terms": {
"field": "campaign"
},
"aggs": {
"by_request_id": {
"terms": {
"field": "request_id"
}
}
}
}
}
}
Response
{
"took": 9,
"timed_out": false,
"_shards": {
"total": 15,
"successful": 15,
"failed": 0
},
"hits": {
"total": 19,
"max_score": 0,
"hits": []
},
"aggregations": {
"by_campaign": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "campaign1",
"doc_count": 3,
"by_request_id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 0,
"doc_count": 1
},
{
"key": 4,
"doc_count": 1
},
{
"key": 8,
"doc_count": 1
}
]
}
},
{
"key": "campaign2",
"doc_count": 3,
"by_request_id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 1
},
{
"key": 5,
"doc_count": 1
},
{
"key": 9,
"doc_count": 1
}
]
}
},
{
"key": "campaign3",
"doc_count": 2,
"by_request_id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 2,
"doc_count": 1
},
{
"key": 6,
"doc_count": 1
}
]
}
} // ... RESULTS TRUNCATED
]
}
}
}
Next Steps
I've tried a few different things to use the results of the aggregations above to accomplish my goal. I know it would be fairly trivial to use this data to build a second query and get the data or by creating a separate aggregation where I bucket all of the documents by request ID building the desired result in application code.
Ultimately my question is "Is there a way to get the data aggregated the way I would like in a single query?".