Dev question: is this possible with aggregations?

Rigdon · April 13, 2016, 10:41pm

Problem

I have a few different doc types that I would like to perform some analysis on.
The problem is that the values I want to correlate across are only (currently) accessible via a "through"
object. In our case We have a request_id that is shared across all three indices; api, supporters, payments.
Each of these indices has a single doc type; request, signup, `payment.

The api index has a campaign attribute. Ultimately what I want is the payments, requests, and signups bucketed
by `campaign.

Current Data

Request

GET _search
{
   "query": {
      "term": {"request_id": 0}
   }
}

Response

{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 15,
      "successful": 15,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 2.5040774,
      "hits": [
         {
            "_index": "api",
            "_type": "request",
            "_id": "AVQRqbIhgTPMD4Z8ONUB",
            "_score": 2.5040774,
            "_source": {
               "organization_id": "0",
               "campaign": "campaign1",
               "request_id": 0
            }
         },
         {
            "_index": "supporters",
            "_type": "signup",
            "_id": "AVQRqbPDgTPMD4Z8ONVE",
            "_score": 1.9162908,
            "_source": {
               "organization_id": "0",
               "supporter_id": 0,
               "request_id": 0
            }
         },
         {
            "_index": "payments",
            "_type": "payment",
            "_id": "AVQRqbMYgTPMD4Z8ONUz",
            "_score": 1.6931472,
            "_source": {
               "organization_id": "0",
               "donation_id": 0,
               "amount": 25,
               "request_id": 0
            }
         }
      ]
   }
}

What I've gotten so far

So far I've been able to get an aggregate of requests by campaign

Request

GET _search/
{
   "size": 0,
   "aggs": {
        "by_campaign": {
            "terms": {
                "field": "campaign"
            },
            "aggs": {
                "by_request_id": {
                    "terms": {
                        "field": "request_id"
                    }
                }
            }
       }
   }
}

Response

{
   "took": 9,
   "timed_out": false,
   "_shards": {
      "total": 15,
      "successful": 15,
      "failed": 0
   },
   "hits": {
      "total": 19,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "by_campaign": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "campaign1",
               "doc_count": 3,
               "by_request_id": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                     {
                        "key": 0,
                        "doc_count": 1
                     },
                     {
                        "key": 4,
                        "doc_count": 1
                     },
                     {
                        "key": 8,
                        "doc_count": 1
                     }
                  ]
               }
            },
            {
               "key": "campaign2",
               "doc_count": 3,
               "by_request_id": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                     {
                        "key": 1,
                        "doc_count": 1
                     },
                     {
                        "key": 5,
                        "doc_count": 1
                     },
                     {
                        "key": 9,
                        "doc_count": 1
                     }
                  ]
               }
            },
            {
               "key": "campaign3",
               "doc_count": 2,
               "by_request_id": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                     {
                        "key": 2,
                        "doc_count": 1
                     },
                     {
                        "key": 6,
                        "doc_count": 1
                     }
                  ]
               }
            } // ... RESULTS TRUNCATED 
         ]
      }
   }
}

Next Steps

I've tried a few different things to use the results of the aggregations above to accomplish my goal. I know it would be fairly trivial to use this data to build a second query and get the data or by creating a separate aggregation where I bucket all of the documents by request ID building the desired result in application code.

Ultimately my question is "Is there a way to get the data aggregated the way I would like in a single query?".

nik9000 · April 14, 2016, 1:11am

Look up parent/child. I think maybe that can help you. The trouble is that
parent/child effects routing and can create hot spots if you have a parent
with many children. Another option is just denormalizing the data from the
parent into the child. That is often the best way to go as far as speed is
concerned.

Sorry I can't be of more help!

Topic		Replies	Views
Aggregation on parent/child documents Elasticsearch	3	742	July 6, 2017
Is it possible to reference query/filer data in your aggregations Elasticsearch	2	320	July 6, 2017
Help on optimization of aggregation request with fieldData set to true Elasticsearch	10	588	September 17, 2018
Need help, multiple aggregations with filters extremely slow, where to look for optimizations? Elasticsearch	7	585	July 6, 2017
Performance consider for parent child aggregration Elasticsearch	1	324	April 17, 2020

Dev question: is this possible with aggregations?

Problem

Current Data

Request

Response

What I've gotten so far

Request

Response

Next Steps

Related topics