ElasticSearch: How to make an aggregation pipeline?

ayodev · April 8, 2020, 11:52am

Imagine the following use case:

We work at Stark Airlines and our marketing team wants to segment our passengers in order to give them discounts or gift cards. They decide that they want two sets of passengers:

Passengers that fly at least 3 times per week
Passenger who have flown at least once but who have not flown for two weeks

With this they can make different marketing campaigns for our passengers!

So, in elastic search we have a trip index that represents a ticket bought by a passenger:

{
   "_index" : "trip",
   "_type" : "_doc",
   "_id" : "1",
   "_score" : 1.0,
   "_source" : {
       "total_amount" : 300,
       "trip_date" : "2020/03/24 13:30:00",
       "status" : "completed",
       "passenger" : {
          "id" : 11,
          "name" : "Thiago nunes"
       }
     }
}

The trip index contains a status field that may have other values like: pending or open or canceled

This means that we can only take into account trips that has the completed status (Meaning the passenger did travel).

So, with all this in mind...How would I get those two sets of passengers with elastic search?

I have been trying for a while but with no success.

What I have done until now:

I have built a query that gets all valid trip (trips with status completed)

GET /trip/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "status": {
              "value": "completed"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "status_viagem": {
      "terms": {
        "field": "status.keyword"
      }
    }
  }
}

This query returns the following:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 200,
      "relation" : "eq"
    },
    "max_score" : 0.18232156,
    "hits" : [...]
  },
  "aggregations" : {
    "status_viagem" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "completed",
          "doc_count" : 200
        }
      ]
    }
  }
}

But I am stuck and can't figure out the next step. I know that the next thing to do should create buckets with passengers and then filter them in two buckets representing our desired data sets. But I don't know how.

Can someone help?

PS.:

I don't exactly need this to be one single query, just a hint about how to build a query like this would be very helpful
THE OUPUT SHOULD BE AN ARRAY of passenger id's
Note: I have shortened the trip index for the sake of simplicity

Mark_Harwood · April 8, 2020, 11:59am

Sounds like a classic case for transforms to create an entity-centric index

system · May 6, 2020, 11:59am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.