I have to combine certain aggregation and query to extract data from elastic search . Can someone help with this following query

From a set of documents do the following:

  1. Extract documents which match the following criteria:
    a. date_field > given certain date and
    b. second_date_field < given certain date

  2. From the above extracted documents, take weekly data (week1 ,week2 etc..) .

  3. As per above week interval and calculate in how many documents does a specific field value occurs.

Following is my index:

{
"mappings": {
"properties": {
"insight_id": {
"type": "keyword"
},
"insight_score": {
"type": "keyword"
},
"insight_min_date": {
"type": "date",
"format": "dd-MM-yyyy HH:mm:ss||dd-MM-yyyy"
},
"insight_max_date": {
"type": "date",
"format": "dd-MM-yyyy HH:mm:ss||dd-MM-yyyy"
},
"topic": {
"type": "keyword"
}
}
}
}

Expected output:

if input date range: 1/4/2019-1/5/2019

It should find documents where:

  1. insight_min_date>1/4/2019 and insight_max_date<1/5/2019
  2. For week 1 (1/4/2019-7/4/2019)
    {"topic": "good fat",
    "count":20}
    {"topic":"bad fat", "count":10}

Similarly for week 2, week 3 etc.

I tried combining date histogram,date range, aggregation still don't get desired output

Do I understand correctly? You want to query the aggregated data?

I want to aggregate the filtered data. According to desired output that I mentioned.

I want something like this:

GET /yogurt_insights/_search
{
 "query":{"bool":{"must" : [
        { "range" : { "insight_max_date" : { "lte" : "31-12-2019"} } },
        { "range" : { "insight_min_date" : {"gte":"11-10-2019" } }}
      ]
 }
 },"aggregations":{
   "agg":{"terms":{"field":"topic","order":[{"_count":"desc"}]},"aggregations":{"agg":{"date_histogram":{"field":"insight_max_date","interval":31104000000,"offset":0}}}}}
  }



But here I get data aggregated only by topic because of term query, I want it to be aggregated week wise for each topic separately.

The sub-aggregation approach looks right to me, you first aggregate per topic, than per date_histogram, you could also do it the other way (more performant if your index is sorted using the timestamp).

Another alternative is a composite aggregation. The output of that might be less confusing.

Here is an example:

GET yogurt_insights/_search
{
  "query": {... your query ...}
  "size": 0,
  "aggs": {
    "t-dh": {
      "composite": {
        "sources": [
          {
            "top": {
              "terms": {
                "field": "topic"
              }
            }
          },
          {
            "dh": {
              "date_histogram": {
                "field": "insight_max_date",
                "calendar_interval": "1w"
              }
            }
          }
        ]
      }
    }
  }
}

The output I get with my query is following.

    "agg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Meal Replacement",
          "doc_count" : 2,
          "agg" : {
            "buckets" : [
              {
                "key_as_string" : "14-04-2019 00:00:00",
                "key" : 1555200000000,
                "doc_count" : 2
              }
            ]
          }
        },
        {
          "key" : "Sustainable",
          "doc_count" : 2,
          "agg" : {
            "buckets" : [
              {
                "key_as_string" : "14-04-2019 00:00:00",
                "key" : 1555200000000,
                "doc_count" : 2
              }
            ]
          }
        },
        {
          "key" : "Oats",
          "doc_count" : 1,
          "agg" : {
            "buckets" : [
              {
                "key_as_string" : "14-04-2019 00:00:00",
                "key" : 1555200000000,
                "doc_count" : 1
              }
            ]
          }
        }
      ]
    }
  }


The expected output should be something like this:

"buckets" : [
              {
                "key_as_string" : "14-04-2019 00:00:00",
                "key" : 1555200000000,
                agg:  {
                    { "key" : "Sustainable"
                      "doc_count" : 2 }, {"key" : "Oats"
                      "doc_count" : 1"} 
                 }
              },
           {
                "key_as_string" : "14-04-2019 00:00:00",
                "key" : 1555200000000,
                agg:  {
                    { "key" : "Sustainable"
                      "doc_count" : 2 }, {"key" : "Meal Replacement"
                      "doc_count" : 2"} 
                 }
              }
            ]

That is per week accumulated docs.

You can switch the order and 1st aggregate per date_histogram and than terms (sub-aggregation), the result looks different but is internally equal.

I tried following which would solve my purpose, but just a small part remains. Can we set the first and last limit of histogram? I want this histogram to start from min date ("insight_min_date") and end on max date field ("insight_max_date").

{
"query":{"bool":{"must" : [
       { "range" : { "insight_max_date" : { "lte" : "31-12-2019"} } },
       { "range" : { "insight_min_date" : {"gte":"11-10-2019" } }}
     ]
}
},"aggregations":{
 "agg":{"terms":{"field":"topic","order":[{"_count":"desc"}]},"aggregations":{"agg":{"date_histogram":{"field":"insight_min_date","interval":"1w","offset":0,"format": "dd-MM-yyyy"}}}}}
}

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.