Filter last X documents based on date, then do a metric aggregation

i have the following mapping in my ES:

{
	"properties": {
		"id": {
			"type": "keyword",
			"fields": {
				"keyword": {
					"type": "keyword",
					"ignore_above": 256
				}
			}
		},
		"creationTime": {
			"type": "date"
		},
		"duration": {
			"type": "long"
		}
	}
}

and i want to calculate percentile, based on latest X(1000 for example) documents, per ID
what i have for now is that:

{
  "size": 0,
  "aggs": {
    "ids": {
      "terms": {
        "field": "id",
        "size": 20000
      },
      "aggs": {
        "percentileByDuration": {
          "percentiles": {
            "field": "duration",
            "percents": [
              50
            ]
          }
        }
      }
    }
  }
}

but any form of making get only the latest 1000 documents, i get an error such as sub aggregation not supported and etc.
how can i manage to achieve my goal?
p.s. the 20k on the terms size is because i only want any 20k id's

How did you get only the latest 1000 documents and What was the exact error? Could you share the queries and error messages?

It could be something like this:

GET /kibana_sample_data_flights/_search
{
  "size":0,
  "query":{
    "function_score": {
      "field_value_factor": {
        "field": "timestamp"
      }
    }
  },
  "aggs":{
    "top1000":{
      "sampler": {
        "shard_size": 1000
      },
      "aggs":{
        "ids":{
          "terms": {
            "field": "DestAirportID",
            "size": 10
          },
          "aggs":{
            "percentile":{
              "percentiles": {
                "field": "AvgTicketPrice",
                "percents": [
                  50
                ]
              }
            }
          }
        }
      }
    }
  }
}

I couldn't find the way to use top 1000 documents across shards/indices.

i tried several things, example of a query that didn't failed, but also didn't do what i expected:

{
	"size": 0,
	"query": {
		"match_all": {}
	},
	"collapse": {
		"field": "ids",
		"inner_hits": {
			"name": "most_recent",
			"size": 1000,
			"sort": [
				{
					"creationTime": "desc"
				}
			]
		},
		"aggs": {
			"ids": {
				"terms": {
					"field": "id",
					"size": 20000
				},
				"aggs": {
					"percentileByDuration": {
						"percentiles": {
							"field": "duration",
							"percents": [
								50
							]
						}
					}
				}
			}
		}
	}
}

as it ignored my collapse within the aggregation

in your example btw, it did returned the latest 1000, but only 1000, without consideration of the "id" field
to clarify, i want to return 1000 for EACH id i have(i want to percentile for each id on the latest documents)
regarding across shards/indices, that's not a concern as i have only 1 shard for specific index.

i tried to make a top aggregation, but i couldn't find a way to sort within the terms on a date field:

{
	"size": 0,
	"aggregations": {
		"sortbycreationtime": {
			"terms": {
				"field": "creationTime",
				"size": 10,
				"order": "desc"
			},
			"aggs": {
				"ids": {
					"terms": {
						"field": "id",
						"size": 100
					},
					"aggs": {
						"percentileByDuration": {
							"percentiles": {
								"field": "duration",
								"percents": [
									50
								]
							}
						}
					}
				}
			}
		}
	}
}

as the error state:

[terms] order doesn't support values of type: VALUE_STRING

Yes, collapse only manipulate the output and not affect aggregation.

If you want aggregation on top 1000 documents for each ids, just exchange the place of sampler aggregation and terms aggregation.

GET /kibana_sample_data_flights/_search
{
  "size":0,
  "query":{
    "function_score": {
      "field_value_factor": {
        "field": "timestamp"
      }
    }
  },
  "aggs":{
    "ids":{
      "terms": {
        "field": "DestAirportID",
        "size": 10
      },
      "aggs":{
        "top1000":{
          "sampler": {
            "shard_size": 1000
          },
          "aggs":{
            "percentile":{
              "percentiles": {
                "field": "AvgTicketPrice",
                "percents": [
                  50
                ]
              }
            }
          }
        }
      }
    }
  }
}

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.