Sampler aggregation fails to optimize queries

I need to calculate some metrics for a dashboard view of a ~30 GB index.
As I understand it, the sampler aggregation can be used to run faster calculations on a small sample of the data. However, the performance I get is abysmal even for a very small sample size, which makes no sense to me and defeats the purpose of using the sampler aggregation.

Example (runs for 6 seconds):

GET index_prefix*/_search?size=0
{
  "aggs": {
    "sample": {
      "sampler": {
        "shard_size": 10
      },
      "aggs": {
        "last_month_events": {
          "filter": {
            "range": {
              "@timestamp": {
                "gte": "now-30d"
              }
            }
          }
        }
      }
    }
  }
}

Am I missing something here? Is it possible to achieve good performance for this query?

Thanks

The sampler aggregation collects the best-scoring docs on each shard. Your example request has no query, so there is no notion of "best": it simply iterates over every doc in the index looking for the highest-scoring ones (in your example they all score 1).
You then filter this sample by your date range.

It would make more sense to put your range criteria in the query part of the search request. That way we would only iterate over docs that match the criteria.
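For illustration, a request along those lines might look like the sketch below. It assumes the same index pattern and @timestamp field as above. The "top_categories" terms sub-aggregation and the "event.category" field are hypothetical placeholders for whatever dashboard metric is actually needed. Note that the sampler still visits every doc that matches the query on each shard. The range query only reduces how many docs that is.

# Sketch: range in the query part; "event.category" is a hypothetical field
GET index_prefix*/_search?size=0
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-30d"
      }
    }
  },
  "aggs": {
    "sample": {
      "sampler": {
        "shard_size": 10
      },
      "aggs": {
        "top_categories": {
          "terms": {
            "field": "event.category"
          }
        }
      }
    }
  }
}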

Sorry, this is the correct query (a uniform sample of documents):

GET index_prefix*/_search?size=0
{
  "query": {
    "function_score": {
      "random_score": {}
    }
  },
  "aggs": {
    "sample": {
      "sampler": {
        "shard_size": 10
      },
      "aggs": {
        "last_month_events": {
          "filter": {
            "range": {
              "@timestamp": {
                "gte": "now-30d"
              }
            }
          }
        }
      }
    }
  }
}

The performance is still bad.

My point about putting the date criteria in the query part of the request still stands.
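Combining the two ideas, a random sample restricted to the last 30 days could wrap the range query inside function_score. That way random scores are only computed for docs that match the date range. This is a sketch under the same assumptions as the requests above. "top_categories" and "event.category" are hypothetical placeholders, and the thread does not establish whether this is fast enough on a ~30 GB index.

# Sketch: random scoring limited to docs in the date range; "event.category" is a hypothetical field
GET index_prefix*/_search?size=0
{
  "query": {
    "function_score": {
      "query": {
        "range": {
          "@timestamp": {
            "gte": "now-30d"
          }
        }
      },
      "random_score": {}
    }
  },
  "aggs": {
    "sample": {
      "sampler": {
        "shard_size": 10
      },
      "aggs": {
        "top_categories": {
          "terms": {
            "field": "event.category"
          }
        }
      }
    }
  }
}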

Even a simple count query by date range is still too slow (a few seconds).
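The thread does not show the exact count request that was used. As an assumption, a plain count by date range through the _count API would look roughly like this, with the same index pattern and @timestamp field as the earlier examples:

# Presumed form of the "simple count query"; not taken from the thread
GET index_prefix*/_count
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-30d"
      }
    }
  }
}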

What version of Elasticsearch are you running?

ELK 7.2.1
(As far as I understand, the function score is calculated for the entire index during the aggregation, which takes a long time. Sampling a few documents uniformly should be faster.)

Thanks.

Can you share this query JSON?
