Query and Index use question

If I have indices like the ones below:

index-2022
index-2023
index-2024
index-2025

The data view is "index-*" (@timestamp is the date field).

If I run a query against index-* with @timestamp > now() - interval 30 day, will it hit all the indices or just the recent one?

Big topic @elasticforme :slight_smile:

This is all based on a concept often called prefiltering, but more accurately the can-match logic.

There is a lot of subtlety here ...

Data Streams:

If these were data streams (which I suspect they are not), the indices/shards would get filtered in or out based on the @timestamp filter during the can-match stage (which is specialized for data streams), before the actual search (query + fetch phases). In that case any non-matching shards would be skipped.
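To make the idea concrete, here is a toy Python simulation of the can-match decision. This is not Elasticsearch code: `ShardStats`, `can_match` and `split_shards` are hypothetical names, and it assumes each shard only exposes the min and max @timestamp it holds.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ShardStats:
    """Hypothetical per-shard summary: oldest and newest @timestamp it holds."""
    index: str
    min_ts: datetime
    max_ts: datetime


def can_match(shard: ShardStats, gte: datetime) -> bool:
    """A shard can match `@timestamp >= gte` only if its newest doc is at or after gte."""
    return shard.max_ts >= gte


def split_shards(shards, gte):
    """Return (searched, skipped) index names for a lower-bound range filter."""
    searched = [s.index for s in shards if can_match(s, gte)]
    skipped = [s.index for s in shards if not can_match(s, gte)]
    return searched, skipped


def year_shard(year: int) -> ShardStats:
    """Build stats for an index that holds data for exactly one calendar year."""
    utc = timezone.utc
    return ShardStats(
        f"index-{year}",
        datetime(year, 1, 1, tzinfo=utc),
        datetime(year, 12, 31, 23, 59, 59, tzinfo=utc),
    )


shards = [year_shard(y) for y in (2022, 2023, 2024, 2025)]
cutoff = datetime(2025, 6, 1, tzinfo=timezone.utc)  # stands in for "now-30d"
searched, skipped = split_shards(shards, cutoff)
print("searched:", searched)
print("skipped:", skipped)
```

With yearly indices and a recent cutoff, only index-2025 survives the check and the older indices are skipped without running the query on them.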

Regular Indices:

If these are regular indices, not data streams, that adds a bit more complexity, so take a look at this. There is a threshold that determines when the can-match phase is applied.

Let's look at this test case.

Create 3 indices, one per year, each holding only data from that year.

DELETE test-2025,test-2024,test-2023
POST test-2025/_doc
{
      "@timestamp": "2025-05-03T17:22:52.592Z",
      "message" : "its 2025"
}
POST test-2024/_doc
{
      "@timestamp": "2024-04-03T17:22:52.592Z",
      "message" : "its 2024"
}
POST test-2023/_doc
{
      "@timestamp": "2023-03-03T17:22:52.592Z",
      "message" : "its 2023"
}

A) If I search with just a normal _search, no shards are skipped.

GET /test-2*/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": {
              "format": "strict_date_optional_time",
              "gte": "now-90d"
            }
          }
        }
      ]
    }
  }
}
# Result No Shards Skipped
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,  <<< NO SHARDS SKIPPED :(
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0,
    "hits": [
      {
        "_index": "test-2025",
        "_id": "8r1Y8ZcB_y8EQ5ex3ukb",
        "_score": 0,
        "_source": {
          "@timestamp": "2025-05-03T17:22:52.592Z",
          "message": "its 2025"
        }
      }
    ]
  }
}

BUT

B) If I do an _async_search, which is what Discover does, the shards are skipped.

POST /test-2*/_async_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": {
              "format": "strict_date_optional_time",
              "gte": "now-90d"
            }
          }
        }
      ]
    }
  }
}

# Result skipped shards 
{
  "is_partial": false,
  "is_running": false,
  "start_time_in_millis": 1752101084860,
  "expiration_time_in_millis": 1752533084860,
  "completion_time_in_millis": 1752101084862,
  "response": {
    "took": 2,
    "timed_out": false,
    "_shards": {
      "total": 3,
      "successful": 3,
      "skipped": 2,  <<< SKIPPED SHARDS YAY!!
      "failed": 0
    },
    "hits": {
      "total": {
        "value": 1,
        "relation": "eq"
      },
      "max_score": 0,
      "hits": [
        {
          "_index": "test-2025",
          "_id": "8r1Y8ZcB_y8EQ5ex3ukb",
          "_score": 0,
          "_source": {
            "@timestamp": "2025-05-03T17:22:52.592Z",
            "message": "its 2025"
          }
        }
      ]
    }
  }
}

So why is this?

Whether or not the can-match phase runs is determined by the pre_filter_shard_size setting (see "Run a search" in the Elasticsearch API documentation), which defaults to 128.

For async search, the default is 1. That is mentioned in the async search docs:

  • pre_filter_shard_size defaults to 1 and cannot be changed: this is to enforce the execution of a pre-filter roundtrip to retrieve statistics from each shard so that the ones that surely don’t hold any document matching the query get skipped.
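As a rough sketch of how that threshold plays out for the 3-index test above (a simplification, not Elasticsearch's implementation): the function name `prefilter_runs` is made up, and it models only the shard-count rule. The search API docs list other conditions that can also trigger the pre-filter (read-only indices, a primary sort on an indexed field), which are ignored here.

```python
DEFAULT_SEARCH_THRESHOLD = 128  # _search default, per the text above
ASYNC_SEARCH_THRESHOLD = 1      # _async_search fixed value, per the quoted docs


def prefilter_runs(shard_count: int, pre_filter_shard_size: int) -> bool:
    """Simplified rule: pre-filter only when the shard count exceeds the threshold."""
    return shard_count > pre_filter_shard_size


# The test case targets 3 shards (one per index).
print(prefilter_runs(3, DEFAULT_SEARCH_THRESHOLD))  # plain _search
print(prefilter_runs(3, ASYNC_SEARCH_THRESHOLD))    # same request via _async_search
```

Under this rule the plain _search stays below the threshold and skips nothing, while the async search exceeds it, which matches the two results shown earlier.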

So your query in Discover will be an _async_search, and it will take advantage of can-match. If you just run a query in Dev Tools, it will not, unless it looks like it will hit more than 128 shards, or unless you set the parameter yourself:

GET /test-2*/_search?pre_filter_shard_size=1 <<< HERE
{
 "query": {
....
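If the client is a Python script, the same request can be put together like this. It is only a sketch: the base URL `http://localhost:9200` and the helper name `build_search_request` are assumptions, and the code only builds the URL and body without sending anything. I believe the official Python client also accepts `pre_filter_shard_size` as a keyword argument on `search()`, but check that against your client version.

```python
import json
from urllib.parse import quote, urlencode


def build_search_request(base_url, index_pattern, gte, pre_filter_shard_size=1):
    """Return (url, body) for a _search call that forces the can-match phase."""
    # Keep "*" and "," unescaped so wildcard and multi-index targets stay readable.
    path = quote(index_pattern, safe="*,")
    params = urlencode({"pre_filter_shard_size": pre_filter_shard_size})
    url = f"{base_url.rstrip('/')}/{path}/_search?{params}"
    body = {
        "query": {
            "bool": {
                "filter": [
                    {"range": {"@timestamp": {"gte": gte}}}
                ]
            }
        }
    }
    return url, json.dumps(body)


url, body = build_search_request("http://localhost:9200", "test-2*", "now-90d")
print(url)
print(body)
```

The URL and JSON body can then be sent with whatever HTTP library the script already uses.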

Hope this helps :slight_smile:


Perfect.
Yes, in my case the indices are not data streams, and hence I was seeing index-2018 all the way to index-2025 in GET /_tasks.
But each index-yyyy holds data only for that year, and the client is using Python to search the data. They were using a lazy search with nothing else, just "search job=xyz".

Looks like I have to test on the Python side how to use async search, and/or pass a single index rather than index-*, so that it searches only index-currentYYYY.
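For the second option, a minimal sketch of picking the yearly indices for a time window could look like this. The helper names `yearly_indices` and `indices_for_last_days` are made up for illustration, and it assumes the index-YYYY naming above with timestamps in UTC.

```python
from datetime import datetime, timedelta, timezone


def yearly_indices(start, end, prefix="index-"):
    """Names of the index-YYYY indices whose year overlaps [start, end]."""
    if start > end:
        raise ValueError("start must not be after end")
    return [f"{prefix}{year}" for year in range(start.year, end.year + 1)]


def indices_for_last_days(days, now=None, prefix="index-"):
    """Comma-separated index list covering the last `days` days."""
    now = now or datetime.now(timezone.utc)
    return ",".join(yearly_indices(now - timedelta(days=days), now, prefix))


# A window that crosses a year boundary needs two indices.
print(indices_for_last_days(30, now=datetime(2025, 1, 10, tzinfo=timezone.utc)))
# A window inside one year needs only the current index.
print(indices_for_last_days(30, now=datetime(2025, 7, 10, tzinfo=timezone.utc)))
```

The returned string can be passed as the index target instead of index-*, so old years are never touched.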

Thanks for this great explanation. :heart:
