Route search requests by using the date in the index name

Hi,

I am testing ILM with the following configuration:

ILM Policy:

PUT _ilm/policy/test_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_docs": 1000
          }
        }
      }
    }
  }
}

Index template:

PUT _index_template/test_template
{
  "index_patterns": [
    "test-*"
  ],
  "template": {
    "settings": {
      "index.lifecycle.name": "test_policy",
      "index.lifecycle.rollover_alias": "test_alias"
    },
    "mappings": {
      "properties": {
        "testfield": {
          "type": "keyword"
        }
      }
    }
  }
}

Bootstrap index:

PUT /<test-{now{YYYY-MM-dd.HH.mm.SS}}-000001>
{
  "aliases": {
    "test_alias": {
      "is_write_index": true
    }
  }
}

With this configuration, a new index is created on every rollover, and because I am using date math in the index name, every index name contains its rollover time.

I would like to pass a date or date range along with my search request, so that the request hits only the index or indices containing data that match it. That would be done by comparing the date I provide against the rollover time in each index name. Is there such an option in Elasticsearch?

Thanks.

That won't work with ILM, because every time a policy rolls over it increments the counter at the end of the index name; it doesn't change the timestamp.

It seems that it increments the counter but also sets the rollover time in the index name, since I used date math in the bootstrap index name. Here is the list of indices after a few rollovers:

GET /*/_alias/test_alias
{
  "test-2021-04-08.09.27.23-000002" : {
    "aliases" : {
      "test_alias" : {
        "is_write_index" : false
      }
    }
  },
  "test-2021-04-08.09.47.22-000003" : {
    "aliases" : {
      "test_alias" : {
        "is_write_index" : false
      }
    }
  },
  "test-2021-04-08.10.07.22-000004" : {
    "aliases" : {
      "test_alias" : {
        "is_write_index" : true
      }
    }
  },
  "test-2021-04-08.09.15.34-000001" : {
    "aliases" : {
      "test_alias" : {
        "is_write_index" : false
      }
    }
  }
}

So the test-2021-04-08.09.47.22-000003 index contains data (logs in my case) with @timestamp in the range 09:47 to 10:07. I am wondering whether there is some kind of query where I could say "I need logs with a timestamp between 09:50 and 10:00", and Elasticsearch would use the rollover times in the index names to route that request to the test-2021-04-08.09.47.22-000003 index and run the search only against that index.

Elasticsearch already does this, just not by using the index name, which might not be accurate anyway. It looks at the range of timestamps in each relevant shard and skips any shard whose range doesn't overlap the range in the query. It's a very cheap check to make, and it saves all of this hassle: just search test-* and let Elasticsearch pick the right shards.
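
To make the idea concrete, here is a minimal Python sketch of that kind of range-overlap check. It only illustrates the principle and is not Elasticsearch's actual implementation: the `ShardStats` class, the `shards_to_search` function, and the per-shard timestamp ranges are all made up for this example.

```python
from dataclasses import dataclass


@dataclass
class ShardStats:
    """Hypothetical per-shard summary: the min and max @timestamp it holds."""
    index: str
    min_ts: str  # ISO-8601 UTC strings in one fixed format sort correctly as text
    max_ts: str


def shards_to_search(shards, gte, lte):
    """Return the shards whose timestamp range overlaps [gte, lte]."""
    return [s.index for s in shards if s.min_ts <= lte and s.max_ts >= gte]


# Made-up ranges, loosely based on the rollover times shown in this thread.
shards = [
    ShardStats("test-2021-04-08.09.15.34-000001", "2021-04-08T09:15:00Z", "2021-04-08T09:27:00Z"),
    ShardStats("test-2021-04-08.09.27.23-000002", "2021-04-08T09:27:00Z", "2021-04-08T09:47:00Z"),
    ShardStats("test-2021-04-08.09.47.22-000003", "2021-04-08T09:47:00Z", "2021-04-08T10:07:00Z"),
    ShardStats("test-2021-04-08.10.07.22-000004", "2021-04-08T10:07:00Z", "2021-04-08T10:20:00Z"),
]

# "Logs between 09:50 and 10:00": only one shard can contain matches.
selected = shards_to_search(shards, "2021-04-08T09:50:00Z", "2021-04-08T10:00:00Z")
print(selected)
```

The check depends only on the timestamps stored in each shard, so it gives the same answer whatever the index happens to be called.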

Hi, is there some additional setting/configuration I should set to enable that feature?

I added a @timestamp field to my mapping and indexed some docs, and after a few rollovers I tried range queries on @timestamp against test-*, but I do not see any shards being skipped.

Here is some additional info.
When I add a sort on @timestamp to the query, I can see that some shards are being skipped:

GET test-*/_search
{
  "profile": "true",
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2021-04-08T10:00:11.473Z",
        "lte": "2021-04-08T14:00:11.473Z"
      }
    }
  },
  "sort": [
    {
      "@timestamp": {
        "order": "desc"
      }
    }
  ]
}

Result:

  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 4,
    "successful" : 4,
    "skipped" : 2,
    "failed" : 0
  }

Without the sort on the @timestamp field, no shards are skipped, but the query profile shows this for the shards that should be skipped:

    "type" : "MatchNoDocsQuery",
    "description" : """MatchNoDocsQuery("User requested "match_none" query.")""",

Is it necessary to sort on the timestamp field if you want ES to skip shards that fall outside the time range?
What does MatchNoDocsQuery mean?

Thanks.

It means that 2 shards won't have any of the data you are looking for so it's safe not to run the query against them.

What David said, but in other words, it means these shards are effectively being skipped too. A MatchNoDocsQuery matches no documents in the shard, and as you might imagine it doesn't take much time or effort to execute that.

I updated my post. Instead of

it's safe to run the query against them

I actually meant:

it's safe not to run the query against them

🙂

I just found this post while trying to figure out something similar.

I was wondering why the following query returned "skipped" : 0:

POST /name-*/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "datetime": {
              "gt": "now-2d"
            }
          }
        }
      ]
    }
  } 
}

  "took" : 29,
  "timed_out" : false,
  "_shards" : {
    "total" : 8,
    "successful" : 8,
    "skipped" : 0,
    "failed" : 0
  },

But when I ran the validate API with the same query, I got the same explanation for 7 shards:

      "index" : "name-000001",
      "valid" : true,
      "explanation" : """MatchNoDocsQuery("User requested "match_none" query.")"""

And 1 shard actually had data:

 "explanation" : "#DateRangeIncludingNowQuery(datetime:[1618925727715 TO 9223372036854775807])"

So what's the difference between "skipped" : 7 and "skipped" : 0 with 7 shards returning MatchNoDocsQuery?

Thanks!

Really, it's just the phase at which the skipping takes place. We always try to rewrite the query to a MatchNoDocsQuery if possible, but sometimes this happens in a preflight check (resulting in skipped shards) and sometimes at query time, depending on which is predicted to be more efficient.
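
If you want to experiment with when the preflight check runs, the search API has a `pre_filter_shard_size` query parameter that sets the shard-count threshold for it. As far as I know, when the parameter is not set, the preflight round trip runs only if the search targets more than 128 shards, targets read-only indices, or has a primary sort on an indexed field, which would be consistent with the sorted query earlier in this thread reporting skipped shards. Please verify that against the search API documentation for your version. The sketch below only builds the request path and body and does not contact a cluster; the helper name `build_range_search` is hypothetical.

```python
import json
from urllib.parse import urlencode


def build_range_search(index_pattern, field, gte, lte, pre_filter_shard_size=None):
    """Build the path and JSON body of a range search without sending it."""
    path = f"/{index_pattern}/_search"
    if pre_filter_shard_size is not None:
        # Assumption: a low threshold makes the preflight check run even
        # for searches that target only a handful of shards.
        path += "?" + urlencode({"pre_filter_shard_size": pre_filter_shard_size})
    body = {"query": {"range": {field: {"gte": gte, "lte": lte}}}}
    return path, body


path, body = build_range_search(
    "test-*",
    "@timestamp",
    "2021-04-08T10:00:11.473Z",
    "2021-04-08T14:00:11.473Z",
    pre_filter_shard_size=1,
)
print("GET", path)
print(json.dumps(body, indent=2))
```

Sending the printed request from the console and comparing the "skipped" count with and without the parameter should show which phase did the skipping. The results should be identical either way.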
