Route search request by using the date in index name

india · April 7, 2021, 10:47pm

Hi,

I am testing ILM, this is my configuration:

ILM Policy:

PUT _ilm/policy/test_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_docs": 1000
          }
        }
      }
    }
  }
}

Index template:

PUT _index_template/test_template
{
  "index_patterns": [
    "test-*"
  ],
  "template": {
    "settings": {
      "index.lifecycle.name": "test_policy",
      "index.lifecycle.rollover_alias": "test_alias"
    },
    "mappings": {
      "properties": {
        "testfield": {
          "type": "keyword"
        }
      }
    }
  }
}

Bootstrap index:

PUT /<test-{now{YYYY-MM-dd.HH.mm.SS}}-000001>
{
  "aliases": {
    "test_alias": {
      "is_write_index": true
    }
  }
}

With this configuration, a new index is created on every rollover, and because I am using date math in the index name, every index contains its rollover time in its name.

I would like to pass a date or date range along with my search request so that the request hits only the index or indices containing data that matches it. That would be done by comparing the rollover time in each index name with the date I provide in the search request. Is there such an option in Elasticsearch?

Thanks.

warkolm · April 7, 2021, 10:57pm

That won't work with ILM, because every time that a policy rolls over it increments the counter on the end of the index name, it doesn't change the timestamp.

india · April 8, 2021, 2:08pm

It seems that it increments the counter but also sets the rollover time in the index name, since I used date math in the bootstrap index. Here is the list of indices after a few rollovers:

GET /*/_alias/test_alias
{
  "test-2021-04-08.09.27.23-000002" : {
    "aliases" : {
      "test_alias" : {
        "is_write_index" : false
      }
    }
  },
  "test-2021-04-08.09.47.22-000003" : {
    "aliases" : {
      "test_alias" : {
        "is_write_index" : false
      }
    }
  },
  "test-2021-04-08.10.07.22-000004" : {
    "aliases" : {
      "test_alias" : {
        "is_write_index" : true
      }
    }
  },
  "test-2021-04-08.09.15.34-000001" : {
    "aliases" : {
      "test_alias" : {
        "is_write_index" : false
      }
    }
  }
}

So the test-2021-04-08.09.47.22-000003 index contains data (logs in my case) with @timestamp in the range 09:47 to 10:07. I am wondering whether there is some kind of query where I could say 'I need logs with a timestamp between 09:50 and 10:00', and Elasticsearch would use the rollover times in the index names to route that request to the test-2021-04-08.09.47.22-000003 index and run the search only against that index.
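To make the idea concrete, here is a minimal client-side sketch in Python of the name-based selection described above. It is only an illustration, not an Elasticsearch feature. The helper names (creation_time, indices_for_range) are hypothetical, the last component of the date in the name is treated as seconds, and it assumes each index holds only documents from its own rollover time up to the next index's rollover time, which may not hold in practice.

```python
from datetime import datetime

# Index names copied from the alias listing above.
INDICES = [
    "test-2021-04-08.09.15.34-000001",
    "test-2021-04-08.09.27.23-000002",
    "test-2021-04-08.09.47.22-000003",
    "test-2021-04-08.10.07.22-000004",
]


def creation_time(index_name):
    """Parse the date embedded between the "test-" prefix and the counter."""
    stamp = index_name[len("test-"):].rsplit("-", 1)[0]
    # Assumption: the last component of the date is seconds.
    return datetime.strptime(stamp, "%Y-%m-%d.%H.%M.%S")


def indices_for_range(indices, start, end):
    """Return the indices whose assumed time window overlaps [start, end]."""
    ordered = sorted(indices, key=creation_time)
    selected = []
    for i, name in enumerate(ordered):
        opened = creation_time(name)
        # An index is assumed to receive data until the next one is created;
        # the newest index is treated as open-ended.
        if i + 1 < len(ordered):
            closed = creation_time(ordered[i + 1])
        else:
            closed = datetime.max
        if opened <= end and start < closed:
            selected.append(name)
    return selected


wanted = indices_for_range(
    INDICES,
    datetime(2021, 4, 8, 9, 50),
    datetime(2021, 4, 8, 10, 0),
)
print(",".join(wanted))
```

The resulting comma-separated list could then be used as the search target instead of test-*.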

DavidTurner · April 8, 2021, 2:33pm

Elasticsearch does this, just not by using the index name, which might not be right anyway: it looks at the range of timestamps in all the relevant shards and skips any shards that don't match the range in the query. It's a very cheap check to make, and saves any of this hassle: just search test-* and let Elasticsearch pick the right shards.
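As a rough mental model of that check (a conceptual sketch in Python, not Elasticsearch's actual implementation), each shard can be thought of as knowing the smallest and largest timestamp it holds, and a shard is skipped when that interval does not overlap the queried range. The ShardStats class, the function names, and the per-shard bounds below are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ShardStats:
    name: str
    min_ts: datetime  # smallest @timestamp stored in the shard
    max_ts: datetime  # largest @timestamp stored in the shard


def can_match(shard, gte, lte):
    """True if the shard's timestamp interval overlaps [gte, lte]."""
    return shard.min_ts <= lte and shard.max_ts >= gte


def split_shards(shards, gte, lte):
    """Partition shard names into (searched, skipped) for a range query."""
    searched = [s.name for s in shards if can_match(s, gte, lte)]
    skipped = [s.name for s in shards if not can_match(s, gte, lte)]
    return searched, skipped


def day(h, m, s=0):
    return datetime(2021, 4, 8, h, m, s)


# Hypothetical bounds, one shard per index.
shards = [
    ShardStats("test-...-000001", day(9, 15, 34), day(9, 27, 23)),
    ShardStats("test-...-000002", day(9, 27, 23), day(9, 47, 22)),
    ShardStats("test-...-000003", day(9, 47, 22), day(10, 7, 22)),
    ShardStats("test-...-000004", day(10, 7, 22), day(10, 30, 0)),
]

searched, skipped = split_shards(shards, day(10, 0, 11), day(14, 0, 11))
print("searched:", searched)
print("skipped:", skipped)
```

The comparison only touches two values per shard, which is why the check is cheap compared with running the query.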

india · April 8, 2021, 9:03pm

Hi, is there some additional setting or configuration I need to enable that feature?

I added a @timestamp field to my mapping, indexed some docs, and after a few rollovers ran range queries on @timestamp against test-*, but I do not see any shards being skipped.

india · April 9, 2021, 10:11am

Here is some additional info.
When I add a sort on @timestamp to the query, I can see that some shards are being skipped:

GET test-*/_search
{
  "profile": "true",
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2021-04-08T10:00:11.473Z",
        "lte": "2021-04-08T14:00:11.473Z"
      }
    }
  },
  "sort": [
    {
      "@timestamp": {
        "order": "desc"
      }
    }
  ]
}

Result:

  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 4,
    "successful" : 4,
    "skipped" : 2,
    "failed" : 0
  }

Without the sort on the @timestamp field, no shards are skipped, but query profiling shows this for the shards that should be skipped:

    "type" : "MatchNoDocsQuery",
    "description" : """MatchNoDocsQuery("User requested "match_none" query.")""",

Is it necessary to sort on the timestamp field if you want ES to skip shards that do not fall within the time range?
What does MatchNoDocsQuery mean?

Thanks.

dadoonet · April 9, 2021, 11:00am

It means that 2 shards won't have any of the data you are looking for so it's safe not to run the query against them.

DavidTurner · April 9, 2021, 12:19pm

What David said, but in other words, it means these shards are effectively being skipped too. A MatchNoDocsQuery matches no documents in the shard, and as you might imagine it doesn't take much time or effort to execute that.

dadoonet · April 10, 2021, 12:54pm

I updated my post. Instead of

it's safe to run the query against them

I actually meant:

it's safe not to run the query against them

liorg2 · April 22, 2021, 1:38pm

Just found this post, as I'm trying to figure out something similar.

I was wondering why the following query returned "skipped" : 0:

POST /name-*/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "datetime": {
              "gt": "now-2d"
            }
          }
        }
      ]
    }
  }
}

  "took" : 29,
  "timed_out" : false,
  "_shards" : {
    "total" : 8,
    "successful" : 8,
    "skipped" : 0,
    "failed" : 0
  },

But when running the validate API with the same query, I got the same explanation for 7 shards:

  "index" : "name-000001",
  "valid" : true,
  "explanation" : """MatchNoDocsQuery("User requested "match_none" query.")"""

and 1 shard actually had data:

 "explanation" : "#DateRangeIncludingNowQuery(datetime:[1618925727715 TO 9223372036854775807])"

So what's the difference between "skipped" : 7 and "skipped" : 0 with 7 shards returning MatchNoDocsQuery?

Thanks!

DavidTurner · April 22, 2021, 1:43pm

Really it's just the phase at which the skipping takes place. We always try and rewrite the query to a MatchNoDocsQuery if possible, but sometimes this happens in a preflight check (resulting in skipped shards) and sometimes it happens at query time, depending on which is predicted to be more efficient.
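For anyone who wants to see the preflight variant explicitly: the search API accepts a pre_filter_shard_size query parameter (not mentioned earlier in this thread), which sets the shard-count threshold above which the pre-filter round trip is enforced. The Python sketch below only builds the request path and body and does not send anything. build_search is a hypothetical helper, and the value 1 is an assumption chosen so that a search over the four shards shown earlier should exceed the threshold.

```python
import json
from urllib.parse import urlencode


def build_search(index_pattern, field, gte, lte, pre_filter_shard_size=1):
    """Build the path and JSON body for a range search with pre-filtering.

    pre_filter_shard_size is the number of target shards above which the
    pre-filter round trip is enforced.
    """
    query_string = urlencode({"pre_filter_shard_size": pre_filter_shard_size})
    path = f"/{index_pattern}/_search?{query_string}"
    body = {"query": {"range": {field: {"gte": gte, "lte": lte}}}}
    return path, json.dumps(body, indent=2)


path, body = build_search(
    "test-*",
    "@timestamp",
    "2021-04-08T10:00:11.473Z",
    "2021-04-08T14:00:11.473Z",
)
print("GET", path)
print(body)
```

Whether forcing the preflight check is actually faster depends on the cluster; as noted above, the outcome is the same either way and only the phase in which the shard is ruled out differs.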

system · May 20, 2021, 1:44pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
