Filtering indices in URL?

Hello,

my indices are named with the typical <name>-<date> scheme, like mylogs-2016.07.26

When doing aggregations over time, I use the wildcard mylogs-* in the URL to select the indices, and then use a filter query to restrict the time range relative to now.

I've tried the techniques described in https://speakerdeck.com/polyfractal/elasticsearch-query-optimization to cache the time range, but that only really helps after the first query. And even when I only scan through one index (one day), there is a huge difference between using the wildcard and using the actual full index name.

So that works for one day, but I also want to query over multiple days, weeks, months...

I tried to use Lucene patterns in the URL, but that doesn't seem to work.

I guess there is a way to use _index in the query itself, but the client I use uses the URL scheme to look up indices.

So the question is: is there a way to be fancy with index names in the URL?
What's the proper syntax, or where do I find help on this?

Thanks

Hi @streamn,

you can find the supported syntax in the section 'multiple indices' in the reference documentation.

In Elasticsearch 5.0 we will be able to cache the results of the indices that cover the middle of a range, even if you use "now".

Example:

you have daily indices and want to query from now - 7d until now (and "now" is "2016-07-27 09:00")

the query will be rewritten so that the matches in the indices mylogs-2016.07.21 through mylogs-2016.07.26 can be cached completely. Elasticsearch will run the actual query only against mylogs-2016.07.20 and mylogs-2016.07.27, which should also solve a lot of your problems. However, the first query will still hit all the indices that you specify; that's just the nature of a cache.
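To make the split in this example concrete, here is a minimal Python sketch that sorts daily indices into those fully inside the time window and those only partially covered by it. It only illustrates the idea and is not how Elasticsearch implements the rewrite internally; the function name split_daily_indices is hypothetical, and the index names use the dotted date scheme from the original post.

```python
from datetime import datetime, timedelta


def split_daily_indices(prefix, start, end):
    """Return (partial, full) daily index names for the window [start, end].

    An index is "full" when its whole day lies inside the window, so every
    document in it matches the range and the result could be cached.
    """
    partial, full = [], []
    # Start at midnight of the day that contains `start`.
    day = datetime(start.year, start.month, start.day)
    while day <= end:
        next_day = day + timedelta(days=1)
        name = prefix + day.strftime("%Y.%m.%d")
        if start <= day and next_day <= end:
            full.append(name)
        else:
            partial.append(name)
        day = next_day
    return partial, full


# The example from this post: "now" is 2016-07-27 09:00, window is now-7d..now.
now = datetime(2016, 7, 27, 9, 0)
partial, full = split_daily_indices("mylogs-", now - timedelta(days=7), now)
print("partial:", partial)
print("full:   ", full)
```

Only the two partial indices at the edges of the window depend on the exact value of "now"; the six in the middle match in full.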

I hope that helps.

Daniel


After experimenting a bit, it seems totally impractical: I end up with URLs of 370+ chars and 404 errors all the time.

maybe I'm missing something here:

if I do a query now-7d until now, I still need to query mylogs-* to cover all indices, otherwise I may not be querying all the data, right?

then if I want to select the range of indices myself, to narrow it down, I end up with an index that looks like:
mylogs-2016.07.20,mylogs-2016.07.21,mylogs-2016.07.22,mylogs-2016.07.23,mylogs-2016.07.24,mylogs-2016.07.25,mylogs-2016.07.26,mylogs-2016.07.27

(my index name is actually already much longer, as it includes two 17-char IDs back to back plus the date)

So the URL becomes a monster...

is that what I'm supposed to do?

Hi @streamn,

yes, the URL can become quite long. Now you could do all sorts of fancy logic on the client to select the relevant indices but to be honest I am not sure this is really worth the effort.
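If you do want to try such client-side logic, a minimal Python sketch could look like the following. It builds the comma-separated index expression for a date range and collapses whole calendar months into a single wildcard to keep the URL shorter. The function name index_pattern and the mylogs- prefix are assumptions for illustration, and the sketch assumes one index per day named with the dotted date scheme.

```python
import calendar
from datetime import date, timedelta


def index_pattern(prefix, first, last):
    """Build a comma-separated index expression covering first..last (inclusive).

    Whole calendar months collapse to one wildcard such as mylogs-2016.07.*,
    all other days are listed individually.
    """
    parts = []
    day = first
    while day <= last:
        last_dom = calendar.monthrange(day.year, day.month)[1]
        month_end = date(day.year, day.month, last_dom)
        if day.day == 1 and month_end <= last:
            # The entire month is inside the range: one wildcard is enough.
            parts.append(f"{prefix}{day:%Y.%m}.*")
            day = month_end + timedelta(days=1)
        else:
            parts.append(f"{prefix}{day:%Y.%m.%d}")
            day += timedelta(days=1)
    return ",".join(parts)


week = index_pattern("mylogs-", date(2016, 7, 20), date(2016, 7, 27))
longer = index_pattern("mylogs-", date(2016, 6, 29), date(2016, 8, 1))
print("/" + week + "/_search")
print("/" + longer + "/_search")
```

For the one-week range this produces the same eight names you listed, so it does not shorten that URL; the saving only shows up once the range spans full months. Listing a day for which no index exists may be one source of the 404 errors; as far as I know the search API accepts ignore_unavailable=true to skip missing indices, but please check that against the documentation for your version.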

As your actual goal is to speed up your queries (I guess), you could install the latest Elasticsearch 5.0 pre-release version in a test environment and just query over all indices (I've mentioned that we are able to cache large parts of range queries including "now"). Then you can still decide whether the speedup is significant enough for you so it pays off to migrate to 5.0 when it is out.

Daniel