Alias with multiple indices or one index


(weibin.wu) #1

Hi Elasticsearch:

I have a use case where I store time-based logs in ES. There are two plans for storing them.

[Store in one index] Plan A
I will create only one index to store the logs.

[Store in daily index and use alias] Plan B
Create a new index every day to store the time-based logs, and set up an alias that includes all the daily indices.
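
To make Plan B concrete, here is a minimal sketch of the request body that would add one day's index to the alias through `POST /_aliases`. The index prefix `logs`, the alias name `logs-all`, and the helper names are assumptions chosen for illustration; nothing in this thread fixes them.

```python
import json
from datetime import date


def daily_index_name(day: date, prefix: str = "logs") -> str:
    """Return a daily index name such as 'logs-2017.05.01' (naming scheme is an assumption)."""
    return f"{prefix}-{day.strftime('%Y.%m.%d')}"


def add_to_alias_body(day: date, alias: str = "logs-all", prefix: str = "logs") -> dict:
    """Build the body for POST /_aliases that adds the day's index to the alias."""
    return {
        "actions": [
            {"add": {"index": daily_index_name(day, prefix), "alias": alias}}
        ]
    }


body = add_to_alias_body(date(2017, 5, 1))
print(json.dumps(body, indent=2))
```

The body only needs to be sent once per new index, for example from the job that creates the daily index.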

In terms of querying, with Plan B I will need to query an alias that has 365 indices behind it for one year of data. With Plan A I only need to query one index.

When I query an alias, what strategy does the alias use to send the request to the different indices?
When querying one year of data, which is faster, Plan A or Plan B?
Thanks


(Mark Walkom) #2

Plan B will not be faster, as you will essentially query all the indices, which is effectively what Plan A does as well.

How are you querying the indices?


(Christian Dahlqvist) #3

Using time-based indices has a number of benefits as outlined in the link I provided. You also do not need an alias to query them - specifying an index pattern, e.g. logstash-*, will do. It is also important to note that time-based indices do not have to be daily. If you have a long retention period it may be better to have monthly indices. Time-based indices allow you to adapt to changing volumes more easily and make capacity planning easier, as you can change the number of shards for the next month's index if your volumes grow.
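
As a rough illustration of the daily versus monthly trade-off, the sketch below generates the index names a year of data would produce under each scheme and checks them against the logstash-* pattern. The `logstash-YYYY.MM.DD` and `logstash-YYYY.MM` naming and the helper names are assumptions, and Python's `fnmatch` only approximates how Elasticsearch resolves wildcards.

```python
from datetime import date, timedelta
from fnmatch import fnmatch


def index_name(day: date, granularity: str = "daily", prefix: str = "logstash") -> str:
    """Return the index a document dated `day` would land in (naming is an assumption)."""
    fmt = "%Y.%m.%d" if granularity == "daily" else "%Y.%m"
    return f"{prefix}-{day.strftime(fmt)}"


def indices_for_year(year: int, granularity: str) -> list:
    """List the distinct index names needed to hold one calendar year of data."""
    names = set()
    day = date(year, 1, 1)
    while day.year == year:
        names.add(index_name(day, granularity))
        day += timedelta(days=1)
    # Zero-padded date parts make lexicographic order match chronological order.
    return sorted(names)


daily = indices_for_year(2017, "daily")
monthly = indices_for_year(2017, "monthly")
print(len(daily), len(monthly))

# Both naming schemes are covered by the same wildcard pattern.
print(all(fnmatch(name, "logstash-*") for name in daily + monthly))
```

With monthly indices the same year is covered by far fewer indices, and therefore far fewer shards for the same shard count per index.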

Query latency depends on the data, queries as well as the size and number of shards. This video discusses sizing of shards and clusters and may be useful.


(weibin.wu) #4

For example, I want to query the data for 1 May 2017.

For Plan A, the request only needs to be sent to one index and queried there.
For Plan B, how does the alias know which index the request should be sent to, given that there are 365 indices behind the alias? If the request is sent to all the indices at the same time, is each index scanned in parallel or one by one?


(Christian Dahlqvist) #5

Indices and shards are queried in parallel. Since you know which indices hold data for specific periods, you can restrict the query to those indices when you send it. Kibana does this automatically by first calling the field stats API to determine exactly which indices to query. Even if you query the whole data set, indices that hold no data for the requested time period will return quite quickly.
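
Limiting the query to the relevant indices could look roughly like the sketch below, which turns a date range into a comma-separated index list for the search path plus a matching range filter. The daily `logstash-YYYY.MM.DD` naming, the `@timestamp` field, and the helper names are assumptions; adjust them to your own setup.

```python
import json
from datetime import date, timedelta


def indices_between(start: date, end: date, prefix: str = "logstash") -> list:
    """Return the daily index names covering start..end inclusive (naming is an assumption)."""
    names = []
    day = start
    while day <= end:
        names.append(f"{prefix}-{day.strftime('%Y.%m.%d')}")
        day += timedelta(days=1)
    return names


def search_request(start: date, end: date, ts_field: str = "@timestamp"):
    """Build the search path and body that target only the indices for the period."""
    path = "/" + ",".join(indices_between(start, end)) + "/_search"
    # Use an exclusive upper bound on the following day so the whole end day is covered.
    upper = end + timedelta(days=1)
    body = {
        "query": {
            "range": {ts_field: {"gte": start.isoformat(), "lt": upper.isoformat()}}
        }
    }
    return path, body


path, body = search_request(date(2017, 5, 1), date(2017, 5, 1))
print("GET", path)
print(json.dumps(body))
```

For the 1 May 2017 example, only a single daily index appears in the path, so the other indices are never contacted.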

