Finding which indices contain hits when making use of an alias in elasticsearch query

sandeepkanabar · September 14, 2020, 2:15am

I have thousands of day-wise (time-based) indices in my cluster. Is there a way to monitor, on the ES server side, which indices return data when a query uses an alias? For example, consider the query below that my customer fires:

GET index_alias/_search?q=Product_Serial_Num:ABCD

The alias index_alias is mapped to all 1200 day-wise indices in the ES 6.8.6 cluster, because the query is used to fetch historical records for that Product_Serial_Num and it is not possible to know beforehand which day-wise index holds the data. When the customer fires the above query, I would like to monitor it on the ES server side and find out which indices actually return data. Is there a way to determine this without modifying anything at the customer end?

The purpose is to figure out which indices are hit the most, for example the last n days (30, 60, 90), and then apply a hot-warm architecture accordingly, such as keeping the last 60 days of indices on hot nodes and the rest on warm nodes.

If there are better ways to determine this, please do share.

Thanks.

Steve_Mushero · September 14, 2020, 4:02am

Might I suggest that thousands of indices is usually not a good idea. I hope you have lots of heap on your master nodes.

Why not use index stats to watch your indices and maybe shards, and use that for hot/warm? Why do you care about the shard, anyway, as any hot-warm will be index-level and most loads are balanced across shards (though not all).

I can't see how you can get much data on a query if you cannot modify it.

sandeepkanabar · September 14, 2020, 4:30am

Thanks @Steve_Mushero for the answer. I've edited my question and description to make them clearer.

I'm basically looking for a way to monitor the queries at the ES Server side and be able to determine which indices are being hit.

The reason for thousands of indices is to preserve historical data, but indexing happens only to today's index. All other indices are read-only.

Steve_Mushero · September 14, 2020, 4:33am

> I'm basically looking for a way to monitor the queries at the ES Server side and be able to determine which indices are being hit.

I still think overall index stats would be the best way: watch queries or results over time to see which indices are being used. I'd have to dig into the stats again for more specifics. We added some top-index features to our tools, though I don't think they are released yet (I'm not with Elastic).
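As a rough illustration of the "watch index stats over time" idea, here is a minimal Python sketch that compares two snapshots of a `GET index_alias/_stats/search` response and ranks indices by how much their `fetch_total` counter grew. It rests on an assumption worth checking on your own cluster: a query through the alias increments `query_total` on every index behind it, whereas the fetch phase normally runs only on shards that contribute documents to the returned hits, so `fetch_total` is the more telling counter. The function names are made up for this example, and the snapshots are assumed to be already parsed JSON.

```python
from typing import Dict, List, Tuple


def search_counters(stats: dict) -> Dict[str, Tuple[int, int]]:
    """Map index name -> (query_total, fetch_total) from a parsed _stats/search response."""
    counters = {}
    for name, body in stats.get("indices", {}).items():
        search = body["total"]["search"]  # "total" covers primaries and replicas
        counters[name] = (search["query_total"], search["fetch_total"])
    return counters


def fetch_deltas(before: dict, after: dict) -> List[Tuple[str, int]]:
    """Rank indices by the growth of fetch_total between two snapshots."""
    old = search_counters(before)
    new = search_counters(after)
    deltas = []
    for name, (_, fetch_now) in new.items():
        fetch_then = old.get(name, (0, 0))[1]
        delta = fetch_now - fetch_then
        # Counters can reset (node restart, shard relocation); skip non-positive deltas.
        if delta > 0:
            deltas.append((name, delta))
    # Largest delta first; ties broken by index name for a stable order.
    return sorted(deltas, key=lambda item: (-item[1], item[0]))
```

Taking a snapshot every few hours and feeding consecutive pairs to `fetch_deltas` gives a per-index activity ranking without touching the customer's query. Requests that return no documents at all (for example `size=0`) skip the fetch phase, so they would not show up in this ranking.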

> The reason for thousands of indices is to preserve historical data, but indexing happens only to today's index. All other indices are read-only.

That is still a lot, and people usually run into master heap and other issues as this grows.

sandeepkanabar · September 14, 2020, 4:38am

Yes, I was thinking of monitoring the results of the queries. I would appreciate some recommendations here.

Agreed. That is why I want to implement a hot-warm architecture and then split the alias into a hot alias and a warm alias, i.e. the hot alias will hit only hot indices.

Christian_Dahlqvist · September 14, 2020, 5:18am

Please read this blog post, as having a large number of small indices and shards can be very inefficient and slow. It is not only heap usage that is affected: the cluster state will grow and get slower to update and propagate, which can also cause a lot of problems.

sandeepkanabar · September 14, 2020, 8:09am

Hey @Christian_Dahlqvist, yes, I'm fully aware of that blog post and have read it. In my case, each of the day-wise indices is ~30 to 35 GB in size with 1 shard and 1 replica. So far, I don't have the issue of too many small shards.

Since I want to implement a hot/warm architecture, I'm looking to find the recent "n" indices that should be kept on hot nodes, and to mark the remaining indices as read-only and move them to warm nodes.

There are no indexing issues as of now, since indexing only happens to today's index; all other indices are read-only. However, search is slow because the customer uses the alias and hence hits all indices (shards). If I can determine the recent "n" indices, then I can map the alias to only those "n" indices and create another alias that maps to the warm indices.

Christian_Dahlqvist · September 14, 2020, 8:23am

How many indices are you indexing into in parallel? What is your retention period? How many aliases do you have and how are these aligned with the indices?

sandeepkanabar · September 14, 2020, 8:52am

Indexing goes to only 1 index at a time, which is today's index, e.g. foo_index-2020.09.14. The retention period is 7 years because these are kept for historical purposes.

I have a single alias foo_alias that maps to 1200 day-wise indices covering the last 4 years. The customer fires a query like GET foo_alias/_search?q=fieldname:value that currently hits all indices.

The plan is to introduce a hot-warm architecture and change foo_alias to point to the n most recent indices. The remaining indices will be mapped to a new alias foo_alias_warm. If the customer fires a query GET foo_alias/_search?q=fieldname:value, it will hit only the n most recent indices. If the number of hits is 0, then another query GET foo_alias_warm/_search?q=fieldname:value will be fired.
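To make that plan concrete, here is a minimal Python sketch of its two halves: building a request body for `POST /_aliases` that splits the indices between foo_alias and foo_alias_warm by age, and a hot-then-warm search fallback. The index naming pattern comes from the post above; the function names, the `hot_days` cut-off, and the injected `search` callable (standing in for whatever HTTP client is used) are assumptions for illustration only.

```python
from datetime import date, datetime
from typing import Callable, Iterable

INDEX_PREFIX = "foo_index-"  # naming pattern taken from the post above


def index_day(name: str) -> date:
    """Parse the day out of an index name such as foo_index-2020.09.14."""
    return datetime.strptime(name[len(INDEX_PREFIX):], "%Y.%m.%d").date()


def split_alias_actions(indices: Iterable[str], today: date, hot_days: int,
                        hot_alias: str = "foo_alias",
                        warm_alias: str = "foo_alias_warm") -> dict:
    """Build a body for POST /_aliases that keeps recent indices in the hot alias.

    Assumes every listed index is currently a member of hot_alias, as in the
    thread; a remove action for a non-member index would be rejected.
    """
    actions = []
    for name in sorted(indices):
        if (today - index_day(name)).days < hot_days:
            actions.append({"add": {"index": name, "alias": hot_alias}})
        else:
            actions.append({"remove": {"index": name, "alias": hot_alias}})
            actions.append({"add": {"index": name, "alias": warm_alias}})
    return {"actions": actions}


def hit_count(response: dict) -> int:
    """Read hits.total, which is an integer in ES 6.x and an object in 7.x."""
    total = response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total


def search_hot_then_warm(search: Callable[[str, str], dict], query: str,
                         hot_alias: str = "foo_alias",
                         warm_alias: str = "foo_alias_warm") -> dict:
    """Query the hot alias first and fall back to the warm alias on zero hits."""
    response = search(hot_alias, query)
    if hit_count(response) > 0:
        return response
    return search(warm_alias, query)
```

One consequence of this design is that a serial number with records in both hot and warm indices only ever returns the hot ones, so it fits the "find any record" use case better than a full history lookup.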

borna_talebi · September 15, 2020, 6:51am

Hi, I'm not sure I understand your question correctly, but if you just want to find out which indices returned hits, use a terms aggregation on the "_index" field. Something like:

{
  "aggs": {
    "groupbyindex": {
      "terms": { "field": "_index" }
    }
  }
}

Then, using "doc_count", you can see which indices are being hit the most.
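For reference, a small Python sketch of how that aggregation could be sent and read back. It builds a full search body around the "groupbyindex" aggregation and turns the response buckets into a per-index count. The helper names and the `max_indices` default are assumptions for illustration; the explicit `size` is there because a terms aggregation returns only the top 10 buckets by default, which would hide most of 1200 indices. Note that this means running a query of your own against the alias, not observing the customer's.

```python
from typing import Dict


def index_hits_request(query: str, max_indices: int = 1500) -> dict:
    """Build a search body that counts matching documents per index."""
    return {
        "size": 0,  # only the aggregation is needed, not the documents
        "query": {"query_string": {"query": query}},
        "aggs": {
            "groupbyindex": {
                "terms": {"field": "_index", "size": max_indices}
            }
        },
    }


def hits_per_index(response: dict) -> Dict[str, int]:
    """Map index name -> doc_count from the groupbyindex buckets."""
    buckets = response["aggregations"]["groupbyindex"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

Posting `index_hits_request("Product_Serial_Num:ABCD")` to `index_alias/_search` and passing the parsed response to `hits_per_index` yields only the indices that actually hold matching documents.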

sandeepkanabar · September 15, 2020, 7:27am

Thank you @borna_talebi for your inputs. My question is more like this: if my customer is firing the query and I want to monitor on the ES server side which indices it hits the most, how do I go about it?

system · October 13, 2020, 7:27am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
