We have employed a hot-warm architecture in our cluster, and the indices share the same prefix. Right now, to keep our process sane, we do the following with our indices (a rough sketch of these steps follows the list):
1. Reindex the original index, name-YYYY.MM, into a new index called name--YYYY.MM
2. Shrink the new index
3. Force merge the new index
4. When done, move the name--YYYY.MM index onto the warm nodes.
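For what it's worth, here is a minimal sketch of those four steps using Python's requests library against the REST API. The month, the node name, the box_type attribute value, and the intermediate index name are all illustrative assumptions; in particular, shrink needs a distinct source and target index, so the sketch reindexes into a temporary name first.

```python
import requests

ES = "http://localhost:9200"
src, dst = "name-2019.04", "name--2019.04"  # hypothetical month
tmp = f"{dst}-tmp"                          # hypothetical shrink source

# 1. Reindex the original index into a temporary index.
requests.post(f"{ES}/_reindex",
              json={"source": {"index": src},
                    "dest": {"index": tmp}}).raise_for_status()

# 2. Shrink. The source must be write-blocked and fully allocated
#    to a single node before _shrink will accept it.
requests.put(f"{ES}/{tmp}/_settings", json={
    "index.blocks.write": True,
    "index.routing.allocation.require._name": "hot-node-1",  # assumed node name
}).raise_for_status()
requests.post(f"{ES}/{tmp}/_shrink/{dst}", json={
    "settings": {
        "index.number_of_shards": 1,
        "index.routing.allocation.require._name": None,  # clear the node pin
        "index.blocks.write": None,                      # allow writes again
    },
}).raise_for_status()

# 3. Force merge the shrunken index down to one segment.
requests.post(f"{ES}/{dst}/_forcemerge?max_num_segments=1").raise_for_status()

# 4. Move it to the warm tier by requiring the warm node attribute
#    (box_type is a common convention, not a built-in attribute).
requests.put(f"{ES}/{dst}/_settings", json={
    "index.routing.allocation.require.box_type": "warm",
}).raise_for_status()
```

Once the shrink completes, the temporary index can be deleted.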
We did it this way so that we can still very easily search everything, including the warm nodes, by selecting index='name-*'. However, sometimes we would like to search the same data on the hot nodes only.
Is there a way to pass the node type as a parameter to the search? Right now we have partially worked around this by using index='name-YYYY*', but is there a better way, one where I can specify node attributes when performing a search?
(I obviously realize that I could’ve named the new index differently but when we made this naming convention originally, we didn’t consider this particular use case)
Does your data include a timestamp field? If so, I would recommend simply searching everything and using a range filter on the timestamp field. Elasticsearch will automatically (and efficiently) skip any shards that don't contain data matching a timestamp range.
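For example, a minimal sketch of that approach (assuming a field called @timestamp and the name-* pattern from above):

```python
import requests

# Search every matching index; the pre-filter phase lets the coordinating
# node skip shards whose @timestamp values cannot match the range.
resp = requests.get("http://localhost:9200/name-*/_search", json={
    "query": {"bool": {"filter": [
        {"range": {"@timestamp": {"gte": "now-7d", "lte": "now"}}}
    ]}},
})
resp.raise_for_status()
print(resp.json()["_shards"])  # the "skipped" count shows pre-filtered shards
```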
The search still has to "hit" one copy of every shard -- it needs to check the timestamp range to know whether that shard can be skipped or not. But that's all it does if the shard doesn't match, so it's very quick.
IIRC this behaviour is reflected in the profile API output - you shouldn't see profiler entries for the skipped shards since there's basically nothing to profile.
It may be worth mentioning that the shards might not be pre-filtered at all. Based on the documentation (and I've run into this first hand), pre-filtering only happens when one of a few conditions is met, one of which is that the number of shards the search targets is ≥ 128 [docs].
@smlbiobot You might try adding pre_filter_shard_size=1 to your URL parameters for your search in addition to the time range filter suggested by @DavidTurner.
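For instance (a sketch again, reusing the assumed @timestamp field; whether it helps depends on the conditions above):

```python
import requests

# The same filtered search, with pre_filter_shard_size=1 so the shard
# pre-filter round-trip runs even when the search targets only a few shards.
resp = requests.get(
    "http://localhost:9200/name-*/_search",
    params={"pre_filter_shard_size": 1},
    json={"query": {"bool": {"filter": [
        {"range": {"@timestamp": {"gte": "now-30d"}}}
    ]}}},
)
```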
It's also very possible that I've misunderstood the conditions under which pre-filtering applies, and if so I'd love if someone could correct my understanding!
Yes that's true, although the general advice remains: just search everything with an appropriate filter and let Elasticsearch decide how best to execute the search. If you're not searching very many shards then pre-filtering doesn't happen because it's just as fast to query them all. You can try pre_filter_shard_size=1 if you like but it may not make any measurable difference or may even make it slightly slower.
@DavidTurner Thanks for the response. For my own understanding, would that depend on the complexity of the query and size of the shards? Is there a baseline recommendation in terms of pre_filter overhead? Thanks!
Not really, no; it only helps if you're going to hit a lot of shards that don't match due to a timestamp range filter. The recommendation is not to set this parameter, so that the default logic applies.
Our specific use case for this is about 30 indices so far (they're monthly, so I expect this to grow) on the warm nodes, each set to 1 shard and 1 replica (shrunk from 5 shards and 1 replica on the hot nodes).
We are also planning to use this strategy for other indices on the cluster, so thank you both for the conversation; it has been helpful.