Perform search on ES but limit to hot nodes only?

We have employed a hot-warm architecture in our cluster, and the indices share the same prefix. Right now, to keep our process sane, we do this to our indices:

  • Reindex the original index named name-YYYY.MM into a new index called name--YYYY.MM
  • Shrink the new index
  • Force merge the new index
  • When done, move the name--YYYY.MM index to the warm nodes.
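The steps above can be sketched as REST calls. This is only an illustration: the month, the shrink target name, the `shrink-node-1` node name, and the `box_type` allocation attribute are all assumptions, not our actual configuration:

```
# 1. Reindex the original monthly index into the new double-dash name
POST _reindex
{
  "source": { "index": "name-2023.01" },
  "dest":   { "index": "name--2023.01" }
}

# 2. Shrink: block writes and colocate all shards on one node first
#    (node name here is hypothetical), then shrink to a single shard
PUT name--2023.01/_settings
{
  "index.blocks.write": true,
  "index.routing.allocation.require._name": "shrink-node-1"
}

POST name--2023.01/_shrink/name--2023.01-s1
{ "settings": { "index.number_of_shards": 1 } }

# 3. Force merge the shrunken index down to one segment
POST name--2023.01-s1/_forcemerge?max_num_segments=1

# 4. Relocate it to the warm tier via allocation filtering
PUT name--2023.01-s1/_settings
{ "index.routing.allocation.require.box_type": "warm" }
```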

We did it this way so that we can still very easily search against the warm nodes by selecting index='name-*'. However, sometimes we would like to search against the same index on the hot nodes only.

Is there a way to pass a node-type parameter to the search? Right now we have partially worked around this by using index='name-YYYY*', but is there in fact a better way, one where I can specify node attributes when performing a search?

(I obviously realize that I could’ve named the new index differently but when we made this naming convention originally, we didn’t consider this particular use case)

Does your data include a timestamp field? If so, I would recommend simply searching everything and using a range filter on the timestamp field. Elasticsearch will automatically (and efficiently) skip any shards that don't contain data matching a timestamp range.
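Something like this, where the `@timestamp` field name and the seven-day window are placeholders for whatever fits your data:

```
GET name-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": "now-7d" } } }
      ]
    }
  }
}
```

Shards whose data falls entirely outside the range can be skipped during the pre-filter phase rather than actually searched.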


Yes, we have a timestamp field, but when I look at the search r/s in monitoring, it seems that the indices still get hit?

If what you are saying is that it will simply bypass the node, then sure, I can definitely do that. Thanks for your help!

The search has to "hit" one copy of every shard at least once -- it clearly needs to check the timestamp range to know whether to skip that shard or not. But that's all it does if the shard doesn't match, so it's very quick.

IIRC this behaviour is reflected in the profile API output - you shouldn't see profiler entries for the skipped shards since there's basically nothing to profile.
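To see this for yourself, you can run the same search with profiling enabled (again assuming a `@timestamp` field); shards that were skipped shouldn't show up in the `profile` section of the response:

```
GET name-*/_search
{
  "profile": true,
  "query": {
    "range": { "@timestamp": { "gte": "now-7d" } }
  }
}
```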


It may be worth mentioning that the shards might not be pre-filtered. Based on the documentation (and I've experienced this first hand), pre-filtering only happens when one of a few conditions is met, one of which is that the number of shards a search targets is ≥128 [docs].

@smlbiobot You might try adding pre_filter_shard_size=1 to your URL parameters for your search in addition to the time range filter suggested by @DavidTurner.
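For example (the `@timestamp` field name and the 30-day range are assumptions about your data):

```
GET name-*/_search?pre_filter_shard_size=1
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": "now-30d" } } }
      ]
    }
  }
}
```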

It's also very possible that I've misunderstood the conditions under which pre-filtering applies, and if so I'd love if someone could correct my understanding!

Yes that's true, although the general advice remains: just search everything with an appropriate filter and let Elasticsearch decide how best to execute the search. If you're not searching very many shards then pre-filtering doesn't happen because it's just as fast to query them all. You can try pre_filter_shard_size=1 if you like but it may not make any measurable difference or may even make it slightly slower.


@DavidTurner Thanks for the response. For my own understanding, would that depend on the complexity of the query and size of the shards? Is there a baseline recommendation in terms of pre_filter overhead? Thanks!

Not really, no; it only helps if you're going to hit a lot of shards that don't match the timestamp range filter. The recommendation is not to set this parameter, so that the default logic applies.


Our specific use case for this is about 30 indices so far (it's monthly, so I expect this to grow) on the warm nodes, set to 1 shard and 1 replica (shrunk from 5 shards and 1 replica on the hot nodes).

We are also planning to use this strategy for other indices on the cluster, so thank you both for the conversation; it has been very helpful.


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.