ILM, content base, re-indexing

wil93 · June 13, 2023, 10:46am

Hello,

I'm looking for some good advice here (and I do know that the standard answer is 'it depends', which is a justified answer BTW, no worries ...).

The situation / challenge:

a large index: we're about to have quite a large index, eventually ending up with 15+ million source docs (and growing);
each doc can have a lot of fields (and a portion of those are even 'dynamically added'); we're currently running up against the 1000-field limit;
anyway: all docs/data have historical (and potentially juridical) value and need to be kept for years, and to remain searchable;
we are executing a 'full text query' on almost all fields, a Google-like search (which should be possible IMHO);

Now, there are already about 1.3 million docs in the current index and that search runs for 7-10 seconds. This is relatively 'slow' from an end-user perspective.

To increase search speed, I've played with several 'number of shards / number of replicas' configurations. No real progress there.

I do know that the setup/config of the cluster also plays a crucial role in performance.
We currently have a rather basic setup (3-node cluster, all nodes have the same roles, etc.), but a new (more realistic) setup is on its way (more nodes, with specific roles for them, etc.), so that will probably also help ...

My question is though: I'm thinking about splitting my (single) index into multiple related (via aliases) indices, and trying to take some kind of 'hot-warm-cold ...' ILM approach. **But the challenge here is:** is it possible to trigger re-indexing via ILM based on specific content of document fields? E.g. we have a 'status' of a doc like 'open' or 'closed', and also a 'last updated' date, and I would like to (ILM-dynamically) move the older and/or closed ones to a 'cold-er' index.
Certain queries could then target certain indices. General queries (expected to be slower) could use the 'alias' approach.
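
As a rough illustration of the alias idea above, the sketch below builds a request body for the Elasticsearch `_aliases` API that puts several indices behind one alias. The index names `docs-hot` and `docs-cold` and the alias `docs-all` are hypothetical; only the `actions` / `add` body shape is taken from the API.

```python
from typing import Any


def alias_actions(alias: str, indices: list[str]) -> dict[str, Any]:
    """Build a body for POST /_aliases that adds every index to one alias."""
    return {
        "actions": [
            {"add": {"index": index, "alias": alias}} for index in indices
        ]
    }


# Hypothetical names: targeted queries would hit "docs-hot" directly,
# general queries would search the "docs-all" alias.
body = alias_actions("docs-all", ["docs-hot", "docs-cold"])
```

Sending `body` to `POST /_aliases` would let general searches go through `docs-all`, while queries that only need open documents could address `docs-hot` by name.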

Any advice will do here

Thanks in advance!

leandrojmp · June 13, 2023, 1:31pm

Not possible, ILM works on entire indices/data-streams, it doesn't look at documents.
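
To make that concrete, here is a minimal sketch of an ILM policy body, written as a Python dict. Every transition is keyed on an index-level condition (the age of the index); there is nowhere to refer to a document field such as 'status'. The ages and the chosen actions are illustrative assumptions, not a recommendation.

```python
from typing import Any


def ilm_policy(warm_after: str, cold_after: str) -> dict[str, Any]:
    """Build a body for PUT _ilm/policy/<name> with age-based phases only."""
    return {
        "policy": {
            "phases": {
                # Rollover is evaluated per index, not per document.
                "hot": {"actions": {"rollover": {"max_age": warm_after}}},
                "warm": {
                    "min_age": warm_after,
                    "actions": {"forcemerge": {"max_num_segments": 1}},
                },
                "cold": {
                    "min_age": cold_after,
                    "actions": {"readonly": {}},
                },
            }
        }
    }


# Illustrative ages, assumed for this example.
policy = ilm_policy(warm_after="30d", cold_after="365d")
```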

Sean_Story · June 13, 2023, 1:57pm

Your question is tagged with "Elastic Enterprise Search" and "Elastic App Search" but you're only talking about indices - are you using the Elastic App Search product? Do you have an Engine?

I'm going to assume not (mistagging happens all the time, don't worry about it), in which case it sounds like your issue is something that could be addressed at ingest time, rather than through an Elasticsearch mechanism. Define your indices for hot/warm/cold/frozen, and then when you are making updates to your documents, if you're "closing" a document, add it to a colder tier and delete it from your warmer tier. And just run a reindex query to bulk move all the documents that are currently in hot/warm that you want in cold/frozen.
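
A minimal sketch of that bulk move, assuming the documents carry a `status` field mapped as a keyword and a `last_updated` date field (both field names are guesses based on the original post), and using hypothetical index names. It only builds the request bodies for the `_reindex` and `_delete_by_query` APIs; sending them is left to whichever client you use.

```python
from typing import Any


def closed_docs_query(cutoff: str) -> dict[str, Any]:
    """Match docs that are closed AND last updated before `cutoff`."""
    return {
        "bool": {
            "filter": [
                {"term": {"status": "closed"}},  # assumed keyword field
                {"range": {"last_updated": {"lt": cutoff}}},  # assumed date field
            ]
        }
    }


def reindex_body(source_index: str, dest_index: str, cutoff: str) -> dict[str, Any]:
    """Body for POST /_reindex: copy matching docs into the colder index."""
    return {
        "source": {"index": source_index, "query": closed_docs_query(cutoff)},
        "dest": {"index": dest_index},
    }


def delete_by_query_body(cutoff: str) -> dict[str, Any]:
    """Body for POST /<source_index>/_delete_by_query: remove the copied docs."""
    return {"query": closed_docs_query(cutoff)}


# Hypothetical index names; "now-1y" is Elasticsearch date math for one year ago.
move = reindex_body("docs-hot", "docs-cold", "now-1y")
cleanup = delete_by_query_body("now-1y")
```

The delete should only be sent once the reindex has completed without failures. With a relative cutoff such as `now-1y` the two requests are evaluated at slightly different times, so a fixed timestamp is the safer choice when the same set of documents must match both.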

If you really want to have Elasticsearch do this for you, you could look at using Watcher to execute Index or Webhook actions to interact with documents that meet your criteria.

wil93 · June 14, 2023, 6:36am

Hi Leandro,
Honestly, I already thought so as well.
So, it confirms my suspicions.
Thanks,
Wim.

wil93 · June 14, 2023, 6:42am

Hi Sean,
Yes, it should be tagged "Elastic Enterprise Search", sorry.
For the content-based ILM (wish): I also thought I would have to do it myself (= add this kind of strategy to my business code ... create-delete ...).
But I'll definitely have a look at your hints here (in particular the 'Watcher').
...
Thanks!

system · July 12, 2023, 6:42am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
