ILM, content base, re-indexing


I'm looking for some good advice here (and I do know that the standard answer is "it depends" :slight_smile: which is a justified answer BTW, no worries ...).

The situation / challenge:

  • a large index: we're about to have quite a large index, eventually ending up with 15+ million source docs (and growing);
  • each doc can have a lot of fields (and a portion of those are even dynamically added) - we're currently running against the 1,000-field limit;
  • anyway: all docs/data have historical (and potentially legal) value and need to be kept for years, and to remain searchable;
  • we are executing a 'full text query' on almost all fields - a Google-like search (which should be possible imho);
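
For reference, the 1,000-field ceiling mentioned above is Elasticsearch's default `index.mapping.total_fields.limit`, which can be raised per index if needed (at the risk of mapping explosion). A minimal sketch, with a placeholder index name and limit:

```
PUT my-index/_settings
{
  "index.mapping.total_fields.limit": 2000
}
```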

Now, there are already about 1.3 million docs in the current index, and that search runs for 7-10 seconds. This is relatively 'slow' from an end-user perspective.

To increase search speed, I've played with several 'number of shards / number of replicas' configurations. No real progress there.

I do know that the setup/config of the cluster also plays a crucial role in performance.
We currently have a rather basic setup (3-node cluster, all nodes have the same roles, etc.), but a new (more realistic) setup is on its way (more nodes, with dedicated roles, etc.) - so that will probably also help ...

My question is though: I'm thinking about splitting my (single) index into multiple related indices (tied together via aliases) and taking some kind of 'hot-warm-cold' ILM approach. **But the challenge here is:** is it possible to trigger re-indexing via ILM based on specific content of document fields? E.g. we have a 'status' on a doc like 'open' or 'closed', and also a 'last updated' date, and I would like to (ILM-dynamically) move the older and/or closed docs to a 'colder' index.
Certain queries could then target specific indices. General queries (expected to be slower) could use the 'alias' approach.
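
For illustration, a single alias spanning a 'hot' and a 'cold' index could be set up roughly like this (index and alias names are made up):

```
POST _aliases
{
  "actions": [
    { "add": { "index": "docs-hot",  "alias": "docs-all" } },
    { "add": { "index": "docs-cold", "alias": "docs-all" } }
  ]
}
```

General queries would then hit `docs-all`, while targeted queries go straight to `docs-hot` or `docs-cold`.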

Any advice will do here :slight_smile:

Thanks in advance!

Not possible. ILM works on entire indices/data streams; it doesn't look at individual documents.

Your question is tagged with "Elastic Enterprise Search" and "Elastic App Search" but you're only talking about indices - are you using the Elastic App Search product? Do you have an Engine?

I'm going to assume not (mistagging happens all the time, don't worry about it), in which case it sounds like your issue is something that could be addressed at ingest time rather than through an Elasticsearch mechanism. Define your indices for hot/warm/cold/frozen, and then when you are making updates to your documents, if you're "closing" a document, write it to a colder tier and delete it from the warmer tier. And just run a reindex query to bulk-move all the documents that are currently in hot/warm that you want in cold/frozen.
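
That bulk move could look roughly like this (index names are placeholders; the `status` field is from the question, so adjust to your actual mapping):

```
POST _reindex
{
  "source": {
    "index": "docs-hot",
    "query": { "term": { "status": "closed" } }
  },
  "dest": { "index": "docs-cold" }
}

POST docs-hot/_delete_by_query
{
  "query": { "term": { "status": "closed" } }
}
```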

If you really want to have Elasticsearch do this for you, you could look at using Watcher to execute Index or Webhook actions to interact with documents that meet your criteria.
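
As a rough illustration only (names and the schedule are placeholders, and details like the shape of `hits.total` vary by version), a watch that periodically looks for closed docs and calls a webhook to kick off a reindex might look something like:

```
PUT _watcher/watch/move-closed-docs
{
  "trigger": { "schedule": { "interval": "24h" } },
  "input": {
    "search": {
      "request": {
        "indices": ["docs-hot"],
        "body": { "query": { "term": { "status": "closed" } } }
      }
    }
  },
  "condition": {
    "compare": { "ctx.payload.hits.total": { "gt": 0 } }
  },
  "actions": {
    "notify": {
      "webhook": {
        "scheme": "https",
        "method": "POST",
        "host": "example.com",
        "port": 443,
        "path": "/trigger-reindex"
      }
    }
  }
}
```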

Hi Leandro,
Honestly, I already thought so too.
So, it confirms my suspicions :slight_smile:

Hi Sean,
Yes, it should be tagged "Elastic Enterprise Search", sorry.
For the content-based ILM (wish): I also thought I'd have to do it myself (i.e. add this kind of strategy in my business code ... create/delete ...).
But I'll definitely have a look at your hints here (in particular the 'Watcher') :+1:

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.