Hello,
I'm looking for some good advice here (and I do know that the standard answer is 'it depends', which is a justified answer BTW, no worries...).
The situation / challenge:
- a large index: we're about to have quite a large index, eventually ending up with 15+ million source docs (and growing);
- each doc can have a lot of fields (some of which are even added dynamically), and we're currently running against the 1000-field limit;
- anyway: all docs/data have historical (and potentially legal) value, need to be kept for years, and must remain searchable;
- we are executing a 'full text query' on almost all fields, i.e. a Google-like search (which should be possible imho);
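A minimal sketch of what such a search and the field-limit change could look like as raw request bodies. It assumes a `multi_match` query over the `*` field wildcard, with `lenient` set so that query text matched against numeric or date fields does not raise format errors; the helper names are hypothetical. The second helper builds a body for `index.mapping.total_fields.limit`, the index setting behind the 1000-field limit (default 1000); the value 2000 is only an example.

```python
import json


def build_fulltext_query(text: str) -> dict:
    """Body for POST /<index>/_search: a 'Google-like' search over all fields.

    Assumption: multi_match with the "*" wildcard; "lenient" ignores format
    errors when the text is matched against numeric or date fields.
    """
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["*"],
                "lenient": True,
            }
        }
    }


def build_field_limit_settings(limit: int) -> dict:
    """Body for PUT /<index>/_settings raising the mapped-field cap (default 1000)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return {"index.mapping.total_fields.limit": limit}


if __name__ == "__main__":
    print(json.dumps(build_fulltext_query("contract 2019"), indent=2))
    print(json.dumps(build_field_limit_settings(2000)))
```

Raising the limit only postpones the problem if fields keep being added dynamically, so it is a stopgap rather than a fix.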
Right now there are already about 1.3 million docs in the current index, and that search takes 7-10 seconds, which is rather slow from an end-user perspective.
To increase search speed, I've experimented with several 'number of shards / number of replicas' configurations, with no real progress.
I do know that the setup/config of the cluster also plays a crucial role in performance.
We currently have a rather basic setup (a 3-node cluster where all nodes have the same roles, etc.), but a new, more realistic setup is on its way (more nodes, with dedicated roles, etc.), so that will probably help as well.
My question, though: I'm thinking about splitting my (single) index into multiple related indices (tied together via aliases) and taking some kind of 'hot-warm-cold' ILM approach. **But the challenge here is:** is it possible to trigger re-indexing via ILM based on the specific content of document fields? For example, each doc has a 'status' such as 'open' or 'closed', and also a 'last updated' date, and I would like to (dynamically, via ILM) move the older and/or closed ones to a 'colder' index.
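Done by hand, without ILM, such a content-based move could be sketched as two request bodies: a `_reindex` that copies the matching docs into the colder index and a `_delete_by_query` that removes the same docs from the source. This is a minimal sketch under assumptions: `status` is a keyword field, `last_updated` is a date field, and the index names `docs-hot` / `docs-cold` are hypothetical. "Older" is expressed as an absolute cutoff date rather than `now-...` date math, so both requests select exactly the same docs.

```python
import json

# Hypothetical index names: adjust to the real ones.
HOT_INDEX = "docs-hot"
COLD_INDEX = "docs-cold"


def cold_candidates_query(cutoff: str) -> dict:
    """Match docs that are closed OR were last updated before `cutoff`.

    Assumes `status` is a keyword field and `last_updated` a date field;
    `cutoff` is an absolute date such as "2023-01-01".
    """
    return {
        "bool": {
            "should": [
                {"term": {"status": "closed"}},
                {"range": {"last_updated": {"lt": cutoff}}},
            ],
            "minimum_should_match": 1,
        }
    }


def build_move_requests(cutoff: str) -> list:
    """Return (method, path, body) tuples: copy matches to the cold index,
    then delete them from the hot one, using the identical query."""
    query = cold_candidates_query(cutoff)
    reindex = {
        "source": {"index": HOT_INDEX, "query": query},
        "dest": {"index": COLD_INDEX},
    }
    delete = {"query": query}
    return [
        ("POST", "/_reindex", reindex),
        ("POST", f"/{HOT_INDEX}/_delete_by_query", delete),
    ]


if __name__ == "__main__":
    for method, path, body in build_move_requests("2023-01-01"):
        print(method, path)
        print(json.dumps(body, indent=2))
```

One caveat with this two-step approach: a doc that becomes 'closed' between the two calls would be deleted without having been copied, so it assumes the matched docs are no longer being written to while the move runs.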
Certain queries could then target specific indices, while general queries (expected to be slower) could use the alias approach.
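A minimal sketch of that alias layout as raw request bodies, assuming one hot and one cold index (hypothetical names `docs-hot` and `docs-cold`) and a hypothetical read alias `docs-all`. The `POST /_aliases` body attaches both indices to the alias, and a small helper picks either a single index (targeted query) or the alias (general query) as the search path.

```python
import json
from typing import Optional

# Hypothetical index and alias names.
ALIAS = "docs-all"
INDICES = ["docs-hot", "docs-cold"]


def build_alias_actions(alias: str, indices: list) -> dict:
    """Body for POST /_aliases: attach every index to one read alias."""
    return {"actions": [{"add": {"index": name, "alias": alias}} for name in indices]}


def search_path(target_index: Optional[str] = None) -> str:
    """Targeted queries hit one index; general queries go through the alias."""
    return f"/{target_index or ALIAS}/_search"


if __name__ == "__main__":
    print(json.dumps(build_alias_actions(ALIAS, INDICES), indent=2))
    print(search_path("docs-hot"))  # → /docs-hot/_search
    print(search_path())            # → /docs-all/_search
```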
Any advice is welcome!
Thanks in advance!