We have built an ES system that uses time-based indexes (yearly, monthly, weekly) for different clients. We follow a standard pattern for the index names (similar to logstash, but more complex).
When searching, we get all the indexes from ES and filter them by name. We did not want to use aliases, because we keep track of the searches together with other non-ES context information: we want to know which indexes were queried at any given point in time, and we also want to limit the indexes searched based on the time criteria of the search and other context information.
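To illustrate the kind of name-based filtering described above, here is a minimal sketch. The naming scheme `<client>-<yyyy>.<mm>` and the function name `indexes_for_range` are hypothetical simplifications for illustration only (the real pattern is more complex), and the list of names is assumed to have been fetched from ES already.

```python
import re
from datetime import date

# Hypothetical naming scheme: "<client>-<yyyy>.<mm>", e.g. "acme-2013.07".
# This is an assumption for illustration, not the actual pattern.
MONTHLY = re.compile(r"^(?P<client>[a-z0-9_]+)-(?P<year>\d{4})\.(?P<month>\d{2})$")


def indexes_for_range(all_indexes, client, start, end):
    """Return the client's monthly indexes whose month overlaps [start, end]."""
    first = (start.year, start.month)
    last = (end.year, end.month)
    selected = []
    for name in all_indexes:
        m = MONTHLY.match(name)
        if m is None or m.group("client") != client:
            continue  # different naming scheme or another client's index
        month = (int(m.group("year")), int(m.group("month")))
        if first <= month <= last:
            selected.append(name)
    return sorted(selected)


names = ["acme-2013.05", "acme-2013.06", "acme-2013.07", "other-2013.06"]
print(indexes_for_range(names, "acme", date(2013, 6, 10), date(2013, 7, 2)))
# prints ['acme-2013.06', 'acme-2013.07']
```

The result depends entirely on the list passed in, so an index missing from that list is silently skipped rather than reported.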
Currently we have found that http://:9200/_stats/indices/ doesn't include the indexes on nodes that were down, and we need the search to fail in those cases. Apparently the http://:9200/_aliases API does include the indexes on nodes that were down.
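The behaviour we are after (fail the search instead of silently skipping an index) can be sketched as below. This is only an illustration under assumptions: `require_indexes` and `MissingIndexError` are hypothetical names, and the two lists of index names are assumed to have been extracted from the API responses already, since the exact response layout may differ between ES versions.

```python
class MissingIndexError(Exception):
    """Raised when an index the search needs is not currently reported."""


def require_indexes(required, reported):
    """Return the required index names, or raise if any is not reported.

    required: names the search is expected to cover (for example from a
              listing that also includes unavailable indexes).
    reported: names returned by the listing call used at search time.
    """
    missing = sorted(set(required) - set(reported))
    if missing:
        raise MissingIndexError("indexes not available: " + ", ".join(missing))
    return sorted(required)


try:
    require_indexes(["acme-2013.06", "acme-2013.07"], ["acme-2013.06"])
except MissingIndexError as err:
    print(err)
```

With a check like this, the search is rejected up front whenever the two listings disagree about an index the query should cover.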
What is the preferred API to get all indexes?
These APIs look to be fast and seem to use a cache. Does it really make sense for us to cache this information?
We have been observing the stats/ API (called through the native Java TransportClient API) behaving inconsistently: we don't always get all the indexes back.
We are thinking of maintaining the client-to-index mapping in a separate datastore, but it would be extra work to keep it in sync with ES.
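The sync work mentioned above would roughly amount to a periodic comparison like the following sketch. The registry shape (a dict from client name to index names) and the function name `reconcile` are assumptions made for illustration; nothing here talks to ES or to a real datastore.

```python
def reconcile(registry, live_indexes):
    """Compare a client -> index-names registry with the live index list.

    Returns (unregistered, stale):
      unregistered: live indexes that no client in the registry owns
      stale:        registered indexes that are not in the live list
    """
    registered = set()
    for names in registry.values():
        registered.update(names)
    live = set(live_indexes)
    return sorted(live - registered), sorted(registered - live)


registry = {"acme": ["acme-2013.06", "acme-2013.07"], "other": ["other-2013.06"]}
live = ["acme-2013.06", "other-2013.06", "other-2013.07"]
unregistered, stale = reconcile(registry, live)
print(unregistered)  # prints ['other-2013.07']
print(stale)         # prints ['acme-2013.07']
```

Note that a "stale" entry on its own cannot tell a deleted index apart from one that is only temporarily unavailable, so the registry would still need the live listing to be reliable.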
It would be nice if anyone could share best practices for dealing with such large indexes.