I have an ES cluster where we index the syslogs from multiple devices.
One of our customers asked how much space the syslogs in the ES cluster take up, broken down by device.
The goal is to forecast disk usage when new devices are added.
I saw the indices stats API (`_stats` with the `store` metric), but is it possible to get more specific and find the space occupied by a subset of documents?
Not really. The trouble is that an index's size depends on many factors, and none of them can be filtered by a query. For example, the terms dictionary holds only one copy of each term per segment, no matter how many documents contain it, so documents end up sharing a lot of their storage. You can estimate the size by working out what fraction of the index's documents match the query and multiplying the index size by that fraction, but it won't be accurate. You can also use the scroll API to copy just those documents into a new index and check that index's size. That will probably overestimate it, since a smaller index benefits less from that sharing.
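A minimal sketch of the proportional estimate, assuming you have already fetched two responses: the output of `GET <index>/_stats/store,docs` and a search response containing a `terms` aggregation over the field that identifies the device. The aggregation name `per_device` is hypothetical. The code uses primary store size only (swap `primaries` for `total` to include replicas), and it assumes the index has no nested fields, because `docs.count` in the stats counts Lucene documents. Remember that a `terms` aggregation returns only 10 buckets by default, so set its `size` to at least the number of devices.

```python
from typing import Any, Dict


def estimate_bytes_per_device(stats: Dict[str, Any],
                              agg_response: Dict[str, Any],
                              agg_name: str = "per_device") -> Dict[str, float]:
    """Split the primary store size across devices in proportion to doc count.

    This is only an approximation: it ignores storage shared between
    documents (e.g. one copy of each term per segment) and differences
    in document size between devices.
    """
    primaries = stats["_all"]["primaries"]
    store_bytes = primaries["store"]["size_in_bytes"]
    # Denominator is the index's total doc count, so documents missing the
    # device field (or outside the returned buckets) do not inflate shares.
    total_docs = primaries["docs"]["count"]

    buckets = agg_response["aggregations"][agg_name]["buckets"]
    if total_docs == 0:
        return {b["key"]: 0.0 for b in buckets}
    return {b["key"]: store_bytes * b["doc_count"] / total_docs for b in buckets}
```

For forecasting, the same inputs give an average of `store_bytes / total_docs` bytes per document, which you can multiply by the number of documents a new device is expected to send. It carries the same inaccuracy caveat as the per-device split.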