Disk space for the Lucene index?
Disk space for the _source data?
Disk space for logs and other metadata?
Does it count shards?
Does it count replicas?
Anything else?
I know this question has been asked multiple times, but I have not been
able to find a succinct breakdown of exactly what is involved in the
calculation. The actual equation would be even better!
On Thu, Jan 16, 2014 at 2:38 AM, Alexander Reelsen alr@spinscale.de wrote:
Hey,
From a quick peek at the source, the StoreStats are generated in
Store.stats(), which uses the Lucene index Directory to get its size;
in the end this calls file.length() for each file in that directory. So it
is the size used by a Lucene index, in bytes.
The indices stats API shows the data for total shards, for primaries only,
or for replicas, and the same per index - so you can decide which data is
important to count in your monitoring system.
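As a rough model of that behavior (a Python sketch, not the actual Java code in Store.stats()), summing file.length() over every file in the Lucene index directory looks like this:

```python
import os

def index_store_size(index_dir):
    """Sum the on-disk size of every file in a Lucene index
    directory, mirroring how Store.stats() derives its byte count
    by calling file.length() on each file."""
    total = 0
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            total += os.path.getsize(path)
    return total
```

Because _source is stored as a regular field inside those Lucene files, it is automatically part of this sum.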
I did more digging. It turns out that in version 0.90.9 the _source data
is included in the calculation. In other words, the stats are the entire
disk space used by an index, including source data. And it is broken down by
indices, primaries, etc. as Alex said.
I did not test whether it takes source data compression into account, but
it appears that it does.
The _source is just a field in the index - that's the reason it is
included. What is not included is something like the translog, so the
entire disk space used by an index is not in there, IIRC.
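Putting the thread together, the primaries/total split from the indices stats API is what lets you separate primary from replica storage. The snippet below parses a hand-written slice of a stats response (the shape matches what the API returns under `_all`, but the byte counts are made-up sample values, not real measurements):

```python
# Hypothetical slice of an indices stats (store) response; the
# nesting mirrors the real API, the numbers are invented samples.
sample = {
    "_all": {
        "primaries": {"store": {"size_in_bytes": 1048576}},
        "total": {"store": {"size_in_bytes": 2097152}},
    }
}

def store_sizes(stats):
    """Extract primary-only and total (primaries + replicas) store
    sizes, in bytes, from an indices stats response."""
    primaries = stats["_all"]["primaries"]["store"]["size_in_bytes"]
    total = stats["_all"]["total"]["store"]["size_in_bytes"]
    replicas = total - primaries  # replicas make up the difference
    return primaries, total, replicas
```

With one replica per shard, you would expect the total to be roughly twice the primaries figure, as in the sample above.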