Hi,
We're seeing significant fluctuations in the size of old indices in our log aggregation cluster, even though those indices are no longer being updated. For example, the index for Dec 12th doubled in size in the days after its last update, then shrank by around 15%.
Can someone help explain what is (or may be) going on here, how we can track the processes at work, and what we can do to deal with it?
Hi,
The thing is, we optimise all of our indices after 1 day, and the index I'm referring to here was still growing after 3 days. It ended up at 2.2TB, for around 500 million docs, before settling back to 1.7TB. When the last write came in, the index was only around 950GB!
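To put those sizes in proportion, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes decimal units (1TB = 1000GB), treats the figures above as exact, and the `size_change` helper is just a name made up for illustration.

```python
GB = 10**9  # assumption: decimal units, so 1TB = 1000GB


def size_change(before_bytes, after_bytes):
    """Relative change from before to after (0.5 means +50%, -0.2 means -20%)."""
    return (after_bytes - before_bytes) / before_bytes


# Figures quoted in this post (approximate).
last_write = 950 * GB
peak = 2200 * GB
settled = 1700 * GB
docs = 500_000_000

growth_to_peak = size_change(last_write, peak)   # growth after the last write
drop_from_peak = size_change(peak, settled)      # shrink from the peak
net_growth = size_change(last_write, settled)    # where it ended up overall
bytes_per_doc_settled = settled / docs           # average on-disk size per doc

print(f"last write -> peak:    {growth_to_peak:+.1%}")
print(f"peak -> settled:       {drop_from_peak:+.1%}")
print(f"last write -> settled: {net_growth:+.1%}")
print(f"bytes per doc (settled): {bytes_per_doc_settled:.0f}")
```

So even after settling, the index is well above its size at the last write, which is the part we'd most like to understand.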
BTW, is there a way of measuring the sparsity of our field data? Would I need to run Lucene commands to do that? We're running on 2.1.2.