Hello,
we are running a 9-machine Elasticsearch cluster at the company I work at. This cluster is running Elasticsearch 1.7.3. We have two indices in this cluster, one using 4.3GB and one using 312GB of disk.
Last week we started a task force to upgrade our cluster to Elasticsearch 2.4.3, so we deployed a new cluster running the new version of Elasticsearch and configured a Logstash instance to consume from our S3 backup and index the documents into our new cluster. On the new cluster, the larger index is using 950GB and the smaller one is using 6.9GB. The document counts for both indices remain very similar: the difference is fewer than 100 000 documents on a 10 billion document index.
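To put the growth in numbers, here is a quick sketch that computes the size ratio for each index from the figures above. The names `sizes_gb` and `growth_factor` are just made up for this illustration; only the sizes come from our clusters.

```python
# Disk usage in GB as reported above: (1.7.3 cluster, 2.4.3 cluster).
sizes_gb = {
    "larger index": (312.0, 950.0),
    "smaller index": (4.3, 6.9),
}


def growth_factor(old_gb, new_gb):
    """Return how many times larger the new index is than the old one."""
    return new_gb / old_gb


for name, (old_gb, new_gb) in sizes_gb.items():
    factor = growth_factor(old_gb, new_gb)
    print(f"{name}: {old_gb} GB -> {new_gb} GB (x{factor:.2f})")
```

So the larger index grew to roughly 3 times its old size and the smaller one to roughly 1.6 times, with almost the same number of documents.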
The only change we've made to the index mappings was replacing "index_analyzer" with "analyzer" in some fields, because "index_analyzer" is no longer a valid setting.
Does this increase in disk usage make any sense? Are newer versions of ES consuming more disk than before?
Thanks in advance.