Setting index false increases the index storage size

Hi,
In an effort to reduce document storage size, I added the index: false property to several fields of type text, keyword, and numeric in my index mapping template.
To measure how much storage space is saved, I created a shadow index and reindexed documents from an existing index into my shadow index.
When I looked at the shadow index, which held almost the same number of documents, it was almost double the storage size of the original index!
I also tried a reindex on the exact same index without these flags set and the numbers were comparable, so it doesn't look like a reindex issue.

green open original-2020.05  xxxx  5 1   49278 0 171.2mb  85.4mb
green open shadow-2020.05    xxxx  5 1   48695 0 289.4mb 144.7mb
green open reindex-2020.05   xxxx  5 1   49275 0   174mb  87.1mb
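
To put numbers on the comparison, here is a minimal Python sketch that parses the three lines above and relates each primary store size to the original. It assumes the default _cat/indices column order (health, status, index, uuid, pri, rep, docs.count, docs.deleted, store.size, pri.store.size) and that every size is reported in mb, as it is here.

```python
# Lines copied from the _cat/indices output above (uuid column is masked in the post).
CAT_LINES = """\
green open original-2020.05  xxxx  5 1   49278 0 171.2mb  85.4mb
green open shadow-2020.05    xxxx  5 1   48695 0 289.4mb 144.7mb
green open reindex-2020.05   xxxx  5 1   49275 0   174mb  87.1mb
"""


def parse_mb(value):
    """Convert a size such as '85.4mb' to a float number of megabytes."""
    if not value.endswith("mb"):
        raise ValueError(f"unexpected unit in {value!r}")
    return float(value[:-2])


rows = {}
for line in CAT_LINES.splitlines():
    # Assumed column order: health status index uuid pri rep docs deleted store pri.store
    _health, _status, index, _uuid, _pri, _rep, docs, _deleted, _store, pri_store = line.split()
    rows[index] = {"docs": int(docs), "pri_mb": parse_mb(pri_store)}

base = rows["original-2020.05"]
for name, row in rows.items():
    ratio = row["pri_mb"] / base["pri_mb"]          # size relative to the original index
    kb_per_doc = row["pri_mb"] * 1024 / row["docs"]  # average primary storage per document
    print(f"{name:18} {ratio:5.2f}x  {kb_per_doc:5.2f} KB/doc")
```

On these figures the shadow index's primary store is roughly 1.7 times the original, while the plain reindex is within a few percent of it.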

Do you guys have any idea why this is the case? I thought this was supposed to reduce my disk usage.

Welcome!

Did you call the Force Merge API on all your indices to make sure you are comparing similar things?

If not, run:

POST /original-2020.05,shadow-2020.05,reindex-2020.05/_forcemerge?max_num_segments=1

Hi David,
After running a force merge on the shadow and reindex indices, the numbers look like this:

green open shadow-2020.05    xxxx  5 1   48695 0 270.4mb 135.2mb        <--- slight reduction
green open reindex-2020.05   xxxx  5 1   49275 0 210.2mb   110mb        <--- actually increased

Not really sure why this is the case. Can index: false be set on fields of type text and similar types? Is there any adverse impact of doing this?
Normally my indexes are managed by templates with ILM, so when I move from the hot to a warm phase, I automatically perform a forcemerge. I was following the guidelines specified in the article on tuning ES to reduce disk storage.
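
For reference, the kind of mapping fragment being discussed could be built as below. This is only a sketch: the field names are hypothetical, and just the "index": false setting comes from this thread. A field mapped this way is not searchable, which is the main trade-off to keep in mind.

```python
import json

# Hypothetical field names for illustration; only "index": False is taken from the thread.
properties = {
    "message":    {"type": "text",    "index": False},
    "request_id": {"type": "keyword", "index": False},
    "bytes_sent": {"type": "long",    "index": False},
}

# The "properties" object is the part that would go inside the template's mappings.
mapping_fragment = {"properties": properties}

# json.dumps renders Python's False as the JSON literal false.
print(json.dumps(mapping_fragment, indent=2))
```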

I just realized that you have 5 shards. Could you test with one single shard?

Also, with which version was the index original-2020.05 created?

Could you run:

GET original-2020.05

All three indexes have 5 shards and they were all created using index templates. By version do you mean ES version? They were all created with 6.8 on the same ES cluster.

Yes. But could you try with one single shard for the target?

OK. So that seems to eliminate some theories, such as the indices having been created with different Lucene versions.

But could you run:

GET reindex-2020.05,original-2020.05/_stats?include_segment_file_sizes=true

Since my original index has live data being pumped into it, I am providing a comparison between the shadow and reindex indices instead.

I am hitting some character limits here. So I have pruned both the outputs to just have the totals and not the primaries. Is that sufficient information?
For a single shard comparison, I will have to reindex everything again.

Shadow

{
  "_all": {
    "total": {
      "docs": {
        "count": 97390,
        "deleted": 0
      },
      "store": {
        "size_in_bytes": 283585664
      },
      "indexing": {
        "index_total": 97390,
        "index_time_in_millis": 118955,
        "index_current": 0,
        "index_failed": 0,
        "delete_total": 0,
        "delete_time_in_millis": 0,
        "delete_current": 0,
        "noop_update_total": 0,
        "is_throttled": false,
        "throttle_time_in_millis": 0
      },
      "get": {
        "total": 0,
        "time_in_millis": 0,
        "exists_total": 0,
        "exists_time_in_millis": 0,
        "missing_total": 0,
        "missing_time_in_millis": 0,
        "current": 0
      },
      "search": {
        "open_contexts": 0,
        "query_total": 0,
        "query_time_in_millis": 0,
        "query_current": 0,
        "fetch_total": 0,
        "fetch_time_in_millis": 0,
        "fetch_current": 0,
        "scroll_total": 0,
        "scroll_time_in_millis": 0,
        "scroll_current": 0,
        "suggest_total": 0,
        "suggest_time_in_millis": 0,
        "suggest_current": 0
      },
      "merges": {
        "current": 0,
        "current_docs": 0,
        "current_size_in_bytes": 0,
        "total": 20,
        "total_time_in_millis": 95563,
        "total_docs": 167822,
        "total_size_in_bytes": 541176345,
        "total_stopped_time_in_millis": 0,
        "total_throttled_time_in_millis": 0,
        "total_auto_throttle_in_bytes": 209715200
      },
      "refresh": {
        "total": 254,
        "total_time_in_millis": 60697,
        "listeners": 0
      },
      "flush": {
        "total": 10,
        "periodic": 0,
        "total_time_in_millis": 358
      },
      "warmer": {
        "current": 0,
        "total": 210,
        "total_time_in_millis": 1896
      },
      "query_cache": {
        "memory_size_in_bytes": 0,
        "total_count": 0,
        "hit_count": 0,
        "miss_count": 0,
        "cache_size": 0,
        "cache_count": 0,
        "evictions": 0
      },
      "fielddata": {
        "memory_size_in_bytes": 0,
        "evictions": 0
      },
      "completion": {
        "size_in_bytes": 0
      },
      "segments": {
        "count": 10,
        "memory_in_bytes": 3143406,
        "terms_memory_in_bytes": 3068474,
        "stored_fields_memory_in_bytes": 19096,
        "term_vectors_memory_in_bytes": 0,
        "norms_memory_in_bytes": 2560,
        "points_memory_in_bytes": 15332,
        "doc_values_memory_in_bytes": 37944,
        "index_writer_memory_in_bytes": 0,
        "version_map_memory_in_bytes": 0,
        "fixed_bit_set_memory_in_bytes": 0,
        "max_unsafe_auto_id_timestamp": -1,
        "file_sizes": {
          "dii": {
            "size_in_bytes": 1990,
            "description": "Points"
          },
          "tip": {
            "size_in_bytes": 2937148,
            "description": "Term Index"
          },
          "dvm": {
            "size_in_bytes": 135664,
            "description": "DocValues"
          },
          "fdx": {
            "size_in_bytes": 16942,
            "description": "Field Index"
          },
          "nvm": {
            "size_in_bytes": 1990,
            "description": "Norms"
          },
          "tim": {
            "size_in_bytes": 140820805,
            "description": "Term Dictionary"
          },
          "nvd": {
            "size_in_bytes": 195370,
            "description": "Norms"
          },
          "si": {
            "size_in_bytes": 5530,
            "description": "Segment Info"
          },
          "doc": {
            "size_in_bytes": 30315652,
            "description": "Frequencies"
          },
          "dvd": {
            "size_in_bytes": 15103134,
            "description": "DocValues"
          },
          "dim": {
            "size_in_bytes": 10354356,
            "description": "Points"
          },
          "fdt": {
            "size_in_bytes": 33126909,
            "description": "Field Data"
          },
          "fnm": {
            "size_in_bytes": 129560,
            "description": "Fields"
          },
          "pos": {
            "size_in_bytes": 50437284,
            "description": "Positions"
          }
        }
      },
      "translog": {
        "operations": 0,
        "size_in_bytes": 550,
        "uncommitted_operations": 0,
        "uncommitted_size_in_bytes": 550,
        "earliest_last_modified_age": 0
      },
      "request_cache": {
        "memory_size_in_bytes": 0,
        "evictions": 0,
        "hit_count": 0,
        "miss_count": 0
      },
      "recovery": {
        "current_as_source": 0,
        "current_as_target": 0,
        "throttle_time_in_millis": 0
      }
    }
  },
  "indices": {
    "shadow": {
      "uuid": "SzfIJpThSVKzEcyv553Shg",
      "total": {
        "docs": {
          "count": 97390,
          "deleted": 0
        },
        "store": {
          "size_in_bytes": 283585664
        },
        "indexing": {
          "index_total": 97390,
          "index_time_in_millis": 118955,
          "index_current": 0,
          "index_failed": 0,
          "delete_total": 0,
          "delete_time_in_millis": 0,
          "delete_current": 0,
          "noop_update_total": 0,
          "is_throttled": false,
          "throttle_time_in_millis": 0
        },
        "get": {
          "total": 0,
          "time_in_millis": 0,
          "exists_total": 0,
          "exists_time_in_millis": 0,
          "missing_total": 0,
          "missing_time_in_millis": 0,
          "current": 0
        },
        "search": {
          "open_contexts": 0,
          "query_total": 0,
          "query_time_in_millis": 0,
          "query_current": 0,
          "fetch_total": 0,
          "fetch_time_in_millis": 0,
          "fetch_current": 0,
          "scroll_total": 0,
          "scroll_time_in_millis": 0,
          "scroll_current": 0,
          "suggest_total": 0,
          "suggest_time_in_millis": 0,
          "suggest_current": 0
        },
        "merges": {
          "current": 0,
          "current_docs": 0,
          "current_size_in_bytes": 0,
          "total": 20,
          "total_time_in_millis": 95563,
          "total_docs": 167822,
          "total_size_in_bytes": 541176345,
          "total_stopped_time_in_millis": 0,
          "total_throttled_time_in_millis": 0,
          "total_auto_throttle_in_bytes": 209715200
        },
        "refresh": {
          "total": 254,
          "total_time_in_millis": 60697,
          "listeners": 0
        },
        "flush": {
          "total": 10,
          "periodic": 0,
          "total_time_in_millis": 358
        },
        "warmer": {
          "current": 0,
          "total": 210,
          "total_time_in_millis": 1896
        },
        "query_cache": {
          "memory_size_in_bytes": 0,
          "total_count": 0,
          "hit_count": 0,
          "miss_count": 0,
          "cache_size": 0,
          "cache_count": 0,
          "evictions": 0
        },
        "fielddata": {
          "memory_size_in_bytes": 0,
          "evictions": 0
        },
        "completion": {
          "size_in_bytes": 0
        },
        "segments": {
          "count": 10,
          "memory_in_bytes": 3143406,
          "terms_memory_in_bytes": 3068474,
          "stored_fields_memory_in_bytes": 19096,
          "term_vectors_memory_in_bytes": 0,
          "norms_memory_in_bytes": 2560,
          "points_memory_in_bytes": 15332,
          "doc_values_memory_in_bytes": 37944,
          "index_writer_memory_in_bytes": 0,
          "version_map_memory_in_bytes": 0,
          "fixed_bit_set_memory_in_bytes": 0,
          "max_unsafe_auto_id_timestamp": -1,
          "file_sizes": {
            "doc": {
              "size_in_bytes": 30315652,
              "description": "Frequencies"
            },
            "fdx": {
              "size_in_bytes": 16942,
              "description": "Field Index"
            },
            "dvd": {
              "size_in_bytes": 15103134,
              "description": "DocValues"
            },
            "tim": {
              "size_in_bytes": 140820805,
              "description": "Term Dictionary"
            },
            "dii": {
              "size_in_bytes": 1990,
              "description": "Points"
            },
            "pos": {
              "size_in_bytes": 50437284,
              "description": "Positions"
            },
            "dim": {
              "size_in_bytes": 10354356,
              "description": "Points"
            },
            "nvm": {
              "size_in_bytes": 1990,
              "description": "Norms"
            },
            "dvm": {
              "size_in_bytes": 135664,
              "description": "DocValues"
            },
            "si": {
              "size_in_bytes": 5530,
              "description": "Segment Info"
            },
            "fnm": {
              "size_in_bytes": 129560,
              "description": "Fields"
            },
            "fdt": {
              "size_in_bytes": 33126909,
              "description": "Field Data"
            },
            "tip": {
              "size_in_bytes": 2937148,
              "description": "Term Index"
            },
            "nvd": {
              "size_in_bytes": 195370,
              "description": "Norms"
            }
          }
        },
        "translog": {
          "operations": 0,
          "size_in_bytes": 550,
          "uncommitted_operations": 0,
          "uncommitted_size_in_bytes": 550,
          "earliest_last_modified_age": 0
        },
        "request_cache": {
          "memory_size_in_bytes": 0,
          "evictions": 0,
          "hit_count": 0,
          "miss_count": 0
        },
        "recovery": {
          "current_as_source": 0,
          "current_as_target": 0,
          "throttle_time_in_millis": 0
        }
      }
    }
  }
}

Reindex

{
  "_all": {
    "total": {
      "docs": {
        "count": 98550,
        "deleted": 0
      },
      "store": {
        "size_in_bytes": 173581057
      },
      "indexing": {
        "index_total": 98550,
        "index_time_in_millis": 62746,
        "index_current": 0,
        "index_failed": 0,
        "delete_total": 0,
        "delete_time_in_millis": 0,
        "delete_current": 0,
        "noop_update_total": 0,
        "is_throttled": false,
        "throttle_time_in_millis": 0
      },
      "get": {
        "total": 0,
        "time_in_millis": 0,
        "exists_total": 0,
        "exists_time_in_millis": 0,
        "missing_total": 0,
        "missing_time_in_millis": 0,
        "current": 0
      },
      "search": {
        "open_contexts": 0,
        "query_total": 0,
        "query_time_in_millis": 0,
        "query_current": 0,
        "fetch_total": 0,
        "fetch_time_in_millis": 0,
        "fetch_current": 0,
        "scroll_total": 0,
        "scroll_time_in_millis": 0,
        "scroll_current": 0,
        "suggest_total": 0,
        "suggest_time_in_millis": 0,
        "suggest_current": 0
      },
      "merges": {
        "current": 0,
        "current_docs": 0,
        "current_size_in_bytes": 0,
        "total": 20,
        "total_time_in_millis": 32364,
        "total_docs": 169942,
        "total_size_in_bytes": 321609387,
        "total_stopped_time_in_millis": 0,
        "total_throttled_time_in_millis": 0,
        "total_auto_throttle_in_bytes": 209715200
      },
      "refresh": {
        "total": 240,
        "total_time_in_millis": 19748,
        "listeners": 0
      },
      "flush": {
        "total": 10,
        "periodic": 0,
        "total_time_in_millis": 293
      },
      "warmer": {
        "current": 0,
        "total": 199,
        "total_time_in_millis": 1243
      },
      "query_cache": {
        "memory_size_in_bytes": 0,
        "total_count": 0,
        "hit_count": 0,
        "miss_count": 0,
        "cache_size": 0,
        "cache_count": 0,
        "evictions": 0
      },
      "fielddata": {
        "memory_size_in_bytes": 0,
        "evictions": 0
      },
      "completion": {
        "size_in_bytes": 0
      },
      "segments": {
        "count": 10,
        "memory_in_bytes": 476957,
        "terms_memory_in_bytes": 354026,
        "stored_fields_memory_in_bytes": 56736,
        "term_vectors_memory_in_bytes": 0,
        "norms_memory_in_bytes": 8960,
        "points_memory_in_bytes": 19259,
        "doc_values_memory_in_bytes": 37976,
        "index_writer_memory_in_bytes": 0,
        "version_map_memory_in_bytes": 0,
        "fixed_bit_set_memory_in_bytes": 0,
        "max_unsafe_auto_id_timestamp": -1,
        "file_sizes": {
          "fdx": {
            "size_in_bytes": 53731,
            "description": "Field Index"
          },
          "pos": {
            "size_in_bytes": 24245847,
            "description": "Positions"
          },
          "tip": {
            "size_in_bytes": 241800,
            "description": "Term Index"
          },
          "dii": {
            "size_in_bytes": 2230,
            "description": "Points"
          },
          "fnm": {
            "size_in_bytes": 148210,
            "description": "Fields"
          },
          "dvm": {
            "size_in_bytes": 135696,
            "description": "DocValues"
          },
          "si": {
            "size_in_bytes": 5470,
            "description": "Segment Info"
          },
          "nvm": {
            "size_in_bytes": 5290,
            "description": "Norms"
          },
          "dvd": {
            "size_in_bytes": 15266820,
            "description": "DocValues"
          },
          "dim": {
            "size_in_bytes": 13237382,
            "description": "Points"
          },
          "doc": {
            "size_in_bytes": 18783854,
            "description": "Frequencies"
          },
          "nvd": {
            "size_in_bytes": 986090,
            "description": "Norms"
          },
          "fdt": {
            "size_in_bytes": 80145282,
            "description": "Field Data"
          },
          "tim": {
            "size_in_bytes": 20320021,
            "description": "Term Dictionary"
          }
        }
      },
      "translog": {
        "operations": 0,
        "size_in_bytes": 550,
        "uncommitted_operations": 0,
        "uncommitted_size_in_bytes": 550,
        "earliest_last_modified_age": 0
      },
      "request_cache": {
        "memory_size_in_bytes": 0,
        "evictions": 0,
        "hit_count": 0,
        "miss_count": 0
      },
      "recovery": {
        "current_as_source": 0,
        "current_as_target": 0,
        "throttle_time_in_millis": 0
      }
    }
  },
  "indices": {
    "reinde": {
      "uuid": "ipM1MBGTSq2apLx6o79GfQ",
      "total": {
        "docs": {
          "count": 98550,
          "deleted": 0
        },
        "store": {
          "size_in_bytes": 173581057
        },
        "indexing": {
          "index_total": 98550,
          "index_time_in_millis": 62746,
          "index_current": 0,
          "index_failed": 0,
          "delete_total": 0,
          "delete_time_in_millis": 0,
          "delete_current": 0,
          "noop_update_total": 0,
          "is_throttled": false,
          "throttle_time_in_millis": 0
        },
        "get": {
          "total": 0,
          "time_in_millis": 0,
          "exists_total": 0,
          "exists_time_in_millis": 0,
          "missing_total": 0,
          "missing_time_in_millis": 0,
          "current": 0
        },
        "search": {
          "open_contexts": 0,
          "query_total": 0,
          "query_time_in_millis": 0,
          "query_current": 0,
          "fetch_total": 0,
          "fetch_time_in_millis": 0,
          "fetch_current": 0,
          "scroll_total": 0,
          "scroll_time_in_millis": 0,
          "scroll_current": 0,
          "suggest_total": 0,
          "suggest_time_in_millis": 0,
          "suggest_current": 0
        },
        "merges": {
          "current": 0,
          "current_docs": 0,
          "current_size_in_bytes": 0,
          "total": 20,
          "total_time_in_millis": 32364,
          "total_docs": 169942,
          "total_size_in_bytes": 321609387,
          "total_stopped_time_in_millis": 0,
          "total_throttled_time_in_millis": 0,
          "total_auto_throttle_in_bytes": 209715200
        },
        "refresh": {
          "total": 240,
          "total_time_in_millis": 19748,
          "listeners": 0
        },
        "flush": {
          "total": 10,
          "periodic": 0,
          "total_time_in_millis": 293
        },
        "warmer": {
          "current": 0,
          "total": 199,
          "total_time_in_millis": 1243
        },
        "query_cache": {
          "memory_size_in_bytes": 0,
          "total_count": 0,
          "hit_count": 0,
          "miss_count": 0,
          "cache_size": 0,
          "cache_count": 0,
          "evictions": 0
        },
        "fielddata": {
          "memory_size_in_bytes": 0,
          "evictions": 0
        },
        "completion": {
          "size_in_bytes": 0
        },
        "segments": {
          "count": 10,
          "memory_in_bytes": 476957,
          "terms_memory_in_bytes": 354026,
          "stored_fields_memory_in_bytes": 56736,
          "term_vectors_memory_in_bytes": 0,
          "norms_memory_in_bytes": 8960,
          "points_memory_in_bytes": 19259,
          "doc_values_memory_in_bytes": 37976,
          "index_writer_memory_in_bytes": 0,
          "version_map_memory_in_bytes": 0,
          "fixed_bit_set_memory_in_bytes": 0,
          "max_unsafe_auto_id_timestamp": -1,
          "file_sizes": {
            "tip": {
              "size_in_bytes": 241800,
              "description": "Term Index"
            },
            "nvm": {
              "size_in_bytes": 5290,
              "description": "Norms"
            },
            "dii": {
              "size_in_bytes": 2230,
              "description": "Points"
            },
            "fdt": {
              "size_in_bytes": 80145282,
              "description": "Field Data"
            },
            "fnm": {
              "size_in_bytes": 148210,
              "description": "Fields"
            },
            "doc": {
              "size_in_bytes": 18783854,
              "description": "Frequencies"
            },
            "tim": {
              "size_in_bytes": 20320021,
              "description": "Term Dictionary"
            },
            "pos": {
              "size_in_bytes": 24245847,
              "description": "Positions"
            },
            "dvm": {
              "size_in_bytes": 135696,
              "description": "DocValues"
            },
            "fdx": {
              "size_in_bytes": 53731,
              "description": "Field Index"
            },
            "dvd": {
              "size_in_bytes": 15266820,
              "description": "DocValues"
            },
            "nvd": {
              "size_in_bytes": 986090,
              "description": "Norms"
            },
            "dim": {
              "size_in_bytes": 13237382,
              "description": "Points"
            },
            "si": {
              "size_in_bytes": 5470,
              "description": "Segment Info"
            }
          }
        },
        "translog": {
          "operations": 0,
          "size_in_bytes": 550,
          "uncommitted_operations": 0,
          "uncommitted_size_in_bytes": 550,
          "earliest_last_modified_age": 0
        },
        "request_cache": {
          "memory_size_in_bytes": 0,
          "evictions": 0,
          "hit_count": 0,
          "miss_count": 0
        },
        "recovery": {
          "current_as_source": 0,
          "current_as_target": 0,
          "throttle_time_in_millis": 0
        }
      }
    }
  }
}

Thanks a lot for your help, David. The issue was actually a specific field for which I had accidentally turned on ngram indexing. It was a unique ID field, hence the bloat in the index size. Once I removed that, there was a reduction in storage.
Looking at the stats, how does one interpret the gains made by compression vs the gains made by skipping indexing?
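
To illustrate why n-gram indexing a unique ID field inflates the index, here is a small Python sketch that counts distinct terms. It does not use Elasticsearch's analyzer: the IDs are synthetic and the gram lengths (3 to 8) are assumptions chosen for illustration, not the settings from the actual template.

```python
import hashlib


def ngrams(text, min_n, max_n):
    """Yield every substring of text whose length is between min_n and max_n."""
    for n in range(min_n, max_n + 1):
        for start in range(len(text) - n + 1):
            yield text[start:start + n]


# Synthetic, deterministic stand-ins for unique IDs (16 hex characters each).
ids = [hashlib.sha256(str(i).encode()).hexdigest()[:16] for i in range(1000)]

# Indexed as a plain keyword, each document contributes exactly one term.
keyword_terms = set(ids)

# Indexed with n-grams, each ID contributes many substrings, and because the
# IDs are unique, most of the longer grams are shared with no other document.
ngram_terms = {gram for doc_id in ids for gram in ngrams(doc_id, 3, 8)}

print("keyword terms:", len(keyword_terms))
print("n-gram terms: ", len(ngram_terms))
```

Every distinct term needs an entry in the term dictionary, so a field like this tends to show up as growth in the tim and tip files.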

Have a look at this blog post for an example of how to evaluate how storage is affected by different settings and mappings.
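
As a rough way to read the two stats outputs posted earlier, the file_sizes entries can be compared extension by extension. The Python sketch below uses the largest entries copied from those outputs. Note that the totals include replica copies and that the two indices differ by about 1% in document count, so treat the result as an approximation.

```python
# Sizes in bytes, copied from the "file_sizes" sections of the two _stats outputs.
shadow = {
    "tim": 140820805, "pos": 50437284, "fdt": 33126909, "doc": 30315652,
    "dvd": 15103134, "dim": 10354356, "tip": 2937148, "nvd": 195370,
}
reindex = {
    "tim": 20320021, "pos": 24245847, "fdt": 80145282, "doc": 18783854,
    "dvd": 15266820, "dim": 13237382, "tip": 241800, "nvd": 986090,
}


def size_deltas(a, b):
    """Return {extension: a_bytes - b_bytes} for every extension in either dict."""
    return {ext: a.get(ext, 0) - b.get(ext, 0) for ext in sorted(set(a) | set(b))}


deltas = size_deltas(shadow, reindex)

# Print the extensions with the biggest absolute difference first.
for ext, delta in sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{ext:>4} {delta / 1e6:+9.1f} MB")
```

The term dictionary (tim) is about 120 MB larger in the shadow index, which is consistent with the n-gram field described above, while the stored field data (fdt) is about 47 MB smaller. As a general guide, stored-field compression shows up in fdt, whereas skipping indexing shows up in the inverted index files (tim, tip, doc, pos) and, for numeric fields, in the points files (dim, dii).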