TL;DR
After adding mappings for 217 fields (72 of them nested), heap usage increased by ~50%, with much of the increase going to fixed bit set memory. Is there a way to reduce this heap growth while keeping the mapping update?
Update:
We have taken a number of measures to support our feature and mitigate the memory increase:
- Reindexed all of our indexes under 100 GB (still working on the larger ones) back to the prior mappings, and set better shard counts (index size in GB / 30); a sketch of this step follows the list. This cut our shard count from 4,400 to 1,500, drastically reduced heap, and got the cluster stable.
- Upgraded the minor version to 5.6; this didn't seem to have a significant impact.
- Added 8 additional nodes to increase the cluster's total memory
- Reduced the additional mappings needed for the feature to 78 fields (down from 217), 25 of which are nested (down from 72)
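For reference, here is a minimal sketch of the reindex step (Python + requests; the endpoint, index names, and `prior_mappings` body are placeholders, not our exact tooling):

```python
import math
import requests

ES = "http://localhost:9200"  # placeholder endpoint

def reindex_with_better_shards(src_index, dest_index, prior_mappings):
    """Recreate an index with ~1 shard per 30 GB, then reindex into it."""
    # Source index size in GB from the _cat API.
    info = requests.get(f"{ES}/_cat/indices/{src_index}?format=json&bytes=gb").json()[0]
    shards = max(1, math.ceil(float(info["store.size"]) / 30))

    # Destination index gets the prior (smaller) mappings and the new shard count.
    requests.put(f"{ES}/{dest_index}", json={
        "settings": {"index": {"number_of_shards": shards, "number_of_replicas": 1}},
        "mappings": prior_mappings,
    }).raise_for_status()

    # Run the reindex as a task so large indices don't time out the HTTP call.
    resp = requests.post(f"{ES}/_reindex?wait_for_completion=false", json={
        "source": {"index": src_index},
        "dest": {"index": dest_index},
    })
    resp.raise_for_status()
    return resp.json()["task"]  # poll GET /_tasks/<task_id> to track progress
```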
When we rolled out these new, reduced mappings to all indexes under 100 GB, fixed bit set memory increased by only ~100 MB per node, and there was no noticeable heap increase across the cluster. We expected a larger jump in heap and fixed bit set memory, since the original, larger mappings caused a much bigger increase.
Additional details:
We have 17 data nodes, each with:
- 64 GB RAM, 31.9 GB heap
- 1.6 TB SSD, ~800 GB to 1 TB of data each
- 6-core 3.5 GHz CPU
Cluster stats (at the time of the incident):
- ES version 5.4.3
- ~550 indices, all with the same mappings and settings
- Indices range in size from <1 MB to 1 TB; we try to keep shards under 50 GB (see the shard-size check sketched after this list)
- 4,400 shards
- ~16.5 TB data
- ~55 billion docs
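For context, a quick way to spot shards over the 50 GB target looks roughly like this (Python + requests against a placeholder endpoint):

```python
import requests

ES = "http://localhost:9200"  # placeholder endpoint

# List primary shards larger than 50 GB, largest first, so we know what to reindex next.
shards = requests.get(f"{ES}/_cat/shards?format=json&bytes=gb").json()
oversized = [s for s in shards
             if s["prirep"] == "p" and s["store"] and float(s["store"]) > 50]
for s in sorted(oversized, key=lambda s: float(s["store"]), reverse=True):
    print(f'{s["index"]} shard {s["shard"]}: {s["store"]} GB')
```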
Mapping information:
- Prior to the migration, we had 569 total fields, 83 of which were nested; most fields are under a single layer of nesting
- The migration increased the total to 786 fields, 155 of which were nested
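The field counts above come from the index mapping; a minimal sketch of how such a count can be produced (the index name is a placeholder, and this assumes the usual ES 5.x type-keyed mapping layout):

```python
import requests

ES = "http://localhost:9200"  # placeholder endpoint

def count_fields(properties):
    """Recursively count total fields and nested fields under a 'properties' map."""
    total, nested = 0, 0
    for spec in properties.values():
        total += 1
        if spec.get("type") == "nested":
            nested += 1
        if "properties" in spec:  # object or nested field with sub-fields
            sub_total, sub_nested = count_fields(spec["properties"])
            total += sub_total
            nested += sub_nested
    return total, nested

# All of our indices share the same mappings, so one representative index is enough.
mapping = requests.get(f"{ES}/my-index/_mapping").json()["my-index"]["mappings"]
for doc_type, body in mapping.items():  # ES 5.x mappings are keyed by document type
    total, nested = count_fields(body.get("properties", {}))
    print(f"{doc_type}: {total} fields, {nested} nested")
```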
After the heap spike, we inspected the node stats and found that the majority of the memory appears to be consumed by the fixed bit set, ranging from ~18 GB to 21 GB+ per node. It's hard to find information on it, but it appears to be related to nested documents. We suspect the increase in nested docs is largely responsible for the heap spike.
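The per-node numbers above come from the segments section of node stats; a minimal sketch of pulling them out (Python + requests, placeholder endpoint):

```python
import requests

ES = "http://localhost:9200"  # placeholder endpoint

# Per-node fixed bit set memory, reported under indices.segments in node stats.
# memory_in_bytes covers terms/norms/doc values etc.; fixed bit sets are reported separately.
stats = requests.get(f"{ES}/_nodes/stats/indices/segments").json()
for node in stats["nodes"].values():
    segs = node["indices"]["segments"]
    fixed_gb = segs["fixed_bit_set_memory_in_bytes"] / 1024 ** 3
    seg_gb = segs["memory_in_bytes"] / 1024 ** 3
    print(f'{node["name"]}: fixed bit sets {fixed_gb:.1f} GB, other segment memory {seg_gb:.1f} GB')
```

From what we can tell, these bit sets are the per-segment filters Lucene caches to identify parent documents for nested fields, which would explain why the nested field count matters so much.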
We would love to support this mapping update if feasible. Are there settings or configurations we can tweak to reduce the heap associated with these changes? Is there anything else we can do to investigate and fix the issue?
I'm happy to provide additional info about the cluster and current settings.