Normally, we have zero old gen collections. The exception is bulk upserts of our entire index, refreshing product data from our DB into Elasticsearch. While this is not necessarily a problem now, it gives me pause. Is this due to using the default refresh interval and having fielddata constantly reload? We have about 150 MB of fielddata (5 GB JVM heap). While ingesting the upserts, these nodes are also receiving search traffic. Even if we moved our analyzed strings over to doc_values, I assume the global ordinals would still live in heap memory. Since fielddata is tied to immutable segments, I assume that after every refresh the fielddata for changed segments is invalidated and dereferenced in memory. Or does that only happen after a segment merge?
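For context, this is roughly what I mean by moving fields to doc_values. A minimal sketch, assuming a `products` index and a `product` type with a placeholder `category` field (names are hypothetical); as I understand it, doc_values can only be enabled on `not_analyzed` strings, which is why the question about global ordinals for analyzed fields remains:

```shell
# Hypothetical mapping change: enable doc_values on a not_analyzed string
# field so its ordinals/values live on disk rather than in fielddata.
# Index, type, and field names are placeholders for illustration only.
curl -XPUT 'localhost:9200/products/_mapping/product' -d '{
  "properties": {
    "category": {
      "type": "string",
      "index": "not_analyzed",
      "doc_values": true
    }
  }
}'
```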
Is there anything else about indexing that would cause the old gen to rapidly fill with dereferenced objects, such that the old gen both fills up and is collected successfully (i.e., not a leak)? Is there anything about segment merging that might use significant memory?
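To narrow this down, I have been looking at the stats APIs. A sketch of the checks I mean, assuming a node reachable at `localhost:9200`:

```shell
# Per-field, per-node fielddata memory usage
curl -XGET 'localhost:9200/_cat/fielddata?v'

# Per-segment memory, to see how much heap segments themselves hold
curl -XGET 'localhost:9200/_cat/segments?v'

# Node-level fielddata and segment stats, including eviction counts
curl -XGET 'localhost:9200/_nodes/stats/indices/fielddata,segments?pretty'
```

If fielddata evictions climb during the bulk upserts, that would point at reload churn; if segment memory spikes instead, merging would be the more likely suspect.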
My current thought is that, during the bulk reindex period, we should change the index settings to extend (or disable) the refresh interval. This should stop fielddata from being reloaded on every refresh (assuming it is the culprit). After the bulk indexing finishes, we would set it back to the default for one-off document updates, as recommended for indexing performance.
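Concretely, the toggle I have in mind looks like this (index name is a placeholder; `-1` disables refresh entirely, which is the usual bulk-load recommendation):

```shell
# Disable automatic refresh for the duration of the bulk reindex
curl -XPUT 'localhost:9200/products/_settings' -d '{
  "index": { "refresh_interval": "-1" }
}'

# ... run the bulk upserts ...

# Restore the default 1s refresh interval afterwards
curl -XPUT 'localhost:9200/products/_settings' -d '{
  "index": { "refresh_interval": "1s" }
}'
```

The tradeoff, of course, is that newly upserted documents are not visible to search until refresh is re-enabled (or triggered manually), which matters here since these nodes serve search traffic during ingestion.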