We are using a 4 GB storage-optimized (dense) instance in Elastic Cloud to store millions of tokens with embeddings.
We index the embeddings with an upsert operation.
We follow a two-step process:
1- Index the tokens in Elasticsearch without embeddings.
2- Update the indexed tokens with their embeddings.
The update operation runs continuously as long as there are tokens without embeddings.
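For context, step 2 is roughly the following sketch. The index name, the `embedding` field name, and the batching are assumptions on my part; it uses the official Python client's bulk helper (`elasticsearch.helpers.bulk`) to send partial-document updates:

```python
# Hypothetical sketch of step 2: attach embeddings to already-indexed tokens.
# The index name and the "embedding" dense_vector field name are assumptions.

def build_update_actions(index, docs_with_embeddings):
    """Turn (doc_id, vector) pairs into partial-update bulk actions."""
    for doc_id, vector in docs_with_embeddings:
        yield {
            "_op_type": "update",          # partial update of an existing doc
            "_index": index,
            "_id": doc_id,
            "doc": {"embedding": vector},  # write the vector into the doc
        }

# Against a live cluster this generator would be fed to the bulk helper:
# from elasticsearch import Elasticsearch, helpers
# es = Elasticsearch("https://<cluster-endpoint>")
# helpers.bulk(es, build_update_actions("semantic-data-index", batch))
```

Each such update rewrites the document, which creates new segments and deletes, so it also drives Lucene segment merges in the background.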
We are observing that after about 1 hour of updates, heap usage on the nodes breaches its 1.9 GB limit, causing the node to go down.
At that point, old-generation GC is triggered.
I am happy to share metrics from when this occurs (monitoring is enabled on the cluster).
Can someone please help with the root-cause analysis (RCA) of this issue?
The hot threads output shows this thread when heap crosses the limit:
100.0% [cpu=78.8%, other=21.2%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[instance-0000000018][[semantic-data-index][0]: Lucene Merge Thread #0]'
2/10 snapshots sharing following 25 elements
app/org.apache.lucene.core@9.10.0/org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport.squareDistanceBody256(PanamaVectorUtilSupport.java:540)
app/org.apache.lucene.core@9.10.0/org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport.squareDistance(PanamaVectorUtilSupport.java:522)