Hi,
I want to build an index with more than 4M vectors of dimension 768. My current setup is a DigitalOcean droplet with 4 GB of RAM and 2 vCPUs (I plan to scale up as needed).
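As a rough sanity check on the sizing, here is a back-of-the-envelope estimate of the raw vector data alone. It assumes each dimension is stored as a 4-byte float and ignores the HNSW graph and any other index overhead, so the real footprint is probably higher. `raw_vector_bytes` is just a hypothetical helper for this post, not part of any Elasticsearch client.

```python
GIB = 1024 ** 3  # bytes per GiB


def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Bytes needed for the raw vector values only (assumes 4-byte floats by default)."""
    return num_vectors * dims * bytes_per_value


# Vector counts taken from this post: 1.5M indexed so far, 4M planned.
for label, count in [("1.5M (current)", 1_500_000), ("4M (target)", 4_000_000)]:
    size = raw_vector_bytes(count, 768)
    print(f"{label}: {size / GIB:.2f} GiB of raw vector data")
```

If that assumption holds, the 1.5M vectors already come to a bit over 4 GiB of raw data, which is more than the droplet's total RAM before any index overhead is counted.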
I started by adding the first million vectors, which seemed to go fine. I then added an additional 500k vectors. My code ran without errors and the index reports 1.5M vectors, but I now see a process running permanently in the background and consuming a lot of CPU. It has been running for more than 12 hours. This is what I get when printing es.nodes.hot_threads():
>>> print(es.nodes.hot_threads())
::: {julia-es}{S4521jOZTJaZR9KNKsMcMw}{LZe3maChRY-nfLeEkMb8ow}{julia-es}{127.0.0.1}{127.0.0.1:9300}{cdfhilmrstw}{8.8.1}{ml.allocated_processors_double=2.0, xpack.installed=true, ml.machine_memory=4101406720, ml.allocated_processors=2, ml.max_jvm_size=2051014656}
Hot threads at 2023-07-10T07:50:34.163Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
100.0% [cpu=28.0%, other=72.0%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[julia-es][[julia_dgsi][0]: Lucene Merge Thread #58]'
7/10 snapshots sharing following 22 elements
app/org.apache.lucene.core@9.6.0/org.apache.lucene.codecs.lucene95.OffHeapFloatVectorValues.vectorValue(OffHeapFloatVectorValues.java:61)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.codecs.lucene95.OffHeapFloatVectorValues$DenseOffHeapVectorValues.vectorValue(OffHeapFloatVectorValues.java:86)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.util.hnsw.HnswGraphSearcher.compare(HnswGraphSearcher.java:290)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.util.hnsw.HnswGraphSearcher.searchLevel(HnswGraphSearcher.java:267)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.util.hnsw.HnswGraphSearcher.searchLevel(HnswGraphSearcher.java:208)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.util.hnsw.HnswGraphBuilder.addGraphNode(HnswGraphBuilder.java:278)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.util.hnsw.HnswGraphBuilder.addGraphNode(HnswGraphBuilder.java:286)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.util.hnsw.HnswGraphBuilder.addVectors(HnswGraphBuilder.java:235)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.util.hnsw.HnswGraphBuilder.build(HnswGraphBuilder.java:162)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.codecs.lucene95.Lucene95HnswVectorsWriter.mergeOneField(Lucene95HnswVectorsWriter.java:477)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsWriter.mergeOneField(PerFieldKnnVectorsFormat.java:117)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.codecs.KnnVectorsWriter.merge(KnnVectorsWriter.java:98)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.index.SegmentMerger.mergeVectorValues(SegmentMerger.java:255)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.index.SegmentMerger$$Lambda$7848/0x00000008022d7cc0.merge(Unknown Source)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:298)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:149)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5140)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4680)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6432)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:639)
app/org.elasticsearch.server@8.8.1/org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:118)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:700)
3/10 snapshots sharing following 22 elements
app/org.apache.lucene.core@9.6.0/org.apache.lucene.codecs.lucene95.OffHeapFloatVectorValues.vectorValue(OffHeapFloatVectorValues.java:61)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.codecs.lucene95.OffHeapFloatVectorValues$DenseOffHeapVectorValues.vectorValue(OffHeapFloatVectorValues.java:86)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.util.hnsw.HnswGraphSearcher.compare(HnswGraphSearcher.java:290)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.util.hnsw.HnswGraphSearcher.searchLevel(HnswGraphSearcher.java:267)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.util.hnsw.HnswGraphSearcher.searchLevel(HnswGraphSearcher.java:208)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.util.hnsw.HnswGraphBuilder.addGraphNode(HnswGraphBuilder.java:273)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.util.hnsw.HnswGraphBuilder.addGraphNode(HnswGraphBuilder.java:286)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.util.hnsw.HnswGraphBuilder.addVectors(HnswGraphBuilder.java:235)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.util.hnsw.HnswGraphBuilder.build(HnswGraphBuilder.java:162)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.codecs.lucene95.Lucene95HnswVectorsWriter.mergeOneField(Lucene95HnswVectorsWriter.java:477)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsWriter.mergeOneField(PerFieldKnnVectorsFormat.java:117)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.codecs.KnnVectorsWriter.merge(KnnVectorsWriter.java:98)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.index.SegmentMerger.mergeVectorValues(SegmentMerger.java:255)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.index.SegmentMerger$$Lambda$7848/0x00000008022d7cc0.merge(Unknown Source)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:298)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:149)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5140)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4680)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6432)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:639)
app/org.elasticsearch.server@8.8.1/org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:118)
app/org.apache.lucene.core@9.6.0/org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:700)