Indexing is getting slower and slower as it progresses

I am trying to index 114 million rows of data, with 150 million attachments, from a PeopleSoft application into ES 6.1 (we can't upgrade to a higher version due to PeopleSoft compatibility).

Elastic Configuration:

A single-node cluster (the node acts as coordinator, ingest and data node) with:
OS: Oracle Linux 5.4.17-2136.336.5.1.el7uek.x86_64 x86_64
OCPU: 32
Memory: 128GB
Disk: 10TB

elasticsearch.yml options:
http.max_content_length: 300mb
indices.memory.index_buffer_size: 20%
bootstrap.memory_lock: true

jvm.options
-Xms31g
-Xmx31g
-XX:ParallelGCThreads=20
-XX:NewRatio=2

"refresh_interval" : "-1"

Number of shards: 200
Replicas: 0
Attachment Handlers: 10
Max Sub Queue Size: 20
Full Direct Transfer: enabled
Index segment size: 10mb
Bulk transfer enabled
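
For reference, the "refresh_interval" : "-1" and replica settings above are index-level settings; applied via the API they would look roughly like this (an illustrative sketch only, not the exact command used here - the index name is taken from the stats output further down):

curl -s -X PUT "localhost:9200/icc_case_notes_crpte/_settings" -H 'Content-Type: application/json' -d'
{
  "index" : {
    "refresh_interval" : "-1",
    "number_of_replicas" : 0
  }
}'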

The first 40 million rows indexed at a rate of 40,000/min, but I noticed a gradual slowdown in throughput to 20,000/min by the time 50 million rows had been indexed.

It then slowed further to ~4,500/min by the time 55 million rows had been indexed.
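
To keep tracking the rate, the index_total counter from the index stats can be sampled and diffed, e.g. (a minimal sketch, assuming curl and jq are available on the node; counts from _cat/count lag here because refresh is disabled):

# sample the indexing counter every 60s and print the per-minute rate
prev=$(curl -s 'localhost:9200/icc_case_notes_crpte/_stats/indexing' | jq '._all.primaries.indexing.index_total')
while sleep 60; do
  cur=$(curl -s 'localhost:9200/icc_case_notes_crpte/_stats/indexing' | jq '._all.primaries.indexing.index_total')
  echo "$(date +%T) index_total=$cur rate=$((cur - prev))/min"
  prev=$cur
done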

The hot thread output shows the following

  Hot threads at 2024-12-11T02:21:46.612Z, interval=500ms, busiestThreads=99999, ignoreIdleThreads=true:

   100.2% (500.7ms out of 500ms) cpu usage by thread 'elasticsearch[RLskF8A][bulk][T#20]'
     7/10 snapshots sharing following 24 elements
       org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:147)
       org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
       org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
       org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:117)
       org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:168)
       org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
       org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
       org.apache.tika.Tika.parseToString(Tika.java:568)
       org.elasticsearch.ingest.attachment.TikaImpl.lambda$0(TikaImpl.java:110)
       org.elasticsearch.ingest.attachment.TikaImpl$$Lambda$1926/149886079.run(Unknown Source)
       java.security.AccessController.doPrivileged(Native Method)
       org.elasticsearch.ingest.attachment.TikaImpl.parse(TikaImpl.java:109)
       org.elasticsearch.ingest.attachment.AttachmentProcessor.execute(AttachmentProcessor.java:88)
       org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:100)
       org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:100)
       org.elasticsearch.ingest.Pipeline.execute(Pipeline.java:58)
       org.elasticsearch.ingest.PipelineExecutionService.innerExecute(PipelineExecutionService.java:169)
       org.elasticsearch.ingest.PipelineExecutionService.access$000(PipelineExecutionService.java:42)
       org.elasticsearch.ingest.PipelineExecutionService$2.doRun(PipelineExecutionService.java:94)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)
     2/10 snapshots sharing following 23 elements
       org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:286)
       org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:189)
       org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:176)
       org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
       org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
       org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
       org.apache.tika.Tika.parseToString(Tika.java:568)
       org.elasticsearch.ingest.attachment.TikaImpl.lambda$0(TikaImpl.java:110)
       org.elasticsearch.ingest.attachment.TikaImpl$$Lambda$1926/149886079.run(Unknown Source)
       java.security.AccessController.doPrivileged(Native Method)
       org.elasticsearch.ingest.attachment.TikaImpl.parse(TikaImpl.java:109)
       org.elasticsearch.ingest.attachment.AttachmentProcessor.execute(AttachmentProcessor.java:88)
       org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:100)
       org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:100)
       org.elasticsearch.ingest.Pipeline.execute(Pipeline.java:58)
       org.elasticsearch.ingest.PipelineExecutionService.innerExecute(PipelineExecutionService.java:169)
       org.elasticsearch.ingest.PipelineExecutionService.access$000(PipelineExecutionService.java:42)
       org.elasticsearch.ingest.PipelineExecutionService$2.doRun(PipelineExecutionService.java:94)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)
     unique snapshot
       java.util.zip.Inflater.inflateBytes(Native Method)
       java.util.zip.Inflater.inflate(Inflater.java:259)
       java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
       org.apache.poi.openxml4j.util.ZipSecureFile$ThresholdInputStream.read(ZipSecureFile.java:220)
       com.sun.org.apache.xerces.internal.impl.XMLEntityManager$RewindableInputStream.read(XMLEntityManager.java:2942)
       com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:303)
       com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1895)
       com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1551)
       com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2823)
       com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:605)
       com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:113)
       com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:507)
       com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:867)
       com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:796)
       com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:142)
       com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:247)
       com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
       javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
       org.apache.poi.util.DocumentHelper.readDocument(DocumentHelper.java:140)
       org.apache.poi.POIXMLTypeLoader.parse(POIXMLTypeLoader.java:163)
       org.openxmlformats.schemas.spreadsheetml.x2006.main.CommentsDocument$Factory.parse(Unknown Source)
       org.apache.poi.xssf.model.CommentsTable.readFrom(CommentsTable.java:72)
       org.apache.poi.xssf.model.CommentsTable.<init>(CommentsTable.java:67)
       org.apache.poi.xssf.eventusermodel.XSSFReader$SheetIterator.getSheetComments(XSSFReader.java:343)
       org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:157)
       org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)
       org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.getXHTML(XSSFExcelExtractorDecorator.java:120)
       org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:143)
       org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
       org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188)
       org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
       org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
       org.apache.tika.Tika.parseToString(Tika.java:568)
       org.elasticsearch.ingest.attachment.TikaImpl.lambda$0(TikaImpl.java:110)
       org.elasticsearch.ingest.attachment.TikaImpl$$Lambda$1926/149886079.run(Unknown Source)
       java.security.AccessController.doPrivileged(Native Method)
       org.elasticsearch.ingest.attachment.TikaImpl.parse(TikaImpl.java:109)
       org.elasticsearch.ingest.attachment.AttachmentProcessor.execute(AttachmentProcessor.java:88)
       org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:100)
       org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:100)
       org.elasticsearch.ingest.Pipeline.execute(Pipeline.java:58)
       org.elasticsearch.ingest.PipelineExecutionService.innerExecute(PipelineExecutionService.java:169)
       org.elasticsearch.ingest.PipelineExecutionService.access$000(PipelineExecutionService.java:42)
       org.elasticsearch.ingest.PipelineExecutionService$2.doRun(PipelineExecutionService.java:94)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)

   98.8% (494ms out of 500ms) cpu usage by thread 'elasticsearch[RLskF8A][management][T#3]'
     2/10 snapshots sharing following 24 elements
       java.nio.file.Files.newDirectoryStream(Files.java:457)
       org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:215)
       org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:234)
       org.apache.lucene.store.FilterDirectory.listAll(FilterDirectory.java:57)
       org.elasticsearch.index.store.Store$StoreStatsCache.estimateSize(Store.java:1418)
       org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1410)
       org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1399)
       org.elasticsearch.common.util.SingleObjectCache.getOrRefresh(SingleObjectCache.java:54)
       org.elasticsearch.index.store.Store.stats(Store.java:349)
       org.elasticsearch.index.shard.IndexShard.storeStats(IndexShard.java:947)
       org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:178)
       org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:164)
       org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:45)
       org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.onShardOperation(TransportBroadcastByNodeAction.java:433)
       org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:412)
       org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:399)
       org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30)
       org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66)
       org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:652)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)
     7/10 snapshots sharing following 37 elements
       java.io.UnixFileSystem.canonicalize0(Native Method)
       java.io.UnixFileSystem.canonicalize(UnixFileSystem.java:172)
       java.io.File.getCanonicalPath(File.java:618)
       java.io.FilePermission$1.run(FilePermission.java:224)
       java.io.FilePermission$1.run(FilePermission.java:212)
       java.security.AccessController.doPrivileged(Native Method)
       java.io.FilePermission.init(FilePermission.java:212)
       java.io.FilePermission.<init>(FilePermission.java:299)
       java.lang.SecurityManager.checkRead(SecurityManager.java:888)
       sun.nio.fs.UnixPath.checkRead(UnixPath.java:795)
       sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:49)
       sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
       sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
       java.nio.file.Files.readAttributes(Files.java:1737)
       java.nio.file.Files.size(Files.java:2332)
       org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:243)
       org.apache.lucene.store.FilterDirectory.fileLength(FilterDirectory.java:67)
       org.elasticsearch.index.store.Store$StoreStatsCache.estimateSize(Store.java:1421)
       org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1410)
       org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1399)
       org.elasticsearch.common.util.SingleObjectCache.getOrRefresh(SingleObjectCache.java:54)
       org.elasticsearch.index.store.Store.stats(Store.java:349)
       org.elasticsearch.index.shard.IndexShard.storeStats(IndexShard.java:947)
       org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:178)
       org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:164)
       org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:45)
       org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.onShardOperation(TransportBroadcastByNodeAction.java:433)
       org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:412)
       org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:399)
       org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30)
       org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66)
       org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:652)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)
     unique snapshot
       sun.nio.fs.UnixNativeDispatcher.stat0(Native Method)
       sun.nio.fs.UnixNativeDispatcher.stat(UnixNativeDispatcher.java:286)
       sun.nio.fs.UnixFileAttributes.get(UnixFileAttributes.java:70)
       sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:52)
       sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
       sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
       java.nio.file.Files.readAttributes(Files.java:1737)
       java.nio.file.Files.size(Files.java:2332)
       org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:243)
       org.apache.lucene.store.FilterDirectory.fileLength(FilterDirectory.java:67)
       org.elasticsearch.index.store.Store$StoreStatsCache.estimateSize(Store.java:1421)
       org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1410)
       org.elasticsearch.index.store.Store$StoreStatsCache.refresh(Store.java:1399)
       org.elasticsearch.common.util.SingleObjectCache.getOrRefresh(SingleObjectCache.java:54)
       org.elasticsearch.index.store.Store.stats(Store.java:349)
       org.elasticsearch.index.shard.IndexShard.storeStats(IndexShard.java:947)
       org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:178)
       org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:164)
       org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:45)
       org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.onShardOperation(TransportBroadcastByNodeAction.java:433)
       org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:412)
       org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:399)
       org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30)
       org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66)
       org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:652)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)

   92.5% (462.3ms out of 500ms) cpu usage by thread 'elasticsearch[RLskF8A][refresh][T#5]'
     7/10 snapshots sharing following 28 elements
       org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.pushTerm(BlockTreeTermsWriter.java:905)
       org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:869)
       org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:343)
       org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:140)
       org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:108)
       org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:162)
       org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:451)
       org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:542)
       org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:658)
       org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:453)
       org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:293)
       org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:268)
       org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:258)
       org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:104)
       org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:140)
       org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:156)
       org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)
       org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176)
       org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253)
       org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1336)
       org.elasticsearch.index.engine.InternalEngine.writeIndexingBuffer(InternalEngine.java:1366)
       org.elasticsearch.index.shard.IndexShard.writeIndexingBuffer(IndexShard.java:1702)
       org.elasticsearch.indices.IndexingMemoryController$1.doRun(IndexingMemoryController.java:177)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)
     2/10 snapshots sharing following 30 elements
       org.apache.lucene.store.DataOutput.writeVInt(DataOutput.java:191)
       org.apache.lucene.codecs.lucene50.Lucene50PostingsWriter.finishTerm(Lucene50PostingsWriter.java:392)
       org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm(PushPostingsWriterBase.java:169)
       org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:864)
       org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:343)
       org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:140)
       org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:108)
       org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:162)
       org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:451)
       org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:542)
       org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:658)
       org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:453)
       org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:293)
       org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:268)
       org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:258)
       org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:104)
       org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:140)
       org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:156)
       org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)
       org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176)
       org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253)
       org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1336)
       org.elasticsearch.index.engine.InternalEngine.writeIndexingBuffer(InternalEngine.java:1366)
       org.elasticsearch.index.shard.IndexShard.writeIndexingBuffer(IndexShard.java:1702)
       org.elasticsearch.indices.IndexingMemoryController$1.doRun(IndexingMemoryController.java:177)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)
     unique snapshot
       org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm(PushPostingsWriterBase.java:122)
       org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:864)
       org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:343)
       org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:140)
       org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:108)
       org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:162)
       org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:451)
       org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:542)
       org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:658)
       org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:453)
       org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:293)
       org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:268)
       org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:258)
       org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:104)
       org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:140)
       org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:156)
       org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)
       org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176)
       org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253)
       org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1336)
       org.elasticsearch.index.engine.InternalEngine.writeIndexingBuffer(InternalEngine.java:1366)
       org.elasticsearch.index.shard.IndexShard.writeIndexingBuffer(IndexShard.java:1702)
       org.elasticsearch.indices.IndexingMemoryController$1.doRun(IndexingMemoryController.java:177)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)

   89.0% (444.8ms out of 500ms) cpu usage by thread 'elasticsearch[RLskF8A][refresh][T#4]'
     3/10 snapshots sharing following 29 elements
       org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:162)
       org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:451)
       org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:542)
       org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:658)
       org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:453)
       org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:293)
       org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:268)
       org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:258)
       org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:104)
       org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:140)
       org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:156)
       org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)
       org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176)
       org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253)
       org.elasticsearch.index.engine.InternalEngine$ExternalSearcherManager.refreshIfNeeded(InternalEngine.java:292)
       org.elasticsearch.index.engine.InternalEngine$ExternalSearcherManager.refreshIfNeeded(InternalEngine.java:267)
       org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176)
       org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253)
       org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1332)
       org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1314)
       org.elasticsearch.index.shard.IndexShard.refresh(IndexShard.java:855)
       org.elasticsearch.index.IndexService.maybeRefreshEngine(IndexService.java:695)
       org.elasticsearch.index.IndexService.access$400(IndexService.java:97)
       org.elasticsearch.index.IndexService$AsyncRefreshTask.runInternal(IndexService.java:899)
       org.elasticsearch.index.IndexService$BaseAsyncTask.run(IndexService.java:809)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:568)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)

The index stats show the following.

{
  "_shards" : {
    "total" : 200,
    "successful" : 200,
    "failed" : 0
  },
  "_all" : {
    "primaries" : {
      "docs" : {
        "count" : 55543556,
        "deleted" : 47781
      },
      "store" : {
        "size_in_bytes" : 2672291755255
      },
      "indexing" : {
        "index_total" : 4120398,
        "index_time_in_millis" : 50952877,
        "index_current" : 4,
        "index_failed" : 0,
        "delete_total" : 0,
        "delete_time_in_millis" : 0,
        "delete_current" : 0,
        "noop_update_total" : 0,
        "is_throttled" : false,
        "throttle_time_in_millis" : 0
      },
      "get" : {
        "total" : 1,
        "time_in_millis" : 1,
        "exists_total" : 0,
        "exists_time_in_millis" : 0,
        "missing_total" : 1,
        "missing_time_in_millis" : 1,
        "current" : 0
      },
      "search" : {
        "open_contexts" : 0,
        "query_total" : 1600,
        "query_time_in_millis" : 288,
        "query_current" : 0,
        "fetch_total" : 0,
        "fetch_time_in_millis" : 0,
        "fetch_current" : 0,
        "scroll_total" : 0,
        "scroll_time_in_millis" : 0,
        "scroll_current" : 0,
        "suggest_total" : 0,
        "suggest_time_in_millis" : 0,
        "suggest_current" : 0
      },
      "merges" : {
        "current" : 22,
        "current_docs" : 1629270,
        "current_size_in_bytes" : 92451402153,
        "total" : 3515,
        "total_time_in_millis" : 789186847,
        "total_docs" : 18493872,
        "total_size_in_bytes" : 1133936081916,
        "total_stopped_time_in_millis" : 0,
        "total_throttled_time_in_millis" : 141446873,
        "total_auto_throttle_in_bytes" : 1505047452
      },
      "refresh" : {
        "total" : 29742, 30413, 30565 (immediately after change) 2:10pm, 30613
        "total_time_in_millis" : 60025231,
        "listeners" : 0
      },
      "flush" : {
        "total" : 600,
        "total_time_in_millis" : 1905576
      },
      "warmer" : {
        "current" : 0,
        "total" : 26348,
        "total_time_in_millis" : 2047
      },
      "query_cache" : {
        "memory_size_in_bytes" : 0,
        "total_count" : 0,
        "hit_count" : 0,
        "miss_count" : 0,
        "cache_size" : 0,
        "cache_count" : 0,
        "evictions" : 0
      },
      "fielddata" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 0
      },
      "completion" : {
        "size_in_bytes" : 0
      },
      "segments" : {
        "count" : 5537,
        "memory_in_bytes" : 12468548298,
        "terms_memory_in_bytes" : 12384593318,
        "stored_fields_memory_in_bytes" : 47619256,
        "term_vectors_memory_in_bytes" : 0,
        "norms_memory_in_bytes" : 21696640,
        "points_memory_in_bytes" : 3065192,
        "doc_values_memory_in_bytes" : 11573892,
        "index_writer_memory_in_bytes" : 6503754102,
        "version_map_memory_in_bytes" : 12366103,
        "fixed_bit_set_memory_in_bytes" : 0,
        "max_unsafe_auto_id_timestamp" : -1,
        "file_sizes" : { }
      },
      "translog" : {
        "operations" : 2133296,
        "size_in_bytes" : 114717685968,
        "uncommitted_operations" : 1159571,
        "uncommitted_size_in_bytes" : 62252691813
      },
      "request_cache" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 0,
        "hit_count" : 386,
        "miss_count" : 1214
      },
      "recovery" : {
        "current_as_source" : 0,
        "current_as_target" : 0,
        "throttle_time_in_millis" : 0
      }
    },
    "total" : {
      "docs" : {
        "count" : 55543556,
        "deleted" : 47781
      },
      "store" : {
        "size_in_bytes" : 2672291755255
      },
      "indexing" : {
        "index_total" : 4120398,
        "index_time_in_millis" : 50952877,
        "index_current" : 4,
        "index_failed" : 0,
        "delete_total" : 0,
        "delete_time_in_millis" : 0,
        "delete_current" : 0,
        "noop_update_total" : 0,
        "is_throttled" : false,
        "throttle_time_in_millis" : 0
      },
      "get" : {
        "total" : 1,
        "time_in_millis" : 1,
        "exists_total" : 0,
        "exists_time_in_millis" : 0,
        "missing_total" : 1,
        "missing_time_in_millis" : 1,
        "current" : 0
      },
      "search" : {
        "open_contexts" : 0,
        "query_total" : 1600,
        "query_time_in_millis" : 288,
        "query_current" : 0,
        "fetch_total" : 0,
        "fetch_time_in_millis" : 0,
        "fetch_current" : 0,
        "scroll_total" : 0,
        "scroll_time_in_millis" : 0,
        "scroll_current" : 0,
        "suggest_total" : 0,
        "suggest_time_in_millis" : 0,
        "suggest_current" : 0
      },
      "merges" : {
        "current" : 22,
        "current_docs" : 1629270,
        "current_size_in_bytes" : 92451402153,
        "total" : 3515,
        "total_time_in_millis" : 789186847,
        "total_docs" : 18493872,
        "total_size_in_bytes" : 1133936081916,
        "total_stopped_time_in_millis" : 0,
        "total_throttled_time_in_millis" : 141446873,
        "total_auto_throttle_in_bytes" : 1505047452
      },
      "refresh" : {
        "total" : 29742,
        "total_time_in_millis" : 60025231,
        "listeners" : 0
      },
      "flush" : {
        "total" : 600,
        "total_time_in_millis" : 1905576
      },
      "warmer" : {
        "current" : 0,
        "total" : 26348,
        "total_time_in_millis" : 2047
      },
      "query_cache" : {
        "memory_size_in_bytes" : 0,
        "total_count" : 0,
        "hit_count" : 0,
        "miss_count" : 0,
        "cache_size" : 0,
        "cache_count" : 0,
        "evictions" : 0
      },
      "fielddata" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 0
      },
      "completion" : {
        "size_in_bytes" : 0
      },
      "segments" : {
        "count" : 5537,
        "memory_in_bytes" : 12468548298,
        "terms_memory_in_bytes" : 12384593318,
        "stored_fields_memory_in_bytes" : 47619256,
        "term_vectors_memory_in_bytes" : 0,
        "norms_memory_in_bytes" : 21696640,
        "points_memory_in_bytes" : 3065192,
        "doc_values_memory_in_bytes" : 11573892,
        "index_writer_memory_in_bytes" : 6503754102,
        "version_map_memory_in_bytes" : 12366103,
        "fixed_bit_set_memory_in_bytes" : 0,
        "max_unsafe_auto_id_timestamp" : -1,
        "file_sizes" : { }
      },
      "translog" : {
        "operations" : 2133296,
        "size_in_bytes" : 114717685968,
        "uncommitted_operations" : 1159571,
        "uncommitted_size_in_bytes" : 62252691813
      },
      "request_cache" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 0,
        "hit_count" : 386,
        "miss_count" : 1214
      },
      "recovery" : {
        "current_as_source" : 0,
        "current_as_target" : 0,
        "throttle_time_in_millis" : 0
      }
    }
  },
  "indices" : {
    "icc_case_notes_crpte" : {
      "primaries" : {
        "docs" : {
          "count" : 55543556,
          "deleted" : 47781
        },
        "store" : {
          "size_in_bytes" : 2672291755255
        },
        "indexing" : {
          "index_total" : 4120398,
          "index_time_in_millis" : 50952877,
          "index_current" : 4,
          "index_failed" : 0,
          "delete_total" : 0,
          "delete_time_in_millis" : 0,
          "delete_current" : 0,
          "noop_update_total" : 0,
          "is_throttled" : false,
          "throttle_time_in_millis" : 0
        },
        "get" : {
          "total" : 1,
          "time_in_millis" : 1,
          "exists_total" : 0,
          "exists_time_in_millis" : 0,
          "missing_total" : 1,
          "missing_time_in_millis" : 1,
          "current" : 0
        },
        "search" : {
          "open_contexts" : 0,
          "query_total" : 1600,
          "query_time_in_millis" : 288,
          "query_current" : 0,
          "fetch_total" : 0,
          "fetch_time_in_millis" : 0,
          "fetch_current" : 0,
          "scroll_total" : 0,
          "scroll_time_in_millis" : 0,
          "scroll_current" : 0,
          "suggest_total" : 0,
          "suggest_time_in_millis" : 0,
          "suggest_current" : 0
        },
        "merges" : {
          "current" : 22,
          "current_docs" : 1629270,
          "current_size_in_bytes" : 92451402153,
          "total" : 3515,
          "total_time_in_millis" : 789186847,
          "total_docs" : 18493872,
          "total_size_in_bytes" : 1133936081916,
          "total_stopped_time_in_millis" : 0,
          "total_throttled_time_in_millis" : 141446873,
          "total_auto_throttle_in_bytes" : 1505047452
        },
        "refresh" : {
          "total" : 29742,
          "total_time_in_millis" : 60025231,
          "listeners" : 0
        },
        "flush" : {
          "total" : 600,
          "total_time_in_millis" : 1905576
        },
        "warmer" : {
          "current" : 0,
          "total" : 26348,
          "total_time_in_millis" : 2047
        },
        "query_cache" : {
          "memory_size_in_bytes" : 0,
          "total_count" : 0,
          "hit_count" : 0,
          "miss_count" : 0,
          "cache_size" : 0,
          "cache_count" : 0,
          "evictions" : 0
        },
        "fielddata" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0
        },
        "completion" : {
          "size_in_bytes" : 0
        },
        "segments" : {
          "count" : 5537,
          "memory_in_bytes" : 12468548298,
          "terms_memory_in_bytes" : 12384593318,
          "stored_fields_memory_in_bytes" : 47619256,
          "term_vectors_memory_in_bytes" : 0,
          "norms_memory_in_bytes" : 21696640,
          "points_memory_in_bytes" : 3065192,
          "doc_values_memory_in_bytes" : 11573892,
          "index_writer_memory_in_bytes" : 6503754102,
          "version_map_memory_in_bytes" : 12366103,
          "fixed_bit_set_memory_in_bytes" : 0,
          "max_unsafe_auto_id_timestamp" : -1,
          "file_sizes" : { }
        },
        "translog" : {
          "operations" : 2133296,
          "size_in_bytes" : 114717685968,
          "uncommitted_operations" : 1159571,
          "uncommitted_size_in_bytes" : 62252691813
        },
        "request_cache" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0,
          "hit_count" : 386,
          "miss_count" : 1214
        },
        "recovery" : {
          "current_as_source" : 0,
          "current_as_target" : 0,
          "throttle_time_in_millis" : 0
        }
      },
      "total" : {
        "docs" : {
          "count" : 55543556,
          "deleted" : 47781
        },
        "store" : {
          "size_in_bytes" : 2672291755255
        },
        "indexing" : {
          "index_total" : 4120398,
          "index_time_in_millis" : 50952877,
          "index_current" : 4,
          "index_failed" : 0,
          "delete_total" : 0,
          "delete_time_in_millis" : 0,
          "delete_current" : 0,
          "noop_update_total" : 0,
          "is_throttled" : false,
          "throttle_time_in_millis" : 0
        },
        "get" : {
          "total" : 1,
          "time_in_millis" : 1,
          "exists_total" : 0,
          "exists_time_in_millis" : 0,
          "missing_total" : 1,
          "missing_time_in_millis" : 1,
          "current" : 0
        },
        "search" : {
          "open_contexts" : 0,
          "query_total" : 1600,
          "query_time_in_millis" : 288,
          "query_current" : 0,
          "fetch_total" : 0,
          "fetch_time_in_millis" : 0,
          "fetch_current" : 0,
          "scroll_total" : 0,
          "scroll_time_in_millis" : 0,
          "scroll_current" : 0,
          "suggest_total" : 0,
          "suggest_time_in_millis" : 0,
          "suggest_current" : 0
        },
        "merges" : {
          "current" : 22,
          "current_docs" : 1629270,
          "current_size_in_bytes" : 92451402153,
          "total" : 3515,
          "total_time_in_millis" : 789186847,
          "total_docs" : 18493872,
          "total_size_in_bytes" : 1133936081916,
          "total_stopped_time_in_millis" : 0,
          "total_throttled_time_in_millis" : 141446873,
          "total_auto_throttle_in_bytes" : 1505047452
        },
        "refresh" : {
          "total" : 29742,
          "total_time_in_millis" : 60025231,
          "listeners" : 0
        },
        "flush" : {
          "total" : 600,
          "total_time_in_millis" : 1905576
        },
        "warmer" : {
          "current" : 0,
          "total" : 26348,
          "total_time_in_millis" : 2047
        },
        "query_cache" : {
          "memory_size_in_bytes" : 0,
          "total_count" : 0,
          "hit_count" : 0,
          "miss_count" : 0,
          "cache_size" : 0,
          "cache_count" : 0,
          "evictions" : 0
        },
        "fielddata" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0
        },
        "completion" : {
          "size_in_bytes" : 0
        },
        "segments" : {
          "count" : 5537,
          "memory_in_bytes" : 12468548298,
          "terms_memory_in_bytes" : 12384593318,
          "stored_fields_memory_in_bytes" : 47619256,
          "term_vectors_memory_in_bytes" : 0,
          "norms_memory_in_bytes" : 21696640,
          "points_memory_in_bytes" : 3065192,
          "doc_values_memory_in_bytes" : 11573892,
          "index_writer_memory_in_bytes" : 6503754102,
          "version_map_memory_in_bytes" : 12366103,
          "fixed_bit_set_memory_in_bytes" : 0,
          "max_unsafe_auto_id_timestamp" : -1,
          "file_sizes" : { }
        },
        "translog" : {
          "operations" : 2133296,
          "size_in_bytes" : 114717685968,
          "uncommitted_operations" : 1159571,
          "uncommitted_size_in_bytes" : 62252691813
        },
        "request_cache" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0,
          "hit_count" : 386,
          "miss_count" : 1214
        },
        "recovery" : {
          "current_as_source" : 0,
          "current_as_target" : 0,
          "throttle_time_in_millis" : 0
        }
      }
    }
  }
}

The top output shows the following:

top - 15:17:00 up 12 days, 20:33,  3 users,  load average: 2.68, 3.02, 3.19
Tasks: 431 total,   1 running, 241 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.1 us,  0.0 sy,  0.0 ni, 96.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 26338697+total,  3445884 free, 37108092 used, 22283299+buff/cache
KiB Swap:  8388604 total,  8312060 free,    76544 used. 22385448+avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
24185 psoft     20   0 2200.6g 155.1g 121.7g S 100.0 61.7   3314:20 java
30155 psoft     20   0  174556   4984   3988 R   0.3  0.0   0:00.04 top

I'm wondering what's causing the indexing rate to slow down and what can be done to improve it.

It sounds like you are indexing into a single index with 200 primary shards, is that correct?

What is the average size of your documents? What bulk size are you using?

Given that I also see deletes in the index stats I assume the application is assigning document IDs.

Once you have indexed these documents, will you be updating or deleting them by this ID?

You mentioned that you have 10TB of storage. What type of storage is this? Can you run iostat -x on the node so we can see what await and disk utilisation looks like?

Thank you Christian for your response.

Yes, it is a single index with 200 primary shards. As this is the first time we are doing a full index, the assumption was that indexing 114 million Oracle data rows plus 150 million attachments (child records of the 114 million) might need anywhere from 6 to 9 TB in ES, hence we aimed to allocate between 20 and 50 GB to each shard, leaving enough space for growth going forward.

ES has currently indexed 57,995,807 documents with a total disk utilisation of 2.4 TB, so I would estimate the average size of each document at ~42 KB.
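
As a sanity check, the same estimate can be pulled straight from the index and divided out (illustrative; the exact figure moves around a little between snapshots):

curl -s 'localhost:9200/_cat/indices/icc_case_notes_crpte?v&h=docs.count,store.size&bytes=b'
# average document size ~= store.size / docs.count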

The application is assigning the document IDs. Once we have fully indexed, we will run a periodic incremental update process to reindex any updates made to these documents, as well as index any new documents added going forward.

PeopleSoft delivers a program to handle all of these different scenarios, and I am unsure how and why there are deletes in the indexed documents at this stage. This is the first time we are doing this, as part of assessing the feasibility of the solution in a POC, and we have limited previous knowledge and experience with ES.

Both the application and ES servers are hosted in Oracle OCI; I will have to check with the system admins about the storage type and get back to you.

The iostat -x output shows the following; we can see await mainly on sdc.

iostat -x

Linux 5.4.17-2136.336.5.1.el7uek.x86_64 (preprsydelsiccms01)    12/12/2024      _x86_64_        (32 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.36    0.00    1.23    0.78    0.02   90.61

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.26    1.63    1.71    39.30    19.14    34.98     0.00    0.76    0.68    0.83 0.5069   0.17
sdb               0.00     0.00    0.01    0.00     0.13     0.00    35.20     0.00    0.53    0.52    0.78 0.6197   0.00
sdc               0.53   457.10   88.74  174.65 10824.05 35035.96   348.22     0.97    3.68    2.79    4.13 0.4540  11.96
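
Note that the first iostat report is an average since boot, so I will also capture interval samples while indexing is running to see the current behaviour, e.g.:

iostat -x 5 6    # six 5-second samples; sdc is the ES data volume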

The thread pool and tasks output shows the following.
_cat/thread_pool?v=true&h=id,name,queue,active,rejected,completed&pretty

id                     name                queue active rejected completed
RLskF8A6SiGwd9o9mu1o6w bulk                    0      5    41414   2267966
RLskF8A6SiGwd9o9mu1o6w fetch_shard_started     0      0        0       220
RLskF8A6SiGwd9o9mu1o6w fetch_shard_store       0      0        0         0
RLskF8A6SiGwd9o9mu1o6w flush                   0      0        0      6612
RLskF8A6SiGwd9o9mu1o6w force_merge             0      0        0         0
RLskF8A6SiGwd9o9mu1o6w generic                 0      0        0     27875
RLskF8A6SiGwd9o9mu1o6w get                     0      0        0         2
RLskF8A6SiGwd9o9mu1o6w index                   0      0        0         0
RLskF8A6SiGwd9o9mu1o6w listener                0      0        0         0
RLskF8A6SiGwd9o9mu1o6w management              0      3        0      9586
RLskF8A6SiGwd9o9mu1o6w refresh                 0      1        0     59165
RLskF8A6SiGwd9o9mu1o6w search                  0      0        0      4555
RLskF8A6SiGwd9o9mu1o6w snapshot                0      0        0         0
RLskF8A6SiGwd9o9mu1o6w warmer                  0      0        0         0
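
To see how the bulk queue, active threads and rejections move over time, the same endpoint can be sampled in a loop (sketch):

while sleep 30; do
  curl -s 'localhost:9200/_cat/thread_pool/bulk?v&h=name,active,queue,rejected,completed'
done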

_tasks?group_by=parents&pretty=true

{
  "tasks" : {
    "RLskF8A6SiGwd9o9mu1o6w:4552636" : {
      "node" : "RLskF8A6SiGwd9o9mu1o6w",
      "id" : 4552636,
      "type" : "transport",
      "action" : "indices:data/write/bulk",
      "start_time_in_millis" : 1733947950584,
      "running_time_in_nanos" : 24251871581,
      "cancellable" : false
    },
    "RLskF8A6SiGwd9o9mu1o6w:4552761" : {
      "node" : "RLskF8A6SiGwd9o9mu1o6w",
      "id" : 4552761,
      "type" : "transport",
      "action" : "cluster:monitor/tasks/lists",
      "start_time_in_millis" : 1733947974831,
      "running_time_in_nanos" : 5630115,
      "cancellable" : false,
      "children" : [
        {
          "node" : "RLskF8A6SiGwd9o9mu1o6w",
          "id" : 4552762,
          "type" : "direct",
          "action" : "cluster:monitor/tasks/lists[n]",
          "start_time_in_millis" : 1733947974834,
          "running_time_in_nanos" : 2483340,
          "cancellable" : false,
          "parent_task_id" : "RLskF8A6SiGwd9o9mu1o6w:4552761"
        }
      ]
    },
    "RLskF8A6SiGwd9o9mu1o6w:4552759" : {
      "node" : "RLskF8A6SiGwd9o9mu1o6w",
      "id" : 4552759,
      "type" : "transport",
      "action" : "indices:data/write/bulk",
      "start_time_in_millis" : 1733947963474,
      "running_time_in_nanos" : 11362215884,
      "cancellable" : false
    },
    "RLskF8A6SiGwd9o9mu1o6w:4552760" : {
      "node" : "RLskF8A6SiGwd9o9mu1o6w",
      "id" : 4552760,
      "type" : "transport",
      "action" : "indices:data/write/bulk",
      "start_time_in_millis" : 1733947963816,
      "running_time_in_nanos" : 11019716741,
      "cancellable" : false
    },
    "RLskF8A6SiGwd9o9mu1o6w:4552637" : {
      "node" : "RLskF8A6SiGwd9o9mu1o6w",
      "id" : 4552637,
      "type" : "transport",
      "action" : "indices:data/write/bulk",
      "start_time_in_millis" : 1733947950848,
      "running_time_in_nanos" : 23988536244,
      "cancellable" : false
    },
    "RLskF8A6SiGwd9o9mu1o6w:4552638" : {
      "node" : "RLskF8A6SiGwd9o9mu1o6w",
      "id" : 4552638,
      "type" : "transport",
      "action" : "indices:data/write/bulk",
      "start_time_in_millis" : 1733947951085,
      "running_time_in_nanos" : 23751574908,
      "cancellable" : false
    }
  }
}

The lsblk output shows the following. The ES data is stored on sdc, which has RO flag 0, so I would think it is SSD.

lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdb 8:16 0 100G 0 disk /psoft
sdc 8:32 0 10T 0 disk /rajaels
sda 8:0 0 100G 0 disk
├─sda2 8:2 0 8G 0 part [SWAP]
├─sda3 8:3 0 91.8G 0 part /
└─sda1 8:1 0 200M 0 part /boot/efi

Also, the attachment handlers' size is set to 10 in PeopleSoft, with an index segment size of 10 MB, and I am thinking of reducing the attachment handlers' size to 5 to see if it helps improve the indexing rate.

I am not sure whether I should cancel and change the attachment handlers' size to 5 now, or wait until the current batch process is complete; based on the current rate, it would take days to complete the current batch.

At the same time, if I update this in between, without cancelling, I am not sure how the system handles the change while it is running.

I can also see the following frequent GC overhead messages in the ESCLUSTER.LOG file.

[2024-12-12T08:17:39,450][INFO ][o.e.m.j.JvmGcMonitorService] [RLskF8A] [gc][42519] overhead, spent [340ms] collecting in the last [1.3s]
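
Heap pressure and collector activity can also be read directly from the node stats, e.g. (sketch - the relevant fields are jvm.mem.heap_used_percent and the jvm.gc.collectors counters):

curl -s 'localhost:9200/_nodes/stats/jvm?pretty'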

The throughput has slowed down further to ~1,000/min, and at the current rate it will take another month or so to index the remaining 45 million rows.

I am wondering whether any changes to the current config can be suggested to improve the throughput.

Hi

I don't see from that output how you conclude the sdc disk is an SSD.

The RO column is whether it is read-only, and it's not.

And even if it is "SSD", is it directly connected to the server, or presented via SAN, iSCSI or ... what?

sudo lsblk -S

or the lshw/hwinfo utilities might tell you/us more. Or just get accurate info from the sysadmin.

But things getting slower is not necessarily anything to do with I/O to that specific SCSI device. The top output doesn't show a massive load average, yet the java process is pegged at 100% CPU. Btw, it seems to suggest the system has 256GB of RAM, not the 128GB you reported.

In passing only: the 8G swap device likely does little harm, and you are likely not swapping at all, but what's the point of it? Its presence suggests to me that you maybe did not read through the documentation fully -

And if you missed that bit, what else did you maybe miss?


Please see the output from:

sudo lsblk -S
NAME HCTL TYPE VENDOR MODEL REV TRAN
sdb 6:0:0:1 disk ORACLE BlockVolume 1.0 iscsi
sdc 7:0:0:1 disk ORACLE BlockVolume 1.0 iscsi
sda 2:0:1:1 disk ORACLE BlockVolume 1.0

I just confirmed that the same type of disk is utilised in all PROD servers, as they are all hosted in Oracle OCI.

And it is 256GB of RAM on the server; it was increased from 128GB to 256GB at some stage, though that is not needed as of now.

I thought it was SSD, as I read in one of the posts that if ROTA is 1 it is SSD.

The output of "lsblk -o name,rota | tail" shows the following:

NAME ROTA
sdb 1
sdc 1
sda 1
├─sda2 1
├─sda3 1
└─sda1 1

But further research indicates a ROTA of 0 is SSD.

[Screenshot: assigned Block Volume details]

So it maybe is SSD-backed, but I don't know the Oracle cloud storage offerings. FYI, it's delivered via iSCSI - you have no control over that, but it's effectively a SCSI-over-network protocol. 25k IOPS / 480 MB/s and 256GB of RAM is pretty good for a PoC.

lsblk really cannot know for sure, so I would personally not put too much trust in its specific ROTA (rotational device) output in a cloud environment - better to just refer to the OCI docs.

The screenshot shows you chose a "Balanced Performance" device, so at least you have the option to try higher-performance offerings (that obviously cost more). The OCI docs for "Balanced Performance" say:

The Balanced performance level provides a good balance between performance and cost savings for most workloads, including those that perform random I/O such as boot volumes.

Others will know the OCI cloud offerings better than me, but that's not the storage type I would go to first. Consider at least trying different IO devices; it is a PoC after all.

You really need to deeply analyze your system and see if it really is IO bound, with tools like vmstat, top, iostat, jvisualvm, and maybe others. As well as checking through all the Elastic docs, check mount options, get rid of the swap, etc.
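
Getting rid of the swap is usually just the below (sketch; make it permanent by removing or commenting out the swap entry in /etc/fstab):

sudo swapoff -a
# or, if you want to keep the device but strongly discourage its use:
sudo sysctl vm.swappiness=1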

You know better than us all the things that system does - is it really only running Elasticsearch, and if so, is the query load minor or significant? I am also not clear on what you call "attachments". Is the ingest process "simple", or is there anything complex (e.g. enrichments) going on? Etc.

Maybe you should also consider getting a consultant/professional services, either via Elastic or elsewhere, to help you in this phase.

I am wondering whether migrating from the current single-node cluster to a 4-node environment with the following specs would improve the throughput.

Node  Purpose  OCPU  Memory  Storage
1     Data     16    64GB    5TB
2     Data     16    64GB    5TB
3     Master   4     64GB    100GB
4     Ingest   4     64GB    100GB
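
In ES 6.x the roles for such a layout would presumably be set per node in elasticsearch.yml roughly as follows (my sketch only; exact settings to be confirmed):

# data nodes (nodes 1 and 2)
node.master: false
node.data: true
node.ingest: false

# dedicated master (node 3)
node.master: true
node.data: false
node.ingest: false

# dedicated ingest node (node 4), where the attachment processing would run
node.master: false
node.data: false
node.ingest: true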

It is only running Elasticsearch; nothing else is hosted on this node.
There are also no queries/searches running, as I am still trying to fully index all the documents.

The data that I am trying to index also contains attachments (.pdf, .doc, .png, etc.), which are stored in the Oracle DB; the application runs a process to extract these from the DB and transfer them to ES for indexing.

There are no enrichments that I am aware of, but ingesting and indexing 41 million attachments (.doc, .pdf, etc.) can be complex, resource intensive and time consuming.

If there is no clear bottleneck in Elasticsearch, e.g. CPU usage or disk IO, I would recommend looking at the data extraction from Oracle together with the ingest processing. I have on numerous occasions had users complain about Elasticsearch performance, just to later identify that it was the extraction from the relational database or the processing pipeline that was the actual bottleneck. If you change your data pipeline to instead write to disk - does that actually eliminate the bottleneck or does it not have any impact at all? This simple test should quickly tell you whether Elasticsearch performance is actually the issue.

I can see a very frequent and very high number of the following GC warning messages, which may be a bottleneck in itself, leaving very few resources for the indexing process.

"[2024-12-18T13:53:36,796][INFO ][o.e.m.j.JvmGcMonitorService[RLskF8A] [gc][43329] overhead, spent [726ms] collecting in the last [1s]

The question is why the JVM GC is spending this much time, when it has 31,667 MB of heap allocated and 256GB of memory on the node itself.

If this is the bottleneck, what options do I have to increase throughput?

Is increasing the number of nodes in the cluster the option, or are there other alternatives?

And to add to what I just said: the same data that is being indexed at the moment indexed quickly, at a rate of 40,000/min, in earlier runs when it was the first batch out of many. In other words, the data extraction from Oracle is not actually causing this issue, as it has already been shown that this set of data can be extracted and indexed successfully at 40,000/min.

If you are seeing warnings around GC, that would be a problem. You may need to increase the amount of heap, e.g. by adding additional nodes, or maybe try moving any enrichment you perform within ingest pipelines outside of Elasticsearch.

Good point. Do all batches extract and transform the same type and mix of data? Are they all the same size, or could deep paging or something like that be having an impact?

The question is why the JVM GC is spending this much time, when it has 31,667 MB of heap allocated and 256GB of memory on the node itself.

Earlier you wrote:

jvm.options
-Xms31g
-Xmx31g

The Xmx setting there is the maximum heap size for the JVM.

Btw, the top output above showed a RES value for the java process of 151G, and SHR of 121G. I am going to assume that java process IS the elasticsearch process, and not something else you maybe missed. That's way bigger than the 31g allocated for the heap, so it would be nice to understand where all the other memory is being used. I'd also be interested to know whether these values are slowly increasing. Maybe it "resets" itself if you simply restart the elasticsearch process?
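
If you want to see where that resident memory actually sits (a lot of it is typically mmap'ed Lucene segment files rather than heap), something along these lines would show it (sketch; 24185 is the java PID from your top output):

pmap -x 24185 | sort -k3 -n | tail -20     # largest mappings by resident size
curl -s 'localhost:9200/_nodes/stats/jvm,process,os?pretty'   # what Elasticsearch itself reports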

Have you done the deeper analysis with tools like vmstat, top, iostat, jvisualvm?

I remain a little confused about some details around these "attachments", which are pdf/doc files? You are indexing these files into Elasticsearch via the ingest attachment processor (that is what the AttachmentProcessor frames in your hot threads output point to).

I've never used that, but my guess is it would have variable performance depending on the specifics of the documents it has to crawl through; larger docs would take longer to process.
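
For reference, that processor is normally wired up via an ingest pipeline along these lines - I'm guessing the PeopleSoft tooling defines something similar, possibly with different options (the field name here is just the common example, not necessarily what PeopleSoft uses):

curl -s -X PUT 'localhost:9200/_ingest/pipeline/attachment' -H 'Content-Type: application/json' -d'
{
  "description" : "extract text from base64-encoded attachments with Tika",
  "processors" : [
    { "attachment" : { "field" : "data", "indexed_chars" : 100000 } }
  ]
}'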

Sorry, another thought:

You mention Peoplesoft a bit. I am guessing you are using tooling that is provided by them.

PeopleSoft delivers a program to handle all of these different scenarios.

The attachment handlers' size is set to 10 in PeopleSoft with Index segment size of 10MB and i am thinking of reducing attachment handlers' size to 5 to see if it helps in improving the indexing rate.

150mil attachments from a PeopleSoft application to ES 6.1 (due to compatibility with PeopleSoft, can't upgrade to higher versions)

PeopleSoft is your vendor; they have likely been well paid, and you are allowed to ask them questions and get their recommendations and experience too.

In passing, I also note that elasticsearch 6.1 was released around 6 years ago, and there were tons of later releases in the 6.x series up to 6.8.23, which is only around 3 years old.


All batches extract and transform the same type of data, but the sizing and profiling (the % of OLTP transactional vs. OLTP attachment data extracted from the Oracle tables) might be slightly different, depending on how many attachments end users added to each OLTP data row. I.e. the % of attachments might be higher in the current batches I am indexing right now compared to the ones already indexed.

But it is important to note that this very same data that I am trying to index right now with very low throughput was indexed in earlier runs at 40,000/min, when this batch was the first one to be indexed instead of the last one.

I.e. when I indexed it previously, there was no data indexed in Elastic at all and nothing in the shards. The difference now is that Elastic has to add this data on top of the 70 million documents it has already indexed across the 200 shards of this index.

Although the profiling is slightly different, the sizing might not be drastically different compared to earlier batches.

Basically, it's 25 years' worth of client notes and any attachments added as additional information to support those notes. I have indexed 20 years of data (70 million+) and am now trying to index the last 5 years of data (45 million+).

Note that these 115 million total documents also contain attachment data.