Excessive merging/small segment sizes

The problem reappeared. We did some tests today around copying a large file
on the nodes to test I/O throughput. On the loaded node, the copy was
really slow, maybe 30x slower. So it seems your suspicion about something
external interfering with I/O was right after all, even though nothing else
is running on the machines. We will investigate our setup further, but this
doesn't seem like a Lucene/Elasticsearch issue in the end.

For the index close, I didn't issue any command; Elasticsearch seemed to do
that on its own. The code is in IndexingMemoryController. The triggering
event seems to be the RAM buffer size change: it triggers a call to
InternalEngine.updateIndexingBufferSize(), which then calls flush with type
NEW_WRITER, and that seems to close the Lucene IndexWriter.
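
In case it helps, here is the chain as I currently read it; this is a
simplified stand-in to show the flow, not the actual Elasticsearch source:

// Simplified model of the code path as I understand it -- NOT the real ES classes.
public class NewWriterFlushSketch {

    enum FlushType { NEW_WRITER }

    static class EngineSketch {
        private String ramBufferSize = "512mb";

        // What InternalEngine.updateIndexingBufferSize() appears to boil down to.
        void updateIndexingBufferSize(String newSize) {
            ramBufferSize = newSize;
            // The surprising part: a buffer-size change triggers a NEW_WRITER flush
            // instead of just adjusting the live IndexWriter config.
            flush(FlushType.NEW_WRITER);
        }

        void flush(FlushType type) {
            if (type == FlushType.NEW_WRITER) {
                System.out.println("close IndexWriter (ram buffer now " + ramBufferSize + ")");
                System.out.println("open new IndexWriter + getReader() -> costly full reopen");
            }
        }
    }

    public static void main(String[] args) {
        // IndexingMemoryController marks the shard inactive after 30m and shrinks the buffer:
        new EngineSketch().updateIndexingBufferSize("500kb");
    }
}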

Thanks!
Kireet

On Friday, July 11, 2014 4:13:52 AM UTC-7, Michael McCandless wrote:

Hmm this is interesting: it looks like, after 30 minutes of inactivity, ES
drops the RAM buffer size for this index to a tiny value (500 KB vs 500 MB
before), but then the IW is closed (I don't think ES does this by default,
but I'm not sure: do you have a client that may have closed the index?).

Then, right after that (~200 msec later) it opens the IndexWriter again,
and calls getReader().

That new getReader is expected to be costly (13 seconds) because the IW
must open every segment, vs the more typical case where there was already
an NRT reader open and now, having written a few new segments or completed
a merge or two, it only needs to open those new segments.
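
In plain Lucene terms (a toy sketch against the 4.8 APIs, not the ES code),
the two paths look roughly like this:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class NrtReopenSketch {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_48, new StandardAnalyzer(Version.LUCENE_48)));

        Document doc = new Document();
        doc.add(new TextField("body", "hello world", Field.Store.NO));
        writer.addDocument(doc);

        // Costly path: the first NRT open has to visit every segment in the index.
        DirectoryReader reader = DirectoryReader.open(writer, true);

        writer.addDocument(doc);

        // Cheap path: reopening an existing NRT reader only opens the new/changed segments.
        DirectoryReader newReader = DirectoryReader.openIfChanged(reader, writer, true);
        if (newReader != null) {
            reader.close();
            reader = newReader;
        }

        reader.close();
        writer.close();
        dir.close();
    }
}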

So the question is why is this IW closed and immediately opened again?

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jul 10, 2014 at 8:12 PM, Kireet Reddy <kir...@feedly.com> wrote:

The below log snippet looked strange to me. It occurs after indexing has
stopped for a while. It looks like elasticsearch closes the index for
writing (?) and that triggers some IndexWriter operation. The getReader
message reports it took 13 seconds, which seems like a long time. Nothing
else is really going on at this time: no indexing, heavy GC, etc. The same
call for this index on the replica took 2 seconds.

I am trying to follow the code; it seems like since packetCount=0 here,
the prune() method doesn't do anything. The only other thing of
significance between that message and the "return reader ..." message is
the StandardDirectoryReader.open(this, segmentInfos, applyAllDeletes) call,
which unfortunately doesn't have any log messages. The getReader call on
the replica took 2 seconds on a healthy node; it happened around the same
time. Does this provide any clues? Like everything else, the getReader
calls seem to take much longer on the loaded node.
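
(As an aside for anyone following along: the IW/DW/BD/IFD lines below are
Lucene's IndexWriter infoStream output, which elasticsearch surfaces through
the lucene.iw logger. In a standalone Lucene 4.x program the rough equivalent
is:)

import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class InfoStreamSketch {
    public static void main(String[] args) throws IOException {
        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_48,
                new StandardAnalyzer(Version.LUCENE_48));
        // Route IndexWriter's internal tracing (the IW/DW/BD/IFD lines) to stdout.
        iwc.setInfoStream(System.out);

        IndexWriter writer = new IndexWriter(new RAMDirectory(), iwc);
        writer.commit();   // even an empty commit produces a burst of trace output
        writer.close();
    }
}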

[2014-07-10 08:17:58,281][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
BD: prune sis=org.apache.lucene.index.SegmentInfos@39c19aec minGen=0
packetCount=0
[2014-07-10 08:18:11,359][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: return reader version=155332
reader=StandardDirectoryReader(segments_la:155332:nrt
_thy(4.8):C7093414/62450:delGen=393 _cfi(4.7):C4517193/37932:delGen=622
_mrx(4.8):C5257709/46074:delGen=509 _zqo(4.8):C4946049/46139:delGen=293
_zte(4.8):C428210/13793:delGen=267 _10jx(4.8):C562710/16328:delGen=266
_11aw(4.8):C554410/16793:delGen=266 _1245(4.8):C670233/18142:delGen=247
_131a(4.8):C742879/18107:delGen=224 _13tl(4.8):C595917/15685:delGen=215
_1aw7(4.8):C517272/3261:delGen=87 _167b(4.8):C97197/12459:delGen=171
_1amh(4.8):C15328/1829:delGen=87 _1a0t(4.8):C9524/1392:delGen=92
_1acr(4.8):C8324/1421:delGen=87 _1atp(4.8):C5689/566:delGen=69
_1avx(4.8):C5077/244:delGen=60 _1avn(4.8):C5561/304:delGen=48
_1emy(4.8):C10448/482:delGen=14 _1csj(4.8):C8313/907:delGen=48
_1bu3(4.8):C3357/896:delGen=61 _1d3d(4.8):C3442/643:delGen=42
_1e5f(4.8):C5228/494:delGen=22 _1ex8(4.8):C2400/220:delGen=7
_1f1e(4.8):C957/50:delGen=2 _1f29(4.8):C187/1:delGen=1 _1f2i(4.8):C147)
[2014-07-10 08:18:11,362][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: elasticsearch[se5int-sv2][scheduler][T#1] finishFullFlush success=true
[2014-07-10 08:18:11,363][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: getReader took 13085 msec

Full log for that operation:

[2014-07-10 08:17:57,928][DEBUG][indices.memory ] [se5int-sv2]
marking shard [entry_20140702][0] as inactive (inactive_time[30m]) indexing
wise, setting size to [500kb]
[2014-07-10 08:17:57,929][DEBUG][index.engine.internal ] [se5int-sv2]
[entry_20140702][0] updating index_buffer_size from [512mb] to (inactive)
[500kb]
[2014-07-10 08:17:57,930][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: now flush at close waitForMerges=false
[2014-07-10 08:17:57,931][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: start flush: applyAllDeletes=true
[2014-07-10 08:17:57,933][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: index before flush _thy(4.8):C7093414/62450:delGen=393
_cfi(4.7):C4517193/37932:delGen=622 _mrx(4.8):C5257709/46074:delGen=509
_zqo(4.8):C4946049/46139:delGen=293 _zte(4.8):C428210/13793:delGen=267
_10jx(4.8):C562710/16328:delGen=266 _11aw(4.8):C554410/16793:delGen=266
_1245(4.8):C670233/18142:delGen=247 _131a(4.8):C742879/18107:delGen=224
_13tl(4.8):C595917/15685:delGen=215 _1aw7(4.8):C517272/3261:delGen=87
_167b(4.8):C97197/12459:delGen=171 _1amh(4.8):C15328/1829:delGen=87
_1a0t(4.8):C9524/1392:delGen=92 _1acr(4.8):C8324/1421:delGen=87
_1atp(4.8):C5689/566:delGen=69 _1avx(4.8):C5077/244:delGen=60
_1avn(4.8):C5561/304:delGen=48 _1emy(4.8):C10448/482:delGen=14
_1csj(4.8):C8313/907:delGen=48 _1bu3(4.8):C3357/896:delGen=61
_1d3d(4.8):C3442/643:delGen=42 _1e5f(4.8):C5228/494:delGen=22
_1ex8(4.8):C2400/220:delGen=7 _1f1e(4.8):C957/50:delGen=2
_1f29(4.8):C187/1:delGen=1 _1f2i(4.8):C147
[2014-07-10 08:17:57,934][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: startFullFlush
[2014-07-10 08:17:57,934][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: anyChanges? numDocsInRam=0 deletes=false hasTickets:false
pendingChangesInFullFlush: false
[2014-07-10 08:17:57,936][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: elasticsearch[se5int-sv2][scheduler][T#1] finishFullFlush success=true
[2014-07-10 08:17:57,936][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: apply all deletes during flush
[2014-07-10 08:17:57,938][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
BD: applyDeletes: no deletes; skipping
[2014-07-10 08:17:57,938][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
BD: prune sis=org.apache.lucene.index.SegmentInfos@1b8d1c15 minGen=76876
packetCount=0
[2014-07-10 08:17:57,939][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: all running merges have aborted
[2014-07-10 08:17:57,940][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: now call final commit()
[2014-07-10 08:17:57,941][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: commit: start
[2014-07-10 08:17:57,941][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: commit: enter lock
[2014-07-10 08:17:57,942][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: commit: now prepare
[2014-07-10 08:17:57,942][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: prepareCommit: flush
[2014-07-10 08:17:57,943][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: index before flush _thy(4.8):C7093414/62450:delGen=393
_cfi(4.7):C4517193/37932:delGen=622 _mrx(4.8):C5257709/46074:delGen=509
_zqo(4.8):C4946049/46139:delGen=293 _zte(4.8):C428210/13793:delGen=267
_10jx(4.8):C562710/16328:delGen=266 _11aw(4.8):C554410/16793:delGen=266
_1245(4.8):C670233/18142:delGen=247 _131a(4.8):C742879/18107:delGen=224
_13tl(4.8):C595917/15685:delGen=215 _1aw7(4.8):C517272/3261:delGen=87
_167b(4.8):C97197/12459:delGen=171 _1amh(4.8):C15328/1829:delGen=87
_1a0t(4.8):C9524/1392:delGen=92 _1acr(4.8):C8324/1421:delGen=87
_1atp(4.8):C5689/566:delGen=69 _1avx(4.8):C5077/244:delGen=60
_1avn(4.8):C5561/304:delGen=48 _1emy(4.8):C10448/482:delGen=14
_1csj(4.8):C8313/907:delGen=48 _1bu3(4.8):C3357/896:delGen=61
_1d3d(4.8):C3442/643:delGen=42 _1e5f(4.8):C5228/494:delGen=22
_1ex8(4.8):C2400/220:delGen=7 _1f1e(4.8):C957/50:delGen=2
_1f29(4.8):C187/1:delGen=1 _1f2i(4.8):C147
[2014-07-10 08:17:57,943][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: startFullFlush
[2014-07-10 08:17:57,944][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: anyChanges? numDocsInRam=0 deletes=false hasTickets:false
pendingChangesInFullFlush: false
[2014-07-10 08:17:57,945][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: apply all deletes during flush
[2014-07-10 08:17:57,945][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
BD: applyDeletes: no deletes; skipping
[2014-07-10 08:17:57,946][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
BD: prune sis=org.apache.lucene.index.SegmentInfos@1b8d1c15 minGen=76876
packetCount=0
[2014-07-10 08:17:57,953][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: elasticsearch[se5int-sv2][scheduler][T#1] finishFullFlush success=true
[2014-07-10 08:17:57,953][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: startCommit(): start
[2014-07-10 08:17:57,954][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: skip startCommit(): no changes pending
[2014-07-10 08:17:57,955][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: commit: pendingCommit == null; skip
[2014-07-10 08:17:57,956][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: commit: done
[2014-07-10 08:17:57,963][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: at close: _thy(4.8):C7093414/62450:delGen=393
_cfi(4.7):C4517193/37932:delGen=622 _mrx(4.8):C5257709/46074:delGen=509
_zqo(4.8):C4946049/46139:delGen=293 _zte(4.8):C428210/13793:delGen=267
_10jx(4.8):C562710/16328:delGen=266 _11aw(4.8):C554410/16793:delGen=266
_1245(4.8):C670233/18142:delGen=247 _131a(4.8):C742879/18107:delGen=224
_13tl(4.8):C595917/15685:delGen=215 _1aw7(4.8):C517272/3261:delGen=87
_167b(4.8):C97197/12459:delGen=171 _1amh(4.8):C15328/1829:delGen=87
_1a0t(4.8):C9524/1392:delGen=92 _1acr(4.8):C8324/1421:delGen=87
_1atp(4.8):C5689/566:delGen=69 _1avx(4.8):C5077/244:delGen=60
_1avn(4.8):C5561/304:delGen=48 _1emy(4.8):C10448/482:delGen=14
_1csj(4.8):C8313/907:delGen=48 _1bu3(4.8):C3357/896:delGen=61
_1d3d(4.8):C3442/643:delGen=42 _1e5f(4.8):C5228/494:delGen=22
_1ex8(4.8):C2400/220:delGen=7 _1f1e(4.8):C957/50:delGen=2
_1f29(4.8):C187/1:delGen=1 _1f2i(4.8):C147
[2014-07-10 08:17:58,157][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: current segments file is "segments_la";
deletionPolicy=org.elasticsearch.index.deletionpolicy.SnapshotDeletionPolicy@667b43fc
[2014-07-10 08:17:58,161][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: load commit "segments_la"
[2014-07-10 08:17:58,215][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: removing unreferenced file "_1ex8_6.del"
[2014-07-10 08:17:58,216][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: delete "_1ex8_6.del"
[2014-07-10 08:17:58,219][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: removing unreferenced file "_thy_aw.del"
[2014-07-10 08:17:58,220][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: delete "_thy_aw.del"
[2014-07-10 08:17:58,236][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: removing unreferenced file "_1aw7_2e.del"
[2014-07-10 08:17:58,237][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: delete "_1aw7_2e.del"
[2014-07-10 08:17:58,240][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: removing unreferenced file "_1emy_d.del"
[2014-07-10 08:17:58,241][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: delete "_1emy_d.del"
[2014-07-10 08:17:58,242][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: removing unreferenced file "_cfi_h9.del"
[2014-07-10 08:17:58,243][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: delete "_cfi_h9.del"
[2014-07-10 08:17:58,251][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: removing unreferenced file "_mrx_e4.del"
[2014-07-10 08:17:58,252][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: delete "_mrx_e4.del"
[2014-07-10 08:17:58,262][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: init: removing unreferenced file "_1f1e_1.del"
[2014-07-10 08:17:58,263][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: delete "_1f1e_1.del"
[2014-07-10 08:17:58,266][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: now checkpoint "_thy(4.8):C7093414/62450:delGen=393
_cfi(4.7):C4517193/37932:delGen=622 _mrx(4.8):C5257709/46074:delGen=509
_zqo(4.8):C4946049/46139:delGen=293 _zte(4.8):C428210/13793:delGen=267
_10jx(4.8):C562710/16328:delGen=266 _11aw(4.8):C554410/16793:delGen=266
_1245(4.8):C670233/18142:delGen=247 _131a(4.8):C742879/18107:delGen=224
_13tl(4.8):C595917/15685:delGen=215 _1aw7(4.8):C517272/3261:delGen=87
_167b(4.8):C97197/12459:delGen=171 _1amh(4.8):C15328/1829:delGen=87
_1a0t(4.8):C9524/1392:delGen=92 _1acr(4.8):C8324/1421:delGen=87
_1atp(4.8):C5689/566:delGen=69 _1avx(4.8):C5077/244:delGen=60
_1avn(4.8):C5561/304:delGen=48 _1emy(4.8):C10448/482:delGen=14
_1csj(4.8):C8313/907:delGen=48 _1bu3(4.8):C3357/896:delGen=61
_1d3d(4.8):C3442/643:delGen=42 _1e5f(4.8):C5228/494:delGen=22
_1ex8(4.8):C2400/220:delGen=7 _1f1e(4.8):C957/50:delGen=2
_1f29(4.8):C187/1:delGen=1 _1f2i(4.8):C147" [27 segments ; isCommit = false]
[2014-07-10 08:17:58,272][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IFD: 7 msec to checkpoint
[2014-07-10 08:17:58,273][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: init: create=false
[2014-07-10 08:17:58,277][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW:
[2014-07-10 08:17:58,278][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: flush at getReader
[2014-07-10 08:17:58,279][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: startFullFlush
[2014-07-10 08:17:58,279][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: anyChanges? numDocsInRam=0 deletes=false hasTickets:false
pendingChangesInFullFlush: false
[2014-07-10 08:17:58,280][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: apply all deletes during flush
[2014-07-10 08:17:58,280][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
BD: applyDeletes: no deletes; skipping
[2014-07-10 08:17:58,281][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
BD: prune sis=org.apache.lucene.index.SegmentInfos@39c19aec minGen=0
packetCount=0
[2014-07-10 08:18:11,359][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: return reader version=155332
reader=StandardDirectoryReader(segments_la:155332:nrt
_thy(4.8):C7093414/62450:delGen=393 _cfi(4.7):C4517193/37932:delGen=622
_mrx(4.8):C5257709/46074:delGen=509 _zqo(4.8):C4946049/46139:delGen=293
_zte(4.8):C428210/13793:delGen=267 _10jx(4.8):C562710/16328:delGen=266
_11aw(4.8):C554410/16793:delGen=266 _1245(4.8):C670233/18142:delGen=247
_131a(4.8):C742879/18107:delGen=224 _13tl(4.8):C595917/15685:delGen=215
_1aw7(4.8):C517272/3261:delGen=87 _167b(4.8):C97197/12459:delGen=171
_1amh(4.8):C15328/1829:delGen=87 _1a0t(4.8):C9524/1392:delGen=92
_1acr(4.8):C8324/1421:delGen=87 _1atp(4.8):C5689/566:delGen=69
_1avx(4.8):C5077/244:delGen=60 _1avn(4.8):C5561/304:delGen=48
_1emy(4.8):C10448/482:delGen=14 _1csj(4.8):C8313/907:delGen=48
_1bu3(4.8):C3357/896:delGen=61 _1d3d(4.8):C3442/643:delGen=42
_1e5f(4.8):C5228/494:delGen=22 _1ex8(4.8):C2400/220:delGen=7
_1f1e(4.8):C957/50:delGen=2 _1f29(4.8):C187/1:delGen=1 _1f2i(4.8):C147)
[2014-07-10 08:18:11,362][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
DW: elasticsearch[se5int-sv2][scheduler][T#1] finishFullFlush success=true
[2014-07-10 08:18:11,363][TRACE][lucene.iw ]
[se5int-sv2][entry_20140702][0] elasticsearch[se5int-sv2][scheduler][T#1]
IW: getReader took 13085 msec

On Thursday, July 10, 2014 12:28:32 PM UTC-7, Kireet Reddy wrote:

We had an unrelated issue this morning where we had to stop indexing for
a while. The problematic node remained busy for 1 hour afterwards, but then
became idle. We then restarted indexing and the node was fine. We didn't
restart the JVM; it seemed like something recovered and the node was able
to operate normally afterwards. It's very strange. I have copied the logs
for the time period here:
https://www.dropbox.com/s/gb6xfxldryq80il/nodelogs.zip (it has logs for the
problematic node5 and the "normal" node6). Indexing stopped around 7:22, but
the node didn't seem to recover until 8:25. Unfortunately I don't have any
hot threads output during that time. I went through all our metrics; pretty
much all metric activity stops at 7:25 (see graphs). The only thing that
didn't was the merge total docs counter. I don't know why this would ever
decrease, but I have seen decreases in the past as well.

Not sure what to try at this point. The "custom" things I do are:

  • language specific analysis using packaged tokenizers/filters and
    ICU, smart-cn, kuromoji, and stempel plugins
  • have eager field data loading turned on for a long field. However, the
    field data cache size seems OK: it was around 600MB and the breaker size
    is 3.6GB. Also, the size seemed to be around 100MB the last time the
    problem happened, so this doesn't seem to be related.
  • have a fair number of different indices (we use the index per time
    period pattern).
  • create/update documents via the bulk endpoint (a rough sketch of these
    calls follows this list)
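
A rough sketch of what those bulk calls look like (the index/type/field names
and the host here are made up for illustration):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Sends one index and one update action to the _bulk endpoint (NDJSON body).
public class BulkSketch {
    public static void main(String[] args) throws Exception {
        String body =
                "{\"index\":{\"_index\":\"entry_20140702\",\"_type\":\"entry\",\"_id\":\"1\"}}\n"
              + "{\"title\":\"hello\",\"body\":\"first version of the document\"}\n"
              + "{\"update\":{\"_index\":\"entry_20140702\",\"_type\":\"entry\",\"_id\":\"1\"}}\n"
              + "{\"doc\":{\"body\":\"updated version of the document\"}}\n"; // trailing newline required

        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://localhost:9200/_bulk").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("bulk response code: " + conn.getResponseCode());
        conn.disconnect();
    }
}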

Everything else seems pretty much by the book. It seems like I could stop
using the language-specific analyzers or turn off the eager field data
loading. But both of those seem unlikely to me to be the issue. If it was an
analyzer, this should happen for the replica index as well. If it was field
data, the problem shouldn't go away unless we reduce the data size.

Answers to your questions are inline below. Thanks for the help! It's very
much appreciated. This is a really tough one...
On Thursday, July 10, 2014 2:59:40 AM UTC-7, Michael McCandless wrote:

Another question: have you disabled merge throttling? And, which version
of ES are you using?

We are using 1.2.1 (we upgraded to get the infoStream logs). Merge
throttling is enabled with the default settings but doesn't seem to be an
issue (see graph).

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jul 10, 2014 at 5:49 AM, Michael McCandless
<mi...@elasticsearch.com> wrote:

Indeed the hot threads on node5 didn't reveal anything unexpected: they
are busy merging and refreshing. One thread was in SmartChineseAnalyzer...

Very strange that young gen GC is suddenly so costly. This is much more
time spent in GC than you ever see on healthy nodes?

Yes, I see monitor.jvm lines in the log like:

[gc][young][593516][88318] duration [2.2s], collections [1]/[11.8s],
total [2.2s]/[3.6h], memory [17.2gb]->[16.8gb]/[31.9gb], all_pools {[young]
[631.6mb]->[61.6mb]/[665.6mb]}{[survivor] [83.1mb]->[83.1mb]/[83.1mb]}{[old]
[16.5gb]->[16.7gb]/[31.1gb]}
It's always the young gen, and it seems to happen every 5-10 seconds, with
durations ranging from a few hundred ms to 3 seconds. I don't see any such
lines in the log files of the other nodes. I changed the config so that I
get debug lines for all collections instead of just the slow ones. I'll try
to confirm that the collections are just slower and not more frequent on the
loaded node.

If you restart the JVM on that node, does the problem immediately start
happening again? (Hmm: does restarting the JVM cause the others to start
building new replicas... which would mess up this experiment). Separately,
if you reboot the node and bring it back up, does the problem start
happening again?

Stopping indexing for an hour seemed to "fix" the issue without a JVM
restart. We also just restarted our cluster to add the extra GC logging;
things are fine thus far.

Does vmstat/iostat reveal anything? E.g. is the OS doing a lot of
swapping or something? Do you mlockall your JVM heap (bootstrap.mlockall)?
What is swappiness set to?

mlockall is true and there is no swap used. Everything seems normal at the
OS level: low I/O, ~50% CPU usage, acceptable file descriptor counts, etc.

Which Java / OS versions are you using?

Java: 1.7.0_55
OS: Linux CentOS 6.4, 64-bit

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jul 9, 2014 at 5:13 PM, Kireet Reddy kir...@feedly.com wrote:

Sorry, here it is:

[Dropbox link]

Also a couple of graphs of the memory usage.

On Wednesday, July 9, 2014 2:10:49 PM UTC-7, Michael McCandless wrote:

Hmm link doesn't seem to work?

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jul 9, 2014 at 4:06 PM, Kireet Reddy kir...@feedly.com wrote:

Sorry, forgot the link

[Dropbox link: 1229.txt]

On Wednesday, July 9, 2014 1:05:56 PM UTC-7, Kireet Reddy wrote:

The problem is happening again, this time on node 5. I have captured a few
hot thread requests here. I also included one from node 6 (which is now
fine). There are merge-related stacks, but it seems like everything is
taking a lot more CPU than usual. I did a few type=wait and type=block
dumps and the result was always 0% usage there. Also, young gen GC activity
has again gone crazy (old gen/heap size seems fine). Would hot thread
measurements be distorted if GC activity is very high?
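
(For reference, the dumps were pulled from the hot_threads endpoint, roughly
like this; the node name and parameter values are just examples:)

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Fetches a hot-threads dump; type can be cpu, wait, or block.
public class HotThreadsSketch {
    public static void main(String[] args) throws Exception {
        String url = "http://localhost:9200/_nodes/node5/hot_threads?type=block&interval=500ms&threads=5";
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
        conn.disconnect();
    }
}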

It seems strange to me that this would only happen on one node while we
have replicas set to at least 1 for all our indices. It seems like the
problems should happen on a couple of nodes simultaneously.

--Kireet

On Monday, July 7, 2014 3:51:35 PM UTC-7, Michael McCandless wrote:

Could you pull all hot threads next time the problem happens?

Mike McCandless

http://blog.mikemccandless.com

On Mon, Jul 7, 2014 at 3:47 PM, Kireet Reddy kir...@feedly.com wrote:

All that seems correct (except I think this is for node 6, not node 5).
We don't delete documents, but we do some updates. The vast majority of
documents get indexed into the large shards, but the smaller ones take some
writes as well.

We aren't using virtualized hardware, and Elasticsearch is the only thing
running on the machines: no scheduled jobs, etc. I don't think something is
interfering; actually, the overall disk I/O rate and operation count on the
machine go down quite a bit during the problematic period, which is
consistent with your observations about things taking longer.

I went back and checked all our collected metrics again. I noticed that
even though the heap usage and GC count seem smooth during the period in
question, the time spent in GC goes way up. The number of active indexing
threads also goes up, but since our ingest rate didn't go up I assumed this
was a side effect. During a previous occurrence a few days ago on node5, I
stopped all indexing activity for 15 minutes. Active merges and indexing
requests went to zero as expected. Then I re-enabled indexing, and the
elevated CPU/GC/active merges immediately went back up to the problematic
rates.

Overall this is pretty confusing to me as to what is a symptom vs a root
cause here. A summary of what I think I know:

  1. Every few days, CPU usage on a node goes way above the other nodes and
    doesn't recover. We've let the node run in the elevated CPU state for a
    day with no improvement.
  2. It doesn't seem likely that it's data related. We use replicas=1 and no
    other nodes have issues.
  3. It doesn't seem hardware related. We run on dedicated hardware with
    Elasticsearch being the only thing running. Also, the problem appears on
    various nodes, and machine load seems tied directly to the Elasticsearch
    process.
  4. During the problematic period: CPU usage, active merge threads, active
    bulk (indexing) threads, and GC time are elevated.
  5. During the problematic period: I/O ops and I/O throughput decrease.
  6. Overall heap usage seems to increase smoothly; the extra GC time seems
    to be spent on the new gen. Interestingly, the GC count didn't seem to
    increase.
  7. In the hours beforehand, GC behavior of the problematic node was
    similar to that of the other nodes.
  8. If I pause indexing, machine load quickly returns to normal and merges
    and indexing requests complete. If I then restart indexing, the problem
    reoccurs immediately.
  9. If I disable automatic refreshes, the problem disappears within an hour
    or so.
  10. Hot threads output shows merging activity as the hot threads.

The first few points make me think the increased active merges is perhaps
a side effect, but then the last 3 make me think merging is the root cause.
The only additional things I can think of that may be relevant are:

  1. Our documents can vary greatly in size; they average a couple of KB
    but can rarely be several MB.
  2. We do use language analysis plugins; perhaps one of these is acting up?
  3. We eagerly load one field into the field data cache. But the cache size
    is OK and the overall heap behavior is OK, so I don't think this is the
    problem.

That's a lot of information, but I am not sure where to go next from
here...

On Monday, July 7, 2014 8:23:20 AM UTC-7, Michael McCandless wrote:

Indeed there are no big merges during that time ...

I can see that on node5, at ~14:45, merges suddenly start taking a long
time, refresh takes much longer (4-5 seconds instead of < 0.4 sec), commit
time goes up from < 0.5 sec to ~1-2 sec, etc., but other metrics are fine,
e.g. total merged GB and the number of commits/refreshes are very low during
this time.

Each node has 2 biggish (~17 GB) shards and then ~50 tiny shards. The
biggish shards are indexing at a very slow rate and only have ~1%
deletions. Are you explicitly deleting docs?

I suspect something is suddenly cutting into the IO perf of this box, and
because merging/refreshing is so IO intensive, it causes these operations
to run slower / backlog.

Are there any scheduled jobs, e.g. backups/snapshots, that start up? Are
you running on virtualized hardware?

Mike McCandless

http://blog.mikemccandless.com

On Sun, Jul 6, 2014 at 8:23 PM, Kireet Reddy kir...@feedly.com wrote:

Just to reiterate, the problematic period is from 07/05 14:45 to 07/06
02:10. I included a couple hours before and after in the logs.

On Sunday, July 6, 2014 5:17:06 PM UTC-7, Kireet Reddy wrote:

They are linked below (node5 is the log of the normal node, node6 is the
log of the problematic node).

I don't think it was doing big merges; otherwise, during the high-load
period the merges graph line would have had a "floor" > 0, similar to the
time period after I disabled refresh. We don't do routing and use mostly
default settings. I think the only settings we changed are:

indices.memory.index_buffer_size

...

On Fri, Jul 11, 2014 at 7:35 PM, Kireet Reddy kireet@feedly.com wrote:

The problem reappeared. We did some tests today around copying a large
file on the nodes to test i/o throughput. On the loaded node, the copy was
really slow, maybe 30x slower. So it seems your suspicion around something
external interfering with I/O was right in the end even though nothing else
is running on the machines. We will investigate our setup further but this
doesn't seem like a lucene/elasticsearch issue in the end.

Hmm but ES was still running on the node? So it could still be something
about ES/Lucene that's putting heavy IO load on the box?

For the index close, I didn't issue any command, elasticsearch seemed to
do that on its own. The code is in IndexingMemoryController. The triggering
event seems to be the ram buffer size change, this triggers a call to
InternalEngine.updateIndexingBufferSize() which then calls flush with type
NEW_WRITER. That seems to close the lucene IndexWriter.

Ahh, thanks for the clarification: yes, ES sometimes closes & opens a new
writer to make "non-live" settings changes take effect. However, the RAM
buffer size for indexing is a live setting, so it should not require the
close/open; yet indeed (in InternalEngine.updateIndexingBufferSize) it
does ... I'll dig.

Mike McCandless

http://blog.mikemccandless.com

We did the test with ES still running and indexing data, ES still
running/not indexing, and ES stopped. All three showed the poor i/o rate.
Then after a few minutes, the copy i/o rate somehow increased again. It was
really strange. We still have some digging to do to figure out the problem
there.

On Sunday, July 13, 2014 2:33:00 AM UTC-7, Michael McCandless wrote:

On Fri, Jul 11, 2014 at 7:35 PM, Kireet Reddy <kir...@feedly.com> wrote:

The problem reappeared. We did some tests today around copying a large
file on the nodes to test i/o throughput. On the loaded node, the copy was
really slow, maybe 30x slower. So it seems your suspicion around something
external interfering with I/O was right in the end even though nothing else
is running on the machines. We will investigate our setup further but this
doesn't seem like a lucene/elasticsearch issue in the end.

Hmm but ES was still running on the node? So it could still be something
about ES/Lucene that's putting heavy IO load on the box?

For the index close, I didn't issue any command, elasticsearch seemed to
do that on its own. The code is in IndexingMemoryController. The triggering
event seems to be the ram buffer size change, this triggers a call to
InternalEngine.updateIndexingBufferSize() which then calls flush with type
NEW_WRITER. That seems to close the lucene IndexWriter.

Ahh, thanks for the clarification: yes, ES sometimes closes & opens a new
writer to make "non-live" settings changes take effect. However, the RAM
buffer size for indexing is a live setting, so it should not require the
close/open; yet indeed (in InternalEngine.updateIndexingBufferSize) it
does ... I'll dig.

Mike McCandless

http://blog.mikemccandless.com

On Mon, Jul 14, 2014 at 12:06 AM, Kireet Reddy kireet@feedly.com wrote:

We did the test with ES still running and indexing data, ES still
running/not indexing, and ES stopped. All three showed the poor i/o rate.
Then after a few minutes, the copy i/o rate somehow increased again. It was
really strange. We still have some digging to do to figure out the problem
there.

Whoa, OK. That's really spooky: boxes are not supposed to behave this
way (get tired, like humans, after running too hard / too long). Hopefully
you get to the bottom of it...

Mike McCandless

http://blog.mikemccandless.com

I/O issues can indicate underlying hardware problems. If you are running
RAID, check your controller status (including the battery) as well as the
array and individual disk status.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 14 July 2014 19:40, Michael McCandless mike@elasticsearch.com wrote:

On Mon, Jul 14, 2014 at 12:06 AM, Kireet Reddy kireet@feedly.com wrote:

We did the test with ES still running and indexing data, ES still
running/not indexing, and ES stopped. All three showed the poor i/o rate.
Then after a few minutes, the copy i/o rate somehow increased again. It was
really strange. We still have some digging to do to figure out the problem
there.

Whoa, OK. That's really spooky: boxes are not supposed to behave this
way (get tired, like humans, after running too hard / too long). Hopefully
you get to the bottom of it...

Mike McCandless

http://blog.mikemccandless.com

On Sun, Jul 13, 2014 at 5:32 AM, Michael McCandless mike@elasticsearch.com
wrote:

For the index close, I didn't issue any command, elasticsearch seemed to
do that on its own. The code is in IndexingMemoryController. The triggering
event seems to be the ram buffer size change, this triggers a call to
InternalEngine.updateIndexingBufferSize() which then calls flush with type
NEW_WRITER. That seems to close the lucene IndexWriter.

Ahh, thanks for the clarification: yes, ES sometimes closes & opens a new
writer to make "non-live" settings changes take effect. However, the RAM
buffer size for indexing is a live setting, so it should not require the
close/open; yet indeed (in InternalEngine.updateIndexingBufferSize) it
does ... I'll dig.

OK, this is now fixed in Elasticsearch (it will be in 1.3.0): see "Don't
close/reopen IndexWriter when changing RAM buffer size" (elastic/elasticsearch
pull request #6856), so now when the indexing RAM buffer is updated it won't
close & reopen the IndexWriter.

Mike McCandless

http://blog.mikemccandless.com
