ElasticSearch 2.2.0 - File Too Large while bulk indexing


#1

Hello,

I'm using ES 2.2.0 in a cluster of 5 nodes. It's a new cluster replacing the old one, and I want to reindex everything I have, so I'm indexing a lot of documents at a time with the HTTP Bulk API (around 5k documents / 4 MB per bulk request).
I round-robin the requests, sending each bulk file to a different node.
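For reference, this is roughly what such a bulk request looks like: an NDJSON payload of alternating action and source lines, POSTed to the `_bulk` endpoint. The host name and document contents below are made up for illustration; the index name matches the one from the error log, and `_type` is required on ES 2.x.

```shell
# Build a tiny NDJSON bulk payload: one action line, then one source line
# per document. A real batch here would be ~5k documents / ~4 MB.
cat > bulk.ndjson <<'EOF'
{"index":{"_index":"v00000262","_type":"doc","_id":"1"}}
{"field":"value one"}
{"index":{"_index":"v00000262","_type":"doc","_id":"2"}}
{"field":"value two"}
EOF

# Send the batch to one node of the cluster (hostname is an assumption);
# --data-binary preserves the newlines the bulk API requires.
# curl -s -XPOST 'http://node1:9200/_bulk' --data-binary @bulk.ndjson
```

Round-robining just means rotating the host in that URL across the 5 nodes for each successive batch.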

I get this error after sending only a few GB (about 4 GB):

[2016-03-08 10:52:19,364][ERROR][index.engine             ] [Fagin] [v00000262][4] failed to merge
java.io.IOException: File too large
	at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
	at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
	at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
	at sun.nio.ch.IOUtil.write(IOUtil.java:65)
	at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:205)
	at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
	at java.nio.channels.Channels.writeFully(Channels.java:101)
	at java.nio.channels.Channels.access$000(Channels.java:61)
	at java.nio.channels.Channels$1.write(Channels.java:174)
	at org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:282)
	at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:73)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
	at org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:53)
	at org.apache.lucene.store.RateLimitedIndexOutput.writeBytes(RateLimitedIndexOutput.java:73)
	at org.apache.lucene.codecs.compressing.LZ4.encodeLiterals(LZ4.java:157)
	at org.apache.lucene.codecs.compressing.LZ4.encodeSequence(LZ4.java:170)
	at org.apache.lucene.codecs.compressing.LZ4.compress(LZ4.java:243)
	at org.apache.lucene.codecs.compressing.CompressionMode$LZ4FastCompressor.compress(CompressionMode.java:164)
	at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:236)
	at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:163)
	at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:605)
	at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:177)
	at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:83)
	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4075)
	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3655)
	at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
	at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
	Suppressed: java.io.IOException: File too large
		at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
		at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
		at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
		at sun.nio.ch.IOUtil.write(IOUtil.java:65)
		at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:205)
		at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
		at java.nio.channels.Channels.writeFully(Channels.java:101)
		at java.nio.channels.Channels.access$000(Channels.java:61)
		at java.nio.channels.Channels$1.write(Channels.java:174)
		at org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:282)
		at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:73)
		at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
		at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
		at org.apache.lucene.store.OutputStreamIndexOutput.close(OutputStreamIndexOutput.java:68)
		at org.apache.lucene.store.RateLimitedIndexOutput.close(RateLimitedIndexOutput.java:49)
		at org.apache.lucene.util.IOUtils.close(IOUtils.java:97)
		at org.apache.lucene.util.IOUtils.close(IOUtils.java:84)
		at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.close(CompressingStoredFieldsWriter.java:138)
		at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:178)
		... 6 more

I've modified the config to reduce fielddata pressure with:

indices.breaker.fielddata.limit: 85%
indices.fielddata.cache.size: 75%  

The HEAP_SIZE is 2 GB.
I've set replicas to 0 for the bulk load.
I've looked at some shards of the index v00000262; the biggest file I can find is only 50 MB...
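Setting replicas to 0 for the duration of a bulk load is done with a dynamic settings update; a minimal sketch (hostname is an assumption):

```shell
# Disable replicas while bulk loading; re-enable them afterwards by
# PUTting the original number_of_replicas back.
cat > settings.json <<'EOF'
{"index":{"number_of_replicas":0}}
EOF

# curl -s -XPUT 'http://node1:9200/v00000262/_settings' -d @settings.json
```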


(Jörg Prante) #2

Check whether a file size limit is set, with `ulimit -a`.
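Concretely, on the user account that runs the Elasticsearch process:

```shell
# Show all per-process resource limits for the current shell/user.
ulimit -a

# Print just the maximum file size a process may create, in 512-byte
# blocks; it should read "unlimited" on an Elasticsearch data node.
ulimit -f
```

If `ulimit -f` reports a finite value, Lucene merges fail with "File too large" as soon as a merged segment file exceeds that size, even though the individual pre-merge files (like the 50 MB ones above) are well below it.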


#3

Thanks, that was indeed the problem :slightly_smiling:


(system) #4