Elasticsearch 2.4.0 crashing during heavy bulk index loads

Hi all,

I am fairly new to Elasticsearch and am very impressed so far. However, I am running into a problem as I test it to make sure it is something I could run in a production environment. I'm hopeful that someone here might have some insight into my issue.

I'm running ES 2.4 on 64-bit Linux Mint 18 (4 CPUs) using the Oracle 8 JDK (1.8.0_60). I have also tried version 1.8.0_101, with the same results. I'm using the default ES settings except that I am allocating a 4 GB heap and locking that memory as per the documented guidelines. For now, I'm running 1 node with 4 shards.

The issue I'm seeing is that when I try to index large volumes of documents (i.e., 8 threads inserting 100 documents at a time via the bulk API) through the Java client, I start seeing messages about the index being corrupted and the JVM crashes, bringing down ES. Here are some of the logging details I see when this occurs:

[2016-09-12 10:44:57,386][ERROR][index.engine ] [n1] [index1][0] failed to merge
org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=5633a4a2 actual=6fae7389 (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/home/elastic/index1/nodes/0/indices/index1/0/index/_45v.cfs") [slice=_45v_Lucene54_0.dvd]))
at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:334)
at org.apache.lucene.codecs.CodecUtil.checksumEntireFile(CodecUtil.java:451)
at org.apache.lucene.codecs.lucene54.Lucene54DocValuesProducer.checkIntegrity(Lucene54DocValuesProducer.java:475)
at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.checkIntegrity(PerFieldDocValuesFormat.java:341)
at org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:135)
at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:150)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4075)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3655)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
[2016-09-12 10:44:57,417][WARN ][index.engine ] [n1] [index1][0] failed engine [merge failed]
org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=5633a4a2 actual=6fae7389 (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/home/elastic/index1/nodes/0/indices/index1/0/index/_45v.cfs") [slice=_45v_Lucene54_0.dvd]))
at org.elasticsearch.index.engine.InternalEngine$EngineMergeScheduler$1.doRun(InternalEngine.java:1237)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

As you can see, the logs suggest possible checksum errors on the drive. However, I run extremely I/O-intensive Java (NIO) applications against this drive that push it far harder than ES does, and I never see this issue in any other application. Also, the drive is only at 30% capacity at this point. Note that when I ran with a smaller load (only 4 publishers), all was fine; I was able to run for days without issue. When I increased that to 8, it took about 20 minutes before this issue occurred. The only correlation I can draw regarding the number of producers in my test app is that I have 4 CPUs on my test system.
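For reference, each of my publisher threads does roughly the following with the 2.x Java API (a simplified sketch; the index, type, field names, and counts here are placeholders for what my test app actually sends):

import java.net.InetAddress;
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class BulkPublisher {
    public static void main(String[] args) throws Exception {
        // 2.x transport client; cluster.name must match elasticsearch.yml (mine is c1).
        Settings settings = Settings.settingsBuilder().put("cluster.name", "c1").build();
        Client client = TransportClient.builder().settings(settings).build()
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost"), 9300));

        // Batches of 100 small JSON documents; my test app runs 8 threads doing this loop in parallel.
        for (int batch = 0; batch < 1000; batch++) {
            BulkRequestBuilder bulk = client.prepareBulk();
            for (int i = 0; i < 100; i++) {
                bulk.add(client.prepareIndex("index1", "doc")
                        .setSource("{\"batch\":" + batch + ",\"seq\":" + i + "}"));
            }
            BulkResponse response = bulk.get();
            if (response.hasFailures()) {
                // Log any per-item failures reported in the bulk response.
                System.err.println(response.buildFailureMessage());
            }
        }
        client.close();
    }
}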

So I have a couple of questions:

1.) Is there a way to recover (or at least partially recover) the index when this occurs? Right now I'm just looking for a way to get back as much data as I can so I don't have to re-insert everything. (The only idea I've come up with so far is the Lucene CheckIndex run sketched after these questions.)

2.) Has anyone seen this before and, if so, how can I avoid it?
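(For question 1, the only idea I've come up with so far is Lucene's CheckIndex tool, which as I understand it can drop unreadable segments, permanently losing the documents in them. Something like the following is what I have in mind, run against a copy of the shard directory with ES stopped; the jar path is from my install and may differ:)

# Work on a copy of the shard directory, with Elasticsearch shut down.
# -exorcise removes segments that cannot be read; their documents are lost.
java -cp /path/to/elasticsearch-2.4.0/lib/lucene-core-*.jar \
     org.apache.lucene.index.CheckIndex \
     /home/elastic/index1/nodes/0/indices/index1/0/index -exorcise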

Thank You

I've seen something that "might" be similar. The cluster appeared stable until we increased indexing volume, and then the JVM would crash. We concluded it was a bug in the JVM related to the specific type of CPU being used. I believe we went back to an older version of the JVM, which worked until we moved the cluster to newer hardware. This is definitely not normal behavior for ES.

I saw similar behavior after upgrading from 1.7 -> 2.4 and initiating a fairly heavy indexing process.

Ours is a 4-data-node configuration running on pretty heavy-duty hardware: Xeon E5-2630 v3 @ 2.4 GHz, two RAID 0 SSD arrays on each node for Elastic data, with an HDD running the OS, which is RHEL 7.1. We had run similar (if not heavier) indexing loads under 1.7.

I have a syncing routine that I re-ran over the range of records I was indexing at the time to ensure full coverage. For our purposes, missing a handful of records is not the end of the world, but the above error was disconcerting for us.

The other change of note we made is going from running Elastic natively to running it under Docker. We didn't mount the data as a named volume, but rather as a directory (bind) mount, since we were picking up pre-existing data.
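Roughly, the container gets the existing data directory as a bind mount, something like this (paths from memory, so treat as illustrative):

docker run -d --name elasticsearch \
  -p 9200:9200 -p 9300:9300 \
  -v /mnt/sdb1/elasticsearch:/usr/share/elasticsearch/data \
  elasticsearch:2.4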

This is what I was seeing as well with 2.3.4. The JVM would leave stack trace info and from what I could tell, it looked like it always crashed in the garbage collector.

However, I upgraded to 2.4.0 a few days ago and now I'm not seeing those, but I still get corrupted indexes.

I'm most likely pushing the index loads harder than a single connection would in a production environment, but I'm concerned because there will certainly be a number of data producers inserting into the index at once.

This is pretty much a show-stopper for me. I'd hate to have to go back to a pre-2.0 release given what functionality has been added since then.

What FS is this on?

For mine it's ext4

/dev/sdc1 on /mnt/sdc1 type ext4 (rw,relatime,seclabel,data=ordered)
/dev/sdb1 on /mnt/sdb1 type ext4 (rw,relatime,seclabel,data=ordered)

Same here, ext4.

/dev/sdb1 on /home type ext4 (rw,relatime,data=ordered)

The error can occur

  • if there is a very old (and previously corrupted) ES 1.x index
  • if the index files are mounted over the network (NFS and such), or cannot be fsynced
  • if JVM or Linux file system internals fail

Is the index you're using an ES 1.x index migrated to ES 2.4, or a native ES 2.4 index?

Is anything in the kernel log? /var/log/messages or so?

Can you check if the system is updated to Ubuntu 16.04.1?

It is peculiar that your ES is using NIOFSIndexInput when you mention that memory is mapped (locked); in that case, it should report MMapIndexInput instead.

Hi Jörg,

I'll try to answer these as best I can. I actually just ran another test and was able to replicate it. It doesn't happen at the same place (number of docs, etc.), but it happens 100% of the time when I am creating indices for testing.

1.) I'm only using 2.4 in this case and creating the indices from scratch through a Java application that inserts many documents using the bulk API. No previous index versions are being used (though this also happened with 2.3.4 and 2.3.5).

2.) These are mounted on a local ext4 filesystem as (rw,relatime,data=ordered) on a 500 GB IDE drive, and only 31% of the drive is used even at the point of failure.

3.) I can't say much about possible failures in the filesystem / OS itself; nothing shows up in dmesg or the syslog. I've tried two versions of Java, 1.8.0_60 and 1.8.0_111, both of which are the 64-bit Oracle JDK.

I'm running on Linux Mint 18 with the following kernel information:

Linux 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11 18:01:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

This is on a 3.2 GHz 4-core system with 16 GB of RAM, and I keep everything very current in terms of updates, etc.

As you can see from the attached Marvel screenshot, in this case I was able to insert just over 20 million docs (these are small, with no full-text fields) before the issue occurred. There wasn't really a spike in the number of documents indexed per second when the server shut down the shards due to corruption. The index I'm pushing data into is configured with 4 shards and no replicas.

Here are the settings I have overridden in the elasticsearch.yml file:
cluster.name: c1
node.name: n1
path.data: /home/elastic
path.logs: /var/elasticsearch/logs
bootstrap.memory_lock: true
node.max_local_storage_nodes: 1

Then in my elasticsearch.in.sh script, I'm setting ES_HEAP_SIZE=4g. This is on a 16 GB system; I know I could go to 8 GB, but this is my desktop so I'm leaving a bit for other things.

I've bumped up the memlock, nofile, and core limits to unlimited in the /etc/security/limits.conf file (the entries are shown below). Note that I no longer see the error message in the log about being unable to lock memory due to those limits not being set; I did see it the first time I ran it, before raising them. So, as you say, it should be using MMap and not NIO.
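Roughly like this (the user name is whatever the ES process runs as; "-" covers both the soft and hard limits):

elastic  -  memlock  unlimited
elastic  -  nofile   unlimited
elastic  -  core     unlimited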

And now the log that was generated for this latest occurrence (matches the Marvel image I uploaded):

[2016-09-14 13:58:07,223][INFO ][cluster.routing.allocation] [n1] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[.marvel-es-1-2016.09.14][0], [c1][2], [.marvel-es-data-1][0], [.marvel-es-1-2016.09.14][0], [c1][2], [.marvel-es-data-1][0]] ...]).

[2016-09-14 14:45:12,205][ERROR][index.engine ] [n1] [c1][2] failed to merge
org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=78266706 actual=8ecfdc5c (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/home/elastic/c1/nodes/0/indices/c1/2/index/_3lm.cfs") [slice=_3lm.fdt]))
at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:334)
at org.apache.lucene.codecs.CodecUtil.checksumEntireFile(CodecUtil.java:451)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.checkIntegrity(CompressingStoredFieldsReader.java:669)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:534)
at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:177)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:83)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4075)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3655)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
[2016-09-14 14:45:12,346][WARN ][index.engine ] [n1] [c1][2] failed engine [corrupt file (source: [create])]
org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:720)
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:734)
at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1314)
at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1297)
at org.elasticsearch.index.engine.InternalEngine.innerCreateNoLock(InternalEngine.java:432)
at org.elasticsearch.index.engine.InternalEngine.innerCreate(InternalEngine.java:375)
at org.elasticsearch.index.engine.InternalEngine.create(InternalEngine.java:346)
at org.elasticsearch.index.shard.IndexShard.create(IndexShard.java:545)
at org.elasticsearch.index.engine.Engine$Create.execute(Engine.java:810)
at org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:236)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:327)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:120)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:68)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:648)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:271)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:77)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

There is more, but it just repeats the above. Something that stands out to me is that, even though I'm configured to lock memory and expected mmap, it still says it is using NIO. Also, just to be on the safe side, I did unmount this drive and run a filesystem check (fsck) on it with bad-block testing, and there were no errors.
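(Specifically, something along these lines, with the partition unmounted; the -c option runs a read-only badblocks scan:)

sudo umount /dev/sdb1
sudo e2fsck -f -c /dev/sdb1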

Thanks for any help anyone can provide on this issue.

All your information looks very sane to me:

  • latest Linux kernel
  • local disk storage
  • single node mode
  • light load

The error message is related to Lucene directories, disk I/O, and merging segments, which can heavily exercise the underlying operating system and hardware. So my first reaction is to look for effects from outside ES.

ES chooses the niofs index store automatically; with bootstrap.memory_lock alone, you cannot switch to mmapfs. There is a "hybrid mode" combining niofs and mmapfs for Linux, see https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-store.html

But segment merging should work anyway, in all cases, whether it's mmapfs or not.

Maybe experimenting with another index store helps. This has a performance impact.
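For example, something along these lines (I have not verified this on your setup, and as far as I recall index.store.type is a static setting, so it has to be applied at index creation):

In elasticsearch.yml, as a node-wide default:

  index.store.type: mmapfs

Or per index, at creation time:

  curl -XPUT 'http://localhost:9200/index1' -d '{
    "settings": { "index.store.type": "mmapfs" }
  }'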

Cleanly reproducing the error or finding a triggering condition in this area is almost impossible due to the inherently parallel nature of segment merging.

So I'm lost. Maybe the path.data and node.max_local_storage_nodes: 1 settings do not work as expected. Maybe some other process is writing or deleting data in the index data directory; in that case it wouldn't be a bug at all. Maybe ...

I faced a similar exception with Elasticsearch 2.4.0 on z/OS.

The funny thing is, I've tried this with multiple versions (2.3, 2.4, and 2.5) with the same results. I'm now trying out 5.0 (once I figure out why the PreBuiltTransportClient Java client class is missing from the distribution) to see if it has the same problems.
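In case anyone else hits this, my understanding (which I still need to confirm) is that in 5.0 the transport client moved into a separate org.elasticsearch.client:transport dependency, and the client setup becomes something like:

import java.net.InetAddress;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

public class Client5Demo {
    public static void main(String[] args) throws Exception {
        // PreBuiltTransportClient comes from the separate "transport" artifact,
        // not the core elasticsearch jar (my understanding; still verifying).
        Settings settings = Settings.builder().put("cluster.name", "c1").build();
        TransportClient client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost"), 9300));
        // ... bulk indexing as in the 2.x test app ...
        client.close();
    }
}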

Lately, I've been considering the possibility of using the Hadoop-based setup, but most of the example configurations I've seen seem to be centered more around the use of Hadoop and its tools, with Elasticsearch sort of "along for the ride", so to speak.

Since the "crash" always seems to occur somewhere in Lucene, I thought that maybe using HDFS as the back-end instead might get me past my problem. At this point, ES is essentially useless to me unless I can get this resolved. I love its performance and flexibility, but if it's going to upchuck like this with just 5 million or so fairly small JSON documents, that's a show-stopper.

Has anyone here tried the Hadoop version, and were there similar issues with it, before I go down that rabbit hole?

Thanks

The hadoop version?

Sorry, I wasn't clear on that. I'm talking about "elasticsearch-hadoop". I'd rather not go that route since I really don't need all of the other functionality that Hadoop brings with it, but I'm running out of options.

Thanks

es-hadoop is a connector library between the two, it's not a replacement for either.

Are you able to provide a replicable test that shows this problem? If so we may have a better chance of finding the problem.

Hi Mark,

Admittedly, I'm not very familiar with es-hadoop. It was my understanding from reading through some of the docs and setup guides that it lets Elasticsearch use Hadoop as its back-end storage (rather than Lucene, for example). If that's not the case, then obviously it wouldn't help much.

Once I get the client code changed to work with 5.0 (mainly dependencies), I'll see if I can put together a simple test, and I'll submit everything pertinent at that point.

Thanks

It is not.