Index/shard API Unresponsive

Hi,
We have an old cluster running version 1.3.2 with 6 data nodes and 2 master nodes. Atm the cluster is red, with two shards in an initializing state and two unassigned:

{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 8,
  "number_of_data_nodes" : 6,
  "active_primary_shards" : 71,
  "active_shards" : 140,
  "relocating_shards" : 0,
  "initializing_shards" : 2,
  "unassigned_shards" : 2
}

However, I'm unable to determine which indices/shards have a problem because the _cat/indices and _cat/shards APIs hang indefinitely. Can anyone advise on why this might be happening and how I can tackle it?

Regards,
David

Can you attach the ES logs?
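
In the meantime, the cluster health and cluster state APIs will sometimes still answer even when the _cat endpoints hang, particularly with local=true so the node replies from its own copy of the cluster state instead of forwarding to the master. A rough sketch, assuming a node reachable on localhost:9200 (adjust host/port to your setup):

curl -s 'localhost:9200/_cluster/health?level=shards&local=true&pretty'

or, to pull just the routing table and look for UNASSIGNED / INITIALIZING entries:

curl -s 'localhost:9200/_cluster/state/routing_table?local=true&pretty'

That should at least narrow down which index the problem shards belong to.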

Hi,
Here's what we're getting this morning:

[2017-04-05 04:30:49,460][WARN ][index.merge.scheduler    ] [node06] [index-2017.04.05][2] failed to merge
java.io.IOException: Input/output error: NIOFSIndexInput(path="/apps/elasticsearch/disk3/elasticsearch/nodes/0/indices/index-2017.04.05/2/index/_q8_es090_0.doc")
        at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:186)
        at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:342)
        at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:54)
        at org.apache.lucene.store.DataInput.readVInt(DataInput.java:126)
        at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:221)
        at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readVIntBlock(Lucene41PostingsReader.java:126)
        at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.refillDocs(Lucene41PostingsReader.java:696)
        at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.nextDoc(Lucene41PostingsReader.java:752)
        at org.apache.lucene.codecs.MappingMultiDocsAndPositionsEnum.nextDoc(MappingMultiDocsAndPositionsEnum.java:104)
        at org.apache.lucene.codecs.PostingsConsumer.merge(PostingsConsumer.java:109)
        at org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:164)
        at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:399)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:112)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4163)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3759)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
        at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:106)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
Caused by: java.io.IOException: Input/output error
        at sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
        at sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220)
        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
        at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:699)
        at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:684)
        at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:176)
        ... 18 more
[2017-04-05 04:30:49,461][WARN ][index.engine.internal    ] [node06] [index-2017.04.05][2] failed engine [merge exception]
[2017-04-05 04:30:49,970][WARN ][cluster.action.shard     ] [node06] [index-2017.04.05][2] sending failed shard for [index-2017.04.05][2], node[K8i-aoFATluUWFunOB671w], [P], s[STARTED], indexUUID [5FNalKcQR2K5F9AeNOB0Ew], reason [engine failure, message [merge exception][MergeException[java.io.IOException: Input/output error: NIOFSIndexInput(path="/apps/elasticsearch/disk3/elasticsearch/nodes/0/indices/index-2017.04.05/2/index/_q8_es090_0.doc")]; nested: IOException[Input/output error: NIOFSIndexInput(path="/apps/elasticsearch/disk3/elasticsearch/nodes/0/indices/index-2017.04.05/2/index/_q8_es090_0.doc")]; nested: IOException[Input/output error]; ]]

Regards,
David

On node06, can you find out the disk usage?

On Linux:

df -h

and

df -h /apps/elasticsearch/disk3/elasticsearch/nodes/0/indices/index-2017.04.05/2/index/_q8_es090_0.doc

and also provide iostat / sar output (preferably sar) if you have it.
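
If sysstat is installed, historical data can usually be read back from the daily sa files; the location and day-of-month naming vary by distro (e.g. /var/log/sysstat/saDD on Ubuntu, /var/log/sa/saDD on RHEL), so treat this as a sketch:

sar -d -p -f /var/log/sysstat/sa05

and, for live per-device numbers:

iostat -dxm 5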

df -h:
/dev/mapper/vg-root                           16G  1.9G   14G  13% /
udev                                          16G  4.0K   16G   1% /dev
tmpfs                                        6.3G  300K  6.3G   1% /run
none                                         5.0M     0  5.0M   0% /run/lock
none                                          16G     0   16G   0% /run/shm
overflow                                     1.0M  356K  668K  35% /tmp
/dev/mapper/vgsdb-lvsdb                      3.6T  1.2T  2.3T  35% /elasticsearch/disk2
/dev/mapper/vgsdc-lvsdc                      3.6T  1.2T  2.3T  35% /elasticsearch/disk3
/dev/mapper/vgsdd-lvsdd                      3.6T  1.3T  2.2T  36% /elasticsearch/disk4
/dev/mapper/vg-lvsda                         886G  567M  840G   1% /elasticsearch/disk1
/dev/sda1                                    198M   41M  147M  22% /boot

The other file you refer to is not present. I don't have access to historic iostat / sar data atm...

Regards,
David

I was hoping it was a disk space issue, but from your df output it is not. With the sar data I would be able to trace back through the historical info; without it, it's hard to say what went wrong with the I/O. Is disk3 an NFS mount? Can you attach the dmesg output for the timeframe in which this problem occurred?
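
Something along these lines should show whether it is NFS and whether the kernel logged any storage errors around that time (dmesg -T for human-readable timestamps needs a fairly recent util-linux, so drop the flag if it isn't recognised):

grep disk3 /proc/mounts

and

dmesg -T | egrep -i 'i/o error|ata|scsi|ext4|xfs|nfs'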

There was a disk space issue, actually: the root volume filled, though not completely, because ES could still write its log file. I'll look into the dmesg output...

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.