Index/shard API Unresponsive

dawiro · April 4, 2017, 9:18am

Hi,
We have an old cluster running version 1.3.2 with 6 data nodes and 2 masters. At the moment, the cluster is red, with two shards in an initialising state and two unassigned:

{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 8,
  "number_of_data_nodes" : 6,
  "active_primary_shards" : 71,
  "active_shards" : 140,
  "relocating_shards" : 0,
  "initializing_shards" : 2,
  "unassigned_shards" : 2
}

However, I'm unable to determine which indices/shards have a problem because the _cat/indices and _cat/shards requests hang indefinitely. Can anyone advise on why this might be happening and how I can tackle it?

Regards,
David
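
One thing that may be worth trying when the cat APIs hang: the cluster health request shown above did return, and the same endpoint accepts a `level=shards` parameter that breaks the status down per index and per shard. The following is a minimal sketch, not a tested fix for this cluster. The host, port and timeout are assumptions, and whether the shard-level request also hangs in this situation is unknown.

```python
import json
from urllib.request import urlopen


def non_green_shards(health):
    """Return sorted (index, shard, status) tuples for shards that are not green.

    `health` is the parsed JSON body of a cluster health response requested
    with level=shards, which nests shards under each index.
    """
    problems = []
    for index_name, index in health.get("indices", {}).items():
        for shard_id, shard in index.get("shards", {}).items():
            if shard.get("status") != "green":
                problems.append((index_name, int(shard_id), shard.get("status")))
    return sorted(problems)


def fetch_shard_health(base_url="http://localhost:9200", timeout=30):
    """Fetch shard-level cluster health.

    base_url and timeout are assumptions; point base_url at any node that
    still answers HTTP requests. The timeout stops the client from hanging.
    """
    url = base_url + "/_cluster/health?level=shards"
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example use against a live cluster (not executed here):
#   for index_name, shard_id, status in non_green_shards(fetch_shard_health()):
#       print(index_name, shard_id, status)
```

The filtering is kept separate from the HTTP call so it can also be run on a response saved to a file with curl.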

jkuang · April 4, 2017, 9:23pm

Can you attach the ES logs?

dawiro · April 5, 2017, 7:05am

Hi,
Here's what we're getting this morning:

[2017-04-05 04:30:49,460][WARN ][index.merge.scheduler    ] [node06] [index-2017.04.05][2] failed to merge
java.io.IOException: Input/output error: NIOFSIndexInput(path="/apps/elasticsearch/disk3/elasticsearch/nodes/0/indices/index-2017.04.05/2/index/_q8_es090_0.doc")
        at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:186)
        at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:342)
        at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:54)
        at org.apache.lucene.store.DataInput.readVInt(DataInput.java:126)
        at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:221)
        at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readVIntBlock(Lucene41PostingsReader.java:126)
        at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.refillDocs(Lucene41PostingsReader.java:696)
        at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.nextDoc(Lucene41PostingsReader.java:752)
        at org.apache.lucene.codecs.MappingMultiDocsAndPositionsEnum.nextDoc(MappingMultiDocsAndPositionsEnum.java:104)
        at org.apache.lucene.codecs.PostingsConsumer.merge(PostingsConsumer.java:109)
        at org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:164)
        at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:399)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:112)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4163)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3759)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
        at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:106)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
Caused by: java.io.IOException: Input/output error
        at sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
        at sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220)
        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
        at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:699)
        at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:684)
        at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:176)
        ... 18 more
[2017-04-05 04:30:49,461][WARN ][index.engine.internal    ] [node06] [index-2017.04.05][2] failed engine [merge exception]
[2017-04-05 04:30:49,970][WARN ][cluster.action.shard     ] [node06] [index-2017.04.05][2] sending failed shard for [index-2017.04.05][2], node[K8i-aoFATluUWFunOB671w], [P], s[STARTED], indexUUID [5FNalKcQR2K5F9AeNOB0Ew], reason [engine failure, message [merge exception][MergeException[java.io.IOException: Input/output error: NIOFSIndexInput(path="/apps/elasticsearch/disk3/elasticsearch/nodes/0/indices/index-2017.04.05/2/index/_q8_es090_0.doc")]; nested: IOException[Input/output error: NIOFSIndexInput(path="/apps/elasticsearch/disk3/elasticsearch/nodes/0/indices/index-2017.04.05/2/index/_q8_es090_0.doc")]; nested: IOException[Input/output error]; ]]

Regards,
David
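
For anyone sifting through a larger log for the same pattern, here is a rough sketch that pulls the node, index, shard and file path out of "failed to merge" entries like the one above. The regular expressions are written against the log format shown in this post only; other Elasticsearch versions or logging configurations may format these lines differently, so treat the patterns as assumptions.

```python
import re

# Matches the header line of a merge failure as it appears in the log above.
FAILED_MERGE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\[WARN\s*\]\[index\.merge\.scheduler\s*\] "
    r"\[(?P<node>[^\]]+)\] \[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\] failed to merge"
)

# Matches the file path reported inside the IOException message.
PATH = re.compile(r'NIOFSIndexInput\(path="(?P<path>[^"]+)"\)')


def parse_failed_merges(lines):
    """Collect one dict per 'failed to merge' entry, with the first path that follows it."""
    failures = []
    for line in lines:
        header = FAILED_MERGE.match(line)
        if header:
            entry = header.groupdict()
            entry["shard"] = int(entry["shard"])
            entry["path"] = None
            failures.append(entry)
            continue
        found = PATH.search(line)
        # Attach the path only to the most recent failure that has none yet.
        if found and failures and failures[-1]["path"] is None:
            failures[-1]["path"] = found.group("path")
    return failures
```

Feeding it the lines of a log file (for example `parse_failed_merges(open(log_path))`, where `log_path` is whatever file your node writes to) gives a list that can be grouped by node or disk path to see whether the I/O errors cluster on one volume.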

jkuang · April 5, 2017, 4:27pm

On node06, can you find out the disk usage? On Linux:

df -h

and

df -h /apps/elasticsearch/disk3/elasticsearch/nodes/0/indices/index-2017.04.05/2/index/_q8_es090_0.doc

Please also provide the iostat or (preferably) sar output if you have it.
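
If the segment file named in the log has since gone (for example after the shard failed), running df against that exact path will just report an error. A small sketch of the same check that falls back to the nearest existing parent directory is below. The function names are made up for illustration, and the percentage is computed as used / total, which can differ slightly from the Use% column that df prints.

```python
import os
import shutil


def nearest_existing(path):
    """Walk up from path until a directory or file that exists is found."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def usage_percent(path):
    """Return (checked_path, percent_used) for the filesystem holding path."""
    target = nearest_existing(path)
    usage = shutil.disk_usage(target)
    if usage.total == 0:  # some pseudo filesystems report a zero size
        return target, 0.0
    return target, 100.0 * usage.used / usage.total


# Example use (not executed here), also checking the root volume:
#   for p in ["/", "/apps/elasticsearch/disk3/elasticsearch/nodes/0/indices"]:
#       print(usage_percent(p))
```

Checking "/" alongside the data paths matters because a full root volume can cause trouble even when the data disks have plenty of space.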

dawiro · April 5, 2017, 5:10pm

df -h:
/dev/mapper/vg-root                           16G  1.9G   14G  13% /
udev                                          16G  4.0K   16G   1% /dev
tmpfs                                        6.3G  300K  6.3G   1% /run
none                                         5.0M     0  5.0M   0% /run/lock
none                                          16G     0   16G   0% /run/shm
overflow                                     1.0M  356K  668K  35% /tmp
/dev/mapper/vgsdb-lvsdb                      3.6T  1.2T  2.3T  35% /elasticsearch/disk2
/dev/mapper/vgsdc-lvsdc                      3.6T  1.2T  2.3T  35% /elasticsearch/disk3
/dev/mapper/vgsdd-lvsdd                      3.6T  1.3T  2.2T  36% /elasticsearch/disk4
/dev/mapper/vg-lvsda                         886G  567M  840G   1% /elasticsearch/disk1
/dev/sda1                                    198M   41M  147M  22% /boot

The other file you refer to is not present. I don't have access to historic iostat / sar data at the moment...

Regards,
David

jkuang · April 5, 2017, 9:37pm

I was hoping it was a disk space issue, but from your df output it is not. With the sar data I would be able to trace back through the historical I/O info; without it, it is hard to say what went wrong with the I/O. Is disk 3 an NFS mount? Can you attach the dmesg output for the timeframe in which this problem occurred?

dawiro · April 6, 2017, 8:44am

There was a disk space issue. The root volume filled up, though not completely, because ES could still write its log file. I'll look into the dmesg output...

system · May 4, 2017, 8:44am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
