Elasticsearch 5.3.1 seems to cause some corruption; has anyone else faced this?


(Ninad Pradhan) #1

Describe the feature:

Elasticsearch version: 5.3.1

Plugins installed: []

JVM version (java -version):

java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

OS version (uname -a if on a Unix-like system):

Linux vps-cl-0 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
Elasticsearch seems to have some corruption, and once it gets into this state the cluster cannot be started again. We even rebooted the machines, and it still fails with:

Caused by: org.apache.lucene.store.LockObtainFailedException: Lock held by this virtual machine: /data/nodes/0/indices/b245vxRiSk6wO_XMgEvf2A/1/index/write.lock

Steps to reproduce:

Create a 3-node cluster and index around 80 GB of data.

Use this elasticsearch.yml config:

cluster.name: icecluster
node.name: icecluster-10.10.20.7
node.master: true
node.data: true
path.data: /data
http.port: 9200
network.bind_host: '0.0.0.0'
network.publish_host: 10.10.20.7
discovery.zen.ping.unicast.hosts: [10.10.20.7,10.10.20.8,10.10.20.9]
discovery.zen.minimum_master_nodes: 2
discovery.zen.no_master_block: write
thread_pool:
  bulk:
    queue_size: 1000000
  index:
    queue_size: 1000000
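
As a side note on the config above: with three master-eligible nodes, `discovery.zen.minimum_master_nodes: 2` matches the usual majority rule of (master-eligible nodes / 2) + 1. A minimal sketch of that arithmetic follows; the function name is only illustrative and is not part of Elasticsearch.

```python
def minimum_master_nodes(master_eligible):
    """Return a strict majority of the master-eligible nodes.

    Illustrative helper (not an Elasticsearch API): the usual rule for
    discovery.zen.minimum_master_nodes is (N / 2) + 1 with integer division.
    """
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# The cluster in this post lists three hosts, all with node.master: true.
print(minimum_master_nodes(3))
```

For the three-node cluster described here this gives 2, so the setting in the post is consistent with the rule.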

JVM heap: 4 GB for both Xms and Xmx.

Provide logs (if relevant):

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: org.elasticsearch.transport.RemoteTransportException: [icecluster-10.10.20.7][10.10.20.7:9300][internal:cluster/nodes/indices/shard/store[n]]
Caused by: org.elasticsearch.ElasticsearchException: Failed to list store metadata for shard [[mldata_06-25-2017][1]]
        at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:114) ~[elasticsearch-5.3.1.jar:5.3.1]
        at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:64) ~[elasticsearch-5.3.1.jar:5.3.1]
        at org.elasticsearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:145) ~[elasticsearch-5.3.1.jar:5.3.1]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:269) ~[elasticsearch-5.3.1.jar:5.3.1]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:265) ~[elasticsearch-5.3.1.jar:5.3.1]
        at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-5.3.1.jar:5.3.1]
        at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:618) ~[elasticsearch-5.3.1.jar:5.3.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) ~[elasticsearch-5.3.1.jar:5.3.1]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-5.3.1.jar:5.3.1]
        ... 3 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock held by this virtual machine: /data/nodes/0/indices/b245vxRiSk6wO_XMgEvf2A/1/index/write.lock
        at org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:127) ~[lucene-core-6.4.2.jar:6.4.2 34a975ca3d4bd7fa121340e5bcbf165929e0542f - ishan - 2017-03-01 23:23:13]
        at org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41) ~[lucene-core-6.4.2.jar:6.4.2 34a975ca3d4bd7fa121340e5bcbf165929e0542f - ishan - 2017-03-01 23:23:13]
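
To see which shard the failing lock belongs to, the path in the exception can be split into its parts. This is a rough sketch that assumes the 5.x on-disk layout `<path.data>/nodes/<node ordinal>/indices/<index UUID>/<shard>/index/write.lock`; the regular expression and the function name are illustrative, not anything shipped with Elasticsearch.

```python
import re

# Assumed layout: <path.data>/nodes/<ordinal>/indices/<index uuid>/<shard>/index/write.lock
LOCK_PATH = re.compile(
    r"(?P<data_path>/\S*?)/nodes/(?P<node_ordinal>\d+)/indices/"
    r"(?P<index_uuid>[^/\s]+)/(?P<shard>\d+)/index/write\.lock"
)


def parse_lock_path(message):
    """Extract data path, node ordinal, index UUID and shard number.

    Returns None when the message contains no write.lock path.
    """
    match = LOCK_PATH.search(message)
    if match is None:
        return None
    return {
        "data_path": match.group("data_path"),
        "node_ordinal": int(match.group("node_ordinal")),
        "index_uuid": match.group("index_uuid"),
        "shard": int(match.group("shard")),
    }


line = (
    "Caused by: org.apache.lucene.store.LockObtainFailedException: "
    "Lock held by this virtual machine: "
    "/data/nodes/0/indices/b245vxRiSk6wO_XMgEvf2A/1/index/write.lock"
)
print(parse_lock_path(line))
```

For the line from the log above, the shard number comes out as 1, which agrees with the `[[mldata_06-25-2017][1]]` shard named in the "Failed to list store metadata" message.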

(Mark Walkom) #2

What filesystem are you using?


(Ninad Pradhan) #3

It's mentioned above:

OS version (uname -a if on a Unix-like system):
Linux vps-cl-0 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

But here are more details:

NAME   FSTYPE LABEL MOUNTPOINT
sda
├─sda1 ext4         /
├─sda2
└─sda5 swap         [SWAP]
sdb    ext3
└─sdb1 ext4         /data
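
The same answer can be read for the data path directly from the mount table. Below is a small sketch, assuming a Linux host where `/proc/mounts` is available; `fstype_for` is a hypothetical helper name, and it ignores the escaping that `/proc/mounts` uses for mount points containing spaces.

```python
import os


def fstype_for(path, mounts_text):
    """Return (mount_point, fstype) for the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts, where each line is
    "device mount_point fstype options dump pass". The longest matching
    mount point wins; on ties the later entry is preferred.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fstype)
    return best
```

On the node itself this would be called as `fstype_for("/data", open("/proc/mounts").read())`; for the layout listed above it should report `/data` on ext4.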


(David Pilato) #4

Please format your code using the </> icon as explained in this guide. It will make your post more readable.

Or use markdown style like:

```
CODE
```

I edited your post.


(David Pilato) #5

I don't think this is related to your issue, but I'm not sure why you would want to do this:

thread_pool:
  bulk:
    queue_size: 1000000
  index:
    queue_size: 1000000
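
To make the concern concrete, here is a back-of-the-envelope sketch of how much heap a full queue of that size could pin. The 10 KiB average request size is purely an assumption for illustration (the real size of the bulk requests is not given in this thread); the heap size is the 4 GB mentioned in the first post.

```python
HEAP_BYTES = 4 * 1024**3           # 4 GB heap, from the original post
QUEUE_SIZE = 1_000_000             # queue_size from the config above
ASSUMED_REQUEST_BYTES = 10 * 1024  # hypothetical average request size


def queued_bytes(queue_size, request_bytes):
    """Bytes held in memory if every queue slot holds one request."""
    return queue_size * request_bytes


def heap_fraction(queue_size, request_bytes, heap_bytes):
    """Fraction of the heap a completely full queue would occupy."""
    return queued_bytes(queue_size, request_bytes) / heap_bytes


fraction = heap_fraction(QUEUE_SIZE, ASSUMED_REQUEST_BYTES, HEAP_BYTES)
print(f"A full queue would need about {fraction:.1f}x the heap")
```

Under that assumption a full queue would need more memory than the whole heap, so the node would run out of memory long before the queue could push back with rejections.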

(Ninad Pradhan) #6

It's possible that one of our QA engineers wiped out half of the data, so I don't think this is a real issue. Closing the issue.


(system) closed #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.