Hi,
I'm trying to deploy an ELK stack on docker swarm.
If I bind the Elasticsearch data dir to a Docker volume, there is no problem.
The problems start as soon as I bind the Elasticsearch data dir to a GlusterFS volume.
I use GlusterFS to synchronise the data between all the swarm nodes in the cluster.
I deploy ELK using the following code:
elasticsearch:
  image: docker.elastic.co/elasticsearch/elasticsearch:6.2.3
  # container_name: elasticsearch
  environment:
    - "http.host=0.0.0.0"
    - "transport.host=127.0.0.1"
    - "ELASTIC_PASSWORD=changeme"
    - "TAKE_FILE_OWNERSHIP=1"
  ports: ['127.0.0.1:9200:9200']
  volumes:
    - /opt/dockershared/stack-elk/elk:/usr/share/elasticsearch/data
  networks: ['stack']
The dir '/opt/dockershared/' is a glusterFS volume:
myhost:/gvol0 on /opt/dockershared type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072,_netdev)
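For comparison, the variant that works for me uses a Docker named volume instead of the bind mount into the GlusterFS directory. This is only a minimal sketch of that variant, not my exact file: the volume name `esdata` is illustrative, and the other settings are left out for brevity.

```yaml
version: "3"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.2.3
    volumes:
      # Named volume on the node's local disk (default local driver),
      # instead of the bind mount under /opt/dockershared
      - esdata:/usr/share/elasticsearch/data

volumes:
  esdata:   # illustrative name, assumed for this sketch
```

With this variant the data stays on whichever node runs the container, so it is not shared between the swarm nodes.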
The ELK stack starts without problems, but after 30 to 60 minutes shard allocation fails.
In the ELK logs I see the following exception:
[2018-04-13T08:58:16,749][WARN ][o.e.i.e.Engine ] [MPxFOvC] [metricbeat-6.2.3-2018.04.13][0] failed engine [refresh failed source[schedule]]
org.apache.lucene.index.CorruptIndexException: Problem reading index from store(MMapDirectory@/usr/share/elasticsearch/data/nodes/0/indices/fRcersH4RjecZ8AKb3WZTQ/0/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@73620ce7) (resource=store(MMapDirectory@/usr/share/elasticsearch/data/nodes/0/indices/fRcersH4RjecZ8AKb3WZTQ/0/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@73620ce7))
at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:140) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
......
Caused by: java.io.EOFException: read past EOF: MMapIndexInput(path="/usr/share/elasticsearch/data/nodes/0/indices/fRcersH4RjecZ8AKb3WZTQ/0/index/_47.cfe")
at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:75) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
......
Suppressed: org.apache.lucene.index.CorruptIndexException: checksum status indeterminate: remaining=0, please run checkindex for more details (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/usr/share/elasticsearch/data/nodes/0/indices/fRcersH4RjecZ8AKb3WZTQ/0/index/_47.cfe")))
.....
What could be the problem?
What is the best way to share the Elasticsearch data dir among all the swarm nodes?
Thank you.