ELK on Docker Swarm and GlusterFS crash


I'm trying to deploy an ELK stack on Docker Swarm.

If I bind the Elasticsearch data directory to a Docker volume, there is no problem.

The problems start as soon as I bind the Elasticsearch data directory to a GlusterFS volume.
I use GlusterFS to synchronise the data between all the Swarm nodes in the cluster.
I deploy ELK using the following code:

    image: docker.elastic.co/elasticsearch/elasticsearch:6.2.3
    # container_name: elasticsearch
    environment:
      - "http.host="
      - "transport.host="
      - "ELASTIC_PASSWORD=changeme"
    ports: ['']
    volumes:
      - /opt/dockershared/stack-elk/elk:/usr/share/elasticsearch/data
    networks: ['stack']

The directory '/opt/dockershared/' is a GlusterFS mount:

    myhost:/gvol0 on /opt/dockershared type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072,_netdev)

The ELK stack starts without problems, but after 30 to 60 minutes shard allocation fails.
In the Elasticsearch logs I see the following exception:

    [2018-04-13T08:58:16,749][WARN ][o.e.i.e.Engine ] [MPxFOvC] [metricbeat-6.2.3-2018.04.13][0] failed engine [refresh failed source[schedule]]
    org.apache.lucene.index.CorruptIndexException: Problem reading index from store(MMapDirectory@/usr/share/elasticsearch/data/nodes/0/indices/fRcersH4RjecZ8AKb3WZTQ/0/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@73620ce7) (resource=store(MMapDirectory@/usr/share/elasticsearch/data/nodes/0/indices/fRcersH4RjecZ8AKb3WZTQ/0/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@73620ce7))
        at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:140) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
    Caused by: java.io.EOFException: read past EOF: MMapIndexInput(path="/usr/share/elasticsearch/data/nodes/0/indices/fRcersH4RjecZ8AKb3WZTQ/0/index/_47.cfe")
        at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:75) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
        Suppressed: org.apache.lucene.index.CorruptIndexException: checksum status indeterminate: remaining=0, please run checkindex for more details (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/usr/share/elasticsearch/data/nodes/0/indices/fRcersH4RjecZ8AKb3WZTQ/0/index/_47.cfe")))

What could be the problem?
What is the best way to share the Elasticsearch data directory among all the Swarm nodes?

Thank you.

The problem is that GlusterFS is not supported as storage for Elasticsearch data. The correct approach is to use a local disk on each node and let Elasticsearch handle the replication.
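As a rough, untested sketch of that layout: one Elasticsearch service is pinned to each Swarm node, and each writes to a named volume backed by that node's local disk. The service names (`es1` to `es3`), node hostnames (`swarm-node-1` to `swarm-node-3`), cluster name, and volume name are hypothetical placeholders, and the `discovery.zen.*` settings assume the 6.x Zen discovery used by the 6.2.3 image. Network binding on an overlay network may need further tuning that is not shown here.

```yaml
version: "3.4"

services:
  es1:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.2.3
    environment:
      - "cluster.name=stack-elk"          # hypothetical cluster name
      - "node.name=es1"
      - "discovery.zen.ping.unicast.hosts=es2,es3"
      - "discovery.zen.minimum_master_nodes=2"   # quorum of 3 master-eligible nodes
      - "ELASTIC_PASSWORD=changeme"
    volumes:
      - esdata:/usr/share/elasticsearch/data     # local volume, not GlusterFS
    networks: ['stack']
    deploy:
      placement:
        constraints:
          - node.hostname == swarm-node-1        # hypothetical hostname

  es2:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.2.3
    environment:
      - "cluster.name=stack-elk"
      - "node.name=es2"
      - "discovery.zen.ping.unicast.hosts=es1,es3"
      - "discovery.zen.minimum_master_nodes=2"
      - "ELASTIC_PASSWORD=changeme"
    volumes:
      - esdata:/usr/share/elasticsearch/data
    networks: ['stack']
    deploy:
      placement:
        constraints:
          - node.hostname == swarm-node-2

  es3:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.2.3
    environment:
      - "cluster.name=stack-elk"
      - "node.name=es3"
      - "discovery.zen.ping.unicast.hosts=es1,es2"
      - "discovery.zen.minimum_master_nodes=2"
      - "ELASTIC_PASSWORD=changeme"
    volumes:
      - esdata:/usr/share/elasticsearch/data
    networks: ['stack']
    deploy:
      placement:
        constraints:
          - node.hostname == swarm-node-3

volumes:
  esdata:
    driver: local    # created separately on each node that runs a task

networks:
  stack:
    driver: overlay
```

Because the `local` driver creates the volume on whichever node runs the task, each service keeps its own copy of the data directory. With the 6.x default of one replica per primary shard, Elasticsearch places each replica on a different node than its primary, so the data stays available if one node is lost.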

Thank you for your answer.

Just a question: shouldn't GlusterFS be transparent to ELK? From ELK's point of view it should be just a folder in the operating system, or am I wrong?

However, I cannot store the ELK data on each node because the disk on each node is too small. I would like to store the ELK data on a remote file system instead.
Do you know the best way to do that?

In theory yes, but in practice it is very different, because Elasticsearch has a very peculiar file system access pattern (the same happens with NFS). Elasticsearch is only known (i.e. fully tested) to work with local disks or block storage.

You could try using block storage, but keep in mind that performance may be very bad, especially if you don't have enough RAM for caching.
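If you go the block storage route, one possible way to wire it in is to let Docker's `local` volume driver mount the block device directly. This is only a sketch under assumptions: the device path `/dev/sdb1` and the `ext4` file system are hypothetical, and the device has to be attached and already formatted on every node where the service may be scheduled.

```yaml
    volumes:
      - esdata:/usr/share/elasticsearch/data

volumes:
  esdata:
    driver: local
    driver_opts:
      type: ext4          # assumed file system on the block device
      device: /dev/sdb1   # hypothetical device path on the host
```

Elasticsearch then sees an ordinary local file system, while the blocks themselves live on the remote storage behind the device.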
