ELK on Docker Swarm and GlusterFS crash

faustf · April 13, 2018, 9:18am

Hi,

I'm trying to deploy an ELK stack on Docker Swarm.

If I bind the Elasticsearch data directory to a Docker volume, there is no problem.

The problems start as soon as I bind the Elasticsearch data directory to a GlusterFS volume.
I use GlusterFS to synchronise the data between all the swarm nodes in the cluster.
I deploy ELK using the following configuration:

elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.2.3
    # container_name: elasticsearch
    environment: 
      - "http.host=0.0.0.0"
      - "transport.host=127.0.0.1"
      - "ELASTIC_PASSWORD=changeme"
      - "TAKE_FILE_OWNERSHIP=1"
    ports: ['127.0.0.1:9200:9200']
    volumes:
      - /opt/dockershared/stack-elk/elk:/usr/share/elasticsearch/data
    networks: ['stack']

The directory /opt/dockershared/ is a GlusterFS volume mount:

myhost:/gvol0 on /opt/dockershared type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072,_netdev)

The ELK stack starts without problems, but after 30 to 60 minutes shard allocation fails.
In the Elasticsearch logs I see the following exception:

[2018-04-13T08:58:16,749][WARN ][o.e.i.e.Engine ] [MPxFOvC] [metricbeat-6.2.3-2018.04.13][0] failed engine [refresh failed source[schedule]]
org.apache.lucene.index.CorruptIndexException: Problem reading index from store(MMapDirectory@/usr/share/elasticsearch/data/nodes/0/indices/fRcersH4RjecZ8AKb3WZTQ/0/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@73620ce7) (resource=store(MMapDirectory@/usr/share/elasticsearch/data/nodes/0/indices/fRcersH4RjecZ8AKb3WZTQ/0/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@73620ce7))
at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:140) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
......
Caused by: java.io.EOFException: read past EOF: MMapIndexInput(path="/usr/share/elasticsearch/data/nodes/0/indices/fRcersH4RjecZ8AKb3WZTQ/0/index/_47.cfe")
at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:75) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
......
Suppressed: org.apache.lucene.index.CorruptIndexException: checksum status indeterminate: remaining=0, please run checkindex for more details (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/usr/share/elasticsearch/data/nodes/0/indices/fRcersH4RjecZ8AKb3WZTQ/0/index/_47.cfe")))
.....

What could be the problem?
What is the best way to share the Elasticsearch data directory among all the swarm nodes?

Thank you

thiago · April 15, 2018, 8:28pm

The problem is that GlusterFS is not supported as storage for Elasticsearch. The correct approach is to use a local disk on each node and let Elasticsearch handle the replication.
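A minimal, untested sketch of what "a local disk for each node" could look like in a stack file, assuming a two-node Elasticsearch 6.2.3 cluster (only the Elasticsearch services are shown). The service names es01/es02, the cluster name es-swarm and the node label es-node are hypothetical. Each service keeps its data in a named volume with the default local driver, so the data lives on the local disk of the node running the task, and a placement constraint keeps each service on the same node across restarts. Elasticsearch 6.x indices default to one replica, so each shard gets a copy on the other node; that is the replication mentioned above. Binding to a non-loopback address enables the production bootstrap checks (for example, vm.max_map_count must be at least 262144 on every host), and depending on the container's network interfaces, network.publish_host may need to be set explicitly.

```yaml
version: "3.3"

services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.2.3
    environment:
      - "cluster.name=es-swarm"          # hypothetical cluster name
      - "node.name=es01"
      - "network.host=0.0.0.0"           # needed so the nodes can reach each other
      - "discovery.zen.ping.unicast.hosts=es02"
      # Quorum for 2 master-eligible nodes: (2 / 2) + 1 = 2
      - "discovery.zen.minimum_master_nodes=2"
      - "ELASTIC_PASSWORD=changeme"
    volumes:
      # Named volume, default "local" driver: stored on this node's own disk
      - es01data:/usr/share/elasticsearch/data
    networks: ['stack']
    deploy:
      placement:
        constraints:
          # Hypothetical node label: pins es01 to the node holding its data
          - node.labels.es-node == es01

  es02:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.2.3
    environment:
      - "cluster.name=es-swarm"
      - "node.name=es02"
      - "network.host=0.0.0.0"
      - "discovery.zen.ping.unicast.hosts=es01"
      - "discovery.zen.minimum_master_nodes=2"
      - "ELASTIC_PASSWORD=changeme"
    volumes:
      - es02data:/usr/share/elasticsearch/data
    networks: ['stack']
    deploy:
      placement:
        constraints:
          - node.labels.es-node == es02

volumes:
  es01data:
  es02data:

networks:
  stack:
    driver: overlay
```

With only two master-eligible nodes and a quorum of 2, losing either node stops the cluster from electing a master; three master-eligible nodes are the usual minimum for tolerating the loss of one.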

faustf · April 17, 2018, 2:19pm

Thank you for your answer.

Just a question: shouldn't GlusterFS be transparent to ELK? From ELK's point of view it should be just a folder in the operating system, or am I wrong?

However, I cannot store the ELK data on each node because each node's disk is too small. Instead, I would like to store the ELK data on a remote file system.
Do you know the best way to do that?

thiago · April 17, 2018, 2:44pm

In theory yes, but in practice it is very different, because Elasticsearch has a very peculiar filesystem access pattern (the same problem occurs with NFS). Elasticsearch is only known (i.e. fully tested) to work with local disks or block storage.

You could try block storage, but keep in mind that performance may be very poor, especially if you don't have enough RAM for caching.

system · May 15, 2018, 2:44pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
