How to define multiple path.data entries pointing to separate volumes or PVCs using ECK?

Hi!
elasticsearch.yml supports a configuration setting where I can specify multiple data paths for an ES data node, as follows:

path.data:
- "/mnt/data1"
- "/mnt/data2"
- "/mnt/data3"

I can define additional volumes and/or PVCs using spec.nodeSets.volumeClaimTemplates and mount them through the pod template.
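For example, something like this (a minimal sketch of what I mean; the volume name and size are illustrative):

volumeClaimTemplates:
- metadata:
    name: elasticsearch-data2  # additional volume next to the default elasticsearch-data
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi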

However, when I specify:

spec:
  version: 7.5.1
  nodeSets:
    - name: default
      config:
        path.data: [ "/mnt/data1', "/mnt/data2", "/mnt/data3" ]

I get an error from the Kubernetes operator saying that path.data is not configurable.

Is there a way to achieve this configuration using ECK?
Thank you in advance!


Hi @ykozlov!

Thanks for opening this thread and sorry for the late answer. I spent some time investigating.

Here is an example manifest that should achieve what you want:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: multi-volumes 
spec:
  version: 7.6.0
  nodeSets:
  - name: default
    count: 3
    config:
      node.master: true
      node.data: true
      node.ingest: true
      node.store.allow_mmap: false
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          env:
          - name: path.data
            value: "/mnt/data,/mnt/data2,/mnt/data3"
          volumeMounts:
          - name: elasticsearch-data
            mountPath: /mnt/data
          - name: elasticsearch-data2
            mountPath: /mnt/data2
          - name: elasticsearch-data3
            mountPath: /mnt/data3
        initContainers:
        - name: chown-data-volumes
          command: ["sh", "-c", "chown elasticsearch:elasticsearch /mnt/data && chown elasticsearch:elasticsearch /mnt/data2 && chown elasticsearch:elasticsearch /mnt/data3"]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        storageClassName: standard
    - metadata:
        name: elasticsearch-data2
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        storageClassName: standard
    - metadata:
        name: elasticsearch-data3
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        storageClassName: standard

A few things to note:

  • See how I override the Elasticsearch setting through an environment variable (path.data). That's an undocumented way of providing Elasticsearch settings. I opened an issue in our GitHub repo, because you should also be able to set this through the config section: https://github.com/elastic/cloud-on-k8s/issues/2573.
  • The warning you get when using a "blacklisted" setting is just a warning; it does not prevent you from actually using that setting if you know what you are doing (see the sketch after this list).
  • Make sure one of your volumes is named elasticsearch-data, otherwise ECK will still create the default elasticsearch-data volume for you, in addition to the other volumes you may have defined in the manifest. I created an issue in our GitHub repo so we fix that: https://github.com/elastic/cloud-on-k8s/issues/2574
  • See how the chown-data-volumes init container changes permissions on the volumes' underlying filesystems so that the elasticsearch user is able to write data to them. ECK sets this up for the default volume, but not for your own volumes.
  • I created https://github.com/elastic/cloud-on-k8s/issues/2575 so that we can simplify the overall experience of setting up multiple volumes for Elasticsearch data.
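For reference, here is what it would look like through the config section, accepting the warning (a minimal sketch based on the point above; I tested the environment variable approach, not this one):

spec:
  version: 7.6.0
  nodeSets:
  - name: default
    config:
      # ECK logs a warning about path.data but should still apply the setting
      path.data: [ "/mnt/data", "/mnt/data2", "/mnt/data3" ]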

Hi @sebgl,

What if we do not want to use PVCs at all?
I don't want to bind persistent volumes to the master and client nodes; is that possible with ECK?
I want to create an architecture like the one below.
[architecture diagram screenshot]

@hmz see the end of this doc page on how to use emptyDir volumes for some of your NodeSets (I guess the client nodes in your case).
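Something along these lines, following that doc page (a minimal sketch; the cluster name, nodeSet name, count, and roles are illustrative):

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.6.0
  nodeSets:
  - name: client
    count: 2
    config:
      node.master: false
      node.data: false
      node.ingest: false
    podTemplate:
      spec:
        volumes:
        # overrides the default PVC: data is lost whenever the Pod is recreated
        - name: elasticsearch-data
          emptyDir: {}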

Note that if you also use emptyDir volumes for your master nodes and happen to lose more than half of your master node Pods, chances are your cluster won't be able to recover. I would advise against it. It seems fine for client nodes though.

On your diagram, please also note that all Pods will talk to each other directly: data Pods won't go through a headless service to talk to master Pods.
It does make sense, as your diagram suggests, to create your own Service to route traffic to the client Pods only (see the sketch below). It also makes sense to scale data Pods independently from masters and clients.
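For example, such a Service could look like this (a minimal sketch; it assumes a cluster named quickstart with a nodeSet named client, hence the StatefulSet name quickstart-es-client):

apiVersion: v1
kind: Service
metadata:
  name: quickstart-es-clients
spec:
  selector:
    # labels ECK sets on every Elasticsearch Pod
    elasticsearch.k8s.elastic.co/cluster-name: quickstart
    elasticsearch.k8s.elastic.co/statefulset-name: quickstart-es-client
  ports:
  - name: https
    port: 9200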
