Elasticsearch failed to start with fatal exception while booting Elasticsearch

I have deployed an Elasticsearch cluster using the eck-operator on my EKS cluster. I am trying to restore a snapshot stored in S3 by connecting the cluster to the bucket. Here is the manifest I used ->

---
# Default values for eck-elasticsearch.
# This is a YAML-formatted file.

# Overridable names of the Elasticsearch resource.
# By default, this is the Release name set for the chart,
# followed by 'eck-elasticsearch'.
#
# nameOverride will override the name of the Chart with the name set here,
# so nameOverride: quickstart, would convert to '{{ Release.name }}-quickstart'
#
# nameOverride: "quickstart"
#
# fullnameOverride will override both the release name, and the chart name,
# and will name the Elasticsearch resource exactly as specified.
#
# fullnameOverride: "quickstart"

# Version of Elasticsearch.
#
version: 8.11.1

# Elasticsearch Docker image to deploy
#
# image:

# Labels that will be applied to Elasticsearch.
#
labels:
  deployment: dev

# Annotations that will be applied to Elasticsearch.
#
annotations:
  eck.k8s.elastic.co/license: basic

# Settings for configuring Elasticsearch users and roles.
# ref: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-users-and-roles.html
#
auth: {}

# Settings for configuring stack monitoring.
# ref: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-stack-monitoring.html
#
monitoring: {}
  # metrics:
  #   elasticsearchRefs:
  #   - name: monitoring
  #     namespace: observability
  # logs:
  #   elasticsearchRefs:
  #   - name: monitoring
  #     namespace: observability

# Control the Elasticsearch transport module used for internal communication between nodes.
# ref: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-transport-settings.html
#
transport: {}
  # service:
  #   metadata:
  #     labels:
  #       my-custom: label
  #   spec:
  #     type: LoadBalancer
  # tls:
  #   subjectAltNames:
  #     - ip: 1.2.3.4
  #     - dns: hulk.example.com
  #   certificate:
  #     secretName: custom-ca

# Settings to control how Elasticsearch will be accessed.
# ref: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-accessing-elastic-services.html
#
http:
  service:
    spec:
      # expose this cluster Service with a NodePort
      type: NodePort
  tls:
    selfSignedCertificate:
      disabled: true
  # service:
  #   metadata:
  #     labels:
  #       my-custom: label
  #   spec:
  #     type: LoadBalancer
  # tls:
  #   selfSignedCertificate:
  #     # To fully disable TLS for the HTTP layer of Elasticsearch, simply
  #     # set the below field to 'true', removing all other fields.
  #     disabled: false
  #     subjectAltNames:
  #       - ip: 1.2.3.4
  #       - dns: hulk.example.com
  #   certificate:
  #     secretName: custom-ca

# Control Elasticsearch Secure Settings.
# ref: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-es-secure-settings.html#k8s-es-secure-settings
#
secureSettings: []
  # - secretName: one-secure-settings-secret
  # Projection of secret keys to specific paths
  # - secretName: gcs-secure-settings
  #   entries:
  #   - key: gcs.client.default.credentials_file
  #   - key: gcs_client_1
  #     path: gcs.client.client_1.credentials_file
  #   - key: gcs_client_2
  #     path: gcs.client.client_2.credentials_file

# Settings for limiting the number of simultaneous changes to an Elasticsearch resource.
# ref: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-update-strategy.html
#
updateStrategy: {}
  # changeBudget:
  #   maxSurge: 3
  #   maxUnavailable: 1

# Controlling of connectivity between remote clusters within the same kubernetes cluster.
# ref: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-remote-clusters.html
#
remoteClusters: {}
  # - name: cluster-two
  #   elasticsearchRef:
  #     name: cluster-two
  #     namespace: ns-two

# VolumeClaimDeletePolicy sets the policy for handling deletion of PersistentVolumeClaims for all NodeSets.
# Possible values are DeleteOnScaledownOnly and DeleteOnScaledownAndClusterDeletion.
# By default, if not set or empty, the operator sets DeleteOnScaledownAndClusterDeletion.
#
volumeClaimDeletePolicy: ""

# Settings to limit the disruption when pods need to be rescheduled for some reason such as upgrades or routine maintenance.
# By default, if not set, the operator sets a budget that doesn't allow any pod to be removed in case the cluster is not green or if there is only one node of type `data` or `master`.
# In all other cases the default PodDisruptionBudget sets `minAvailable` equal to the total number of nodes minus 1.
# To completely disable the pod disruption budget set `disabled` to true.
#
# podDisruptionBudget:
#   spec:
#     minAvailable: 2
#     selector:
#       matchLabels:
#         elasticsearch.k8s.elastic.co/cluster-name: quickstart
#   disabled: true

# Used to check access from the current resource to a resource (for ex. a remote Elasticsearch cluster) in a different namespace.
# Can only be used if ECK is enforcing RBAC on references.
#
# serviceAccountName: ""

# Number of revisions to retain to allow rollback in the underlying StatefulSets.
# By default, if not set, Kubernetes sets 10.
#
# revisionHistoryLimit: 2

# Node configuration settings.
# The node roles which can be configured here are:
# - "master"
# - "data_hot"
# - "data_cold"
# - "data_frozen"
# - "data_content"
# - "ml"
# - "ingest"
# ref: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-node-configuration.html
#
nodeSets:
- name: elk
  config:
    # most Elasticsearch configuration parameters are possible to set, e.g: node.attr.attr_name: attr_value
    node.roles: ["master", "data", "ingest", "ml"]
    # this allows ES to run on nodes even if their vm.max_map_count has not been increased, at a performance cost
    node.store.allow_mmap: false
    # uncomment the lines below to use the zone attribute from the node labels
    #cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
    #node.attr.zone: ${ZONE}
  podTemplate:
    metadata:
      labels:
        # additional labels for pods
        deployment: dev
    spec:
      serviceAccountName: "elastic-operator"
      initContainers:
        - name: symlink-token
          command:
            - sh
            - -c
            - mkdir -p "/usr/share/elasticsearch/config/repository-s3"; ln -s $AWS_WEB_IDENTITY_TOKEN_FILE "/usr/share/elasticsearch/config/repository-s3/aws-web-identity-token-file"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: "Deployment"
                    operator: In
                    values:
                      - "kafka"
      # this changes the kernel setting on the node to allow ES to use mmap
      # if you uncomment this init container you will likely also want to remove the
      # "node.store.allow_mmap: false" setting above
      # initContainers:
      # - name: sysctl
      #   securityContext:
      #     privileged: true
      #     runAsUser: 0
      #   command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
      ###
      # uncomment the line below if you are using a service mesh such as linkerd2 that uses service account tokens for pod identification.
      # automountServiceAccountToken: true
      containers:
      - name: elasticsearch
        # specify resource limits and requests
        resources:
          limits:
            memory: 6Gi
            cpu: 1
          requests:
            memory: 2Gi
            cpu: 200m
        env:
        - name: AWS_ROLE_SESSION_NAME
          value: repository-s3
        # uncomment the lines below to make the topology.kubernetes.io/zone annotation available as an environment variable and
        # use it as a cluster routing allocation attribute.
        # - name: ZONE
        #   valueFrom:
        #     fieldRef:
        #       fieldPath: metadata.annotations['topology.kubernetes.io/zone']
        - name: ES_JAVA_OPTS
          value: "-Xms2g -Xmx2g"
      #topologySpreadConstraints:
      #  - maxSkew: 1
      #    topologyKey: topology.kubernetes.io/zone
      #    whenUnsatisfiable: DoNotSchedule
      #    labelSelector:
      #      matchLabels:
      #        elasticsearch.k8s.elastic.co/cluster-name: elasticsearch-sample
      #        elasticsearch.k8s.elastic.co/statefulset-name: elasticsearch-sample-es-default
  count: 1
  # request 150Gi of persistent data storage for pods in this topology element
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-data # Do not change this name unless you set up a volume mount for the data path.
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 150Gi
      storageClassName: gp3
        # SecurityContext defines the security options the container should be run with.
        # If set, the fields of SecurityContext override the equivalent fields of PodSecurityContext.
        #
        # These typically are set automatically by the ECK Operator, and should only be adjusted
        # with the full knowledge of the effects of each field.
        #
        # securityContext:

          # Whether this container has a read-only root filesystem. Default is false.
          # readOnlyRootFilesystem: false

          # The GID to run the entrypoint of the container process. Uses runtime default if unset.
          # runAsGroup: 1000

          # Indicates that the container must run as a non-root user. If true, the Kubelet will validate the image at runtime to ensure
          # that it does not run as UID 0 (root) and fail to start the container if it does. If unset or false, no such validation will be performed.
          # runAsNonRoot: true

          # The UID to run the entrypoint of the container process. Defaults to user specified in image metadata if unspecified.
          # runAsUser: 1000

    # ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images used by this PodSpec.
    # https://kubernetes.io/docs/concepts/containers/images#specifying-imagepullsecrets-on-a-pod
    # imagePullSecrets:
    # - name: "image-pull-secret"

    # List of initialization containers belonging to the pod.
    #
    # Common initContainers include setting sysctl, or in 7.x versions of Elasticsearch,
    # installing Elasticsearch plugins.
    #
    # https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
#     - command:
#       - sh
#       - "-c"
#       - sysctl -w vm.max_map_count=262144
#       name: sysctl
#       securityContext:
#         privileged: true
    # - command:
    #   - sh
    #   - "-c"
    #   - bin/elasticsearch-plugin remove --purge analysis-icu ; bin/elasticsearch-plugin install --batch analysis-icu
    #   name: install-plugins
    #   securityContext:
    #     privileged: true


    # NodeSelector is a selector which must be true for the pod to fit on a node. Selector which must match a node's labels for the pod to be scheduled on that node.
    # https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
    # https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-advanced-node-scheduling.html
    # nodeSelector:
    #   diskType: ssd
    #   environment: production

    # If specified, indicates the pod's priority. "system-node-critical" and "system-cluster-critical" are two special keywords which indicate the highest priorities with the former being the highest priority.
    # Any other name must be defined by creating a PriorityClass object with that name. If not specified, the pod priority will be default or zero if there is no default.
    # https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/
    # priorityClassName: ""

    # SecurityContext holds pod-level security attributes and common container settings. Optional: Defaults to empty. See type description for default values of each field.
    # See previously defined 'securityContext' within 'podTemplate' for all available fields.
    # securityContext: {}

    # ServiceAccountName is the name of the ServiceAccount to use to run this pod.
    # https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
    # serviceAccountName: ""

    # Optional duration in seconds to wait for the Elasticsearch pod to terminate gracefully.
    # terminationGracePeriodSeconds: 30s

    # If specified, the pod's tolerations that will apply to all containers within the pod.
    # https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
    # tolerations:
    # - key: "node-role.kubernetes.io/elasticsearch"
    #   effect: "NoSchedule"
    #   operator: "Exists"

    # TopologySpreadConstraints describes how a group of pods ought to spread across topology domains.
    # Scheduler will schedule pods in a way which abides by the constraints. All topologySpreadConstraints are ANDed.
    #
    # These settings are generally applied within each `nodeSets[].podTemplate` field to apply to a specific Elasticsearch nodeset.
    #
    # https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-advanced-node-scheduling.html
    # topologySpreadConstraints: {}

    # List of volumes that can be mounted by containers belonging to the pod.
    # https://kubernetes.io/docs/concepts/storage/volumes
    # volumes: []

Once I applied this manifest, I got the following error when the pod started ->

{"@timestamp":"2024-02-29T22:46:26.154Z", "log.level":"ERROR", "message":"fatal exception while booting Elasticsearch", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.bootstrap.Elasticsearch","elasticsearch.node.name":"elasticsearch-eck-elasticsearch-es-elk-0","elasticsearch.cluster.name":"elasticsearch-eck-elasticsearch","error.type":"java.security.AccessControlException","error.message":"access denied (\"java.lang.RuntimePermission\" \"accessDeclaredMembers\")","error.stack_trace":"java.security.AccessControlException: access denied (\"java.lang.RuntimePermission\" \"accessDeclaredMembers\")\n\tat java.base/java.security.AccessControlContext.checkPermission(AccessControlContext.java:488)\n\tat java.base/java.security.AccessController.checkPermission(AccessController.java:1071)\n\tat java.base/java.lang.SecurityManager.checkPermission(SecurityManager.java:411)\n\tat java.base/java.lang.Class.checkMemberAccess(Class.java:3227)\n\tat java.base/java.lang.Class.getDeclaredConstructors(Class.java:2725)\n\tat com.fasterxml.jackson.databind.util.ClassUtil.getConstructors(ClassUtil.java:1331)\n\tat com.fasterxml.jackson.databind.introspect.AnnotatedCreatorCollector._findPotentialConstructors(AnnotatedCreatorCollector.java:115)\n\tat com.fasterxml.jackson.databind.introspect.AnnotatedCreatorCollector.collect(AnnotatedCreatorCollector.java:70)\n\tat com.fasterxml.jackson.databind.introspect.AnnotatedCreatorCollector.collectCreators(AnnotatedCreatorCollector.java:61)\n\tat com.fasterxml.jackson.databind.introspect.AnnotatedClass._creators(AnnotatedClass.java:403)\n\tat com.fasterxml.jackson.databind.introspect.AnnotatedClass.getFactoryMethods(AnnotatedClass.java:315)\n\tat com.fasterxml.jackson.databind.introspect.BasicBeanDescription.getFactoryMethods(BasicBeanDescription.java:573)\n\tat com.fasterxml.jackson.databind.deser.BasicDeserializerFactory._addExplicitFactoryCreators(BasicDeserializerFactory.java:641)\n\tat com.fasterxml.jackson.databind.deser.BasicDeserializerFactory._constructDefaultValueInstantiator(BasicDeserializerFactory.java:278)\n\tat com.fasterxml.jackson.databind.deser.BasicDeserializerFactory.findValueInstantiator(BasicDeserializerFactory.java:222)\n\tat com.fasterxml.jackson.databind.deser.BasicDeserializerFactory.createCollectionDeserializer(BasicDeserializerFactory.java:1421)\n\tat com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:403)\n\tat com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:350)\n\tat com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:264)\n\tat com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:244)\n\tat com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:142)\n\tat com.fasterxml.jackson.databind.DeserializationContext.findNonContextualValueDeserializer(DeserializationContext.java:644)\n\tat com.fasterxml.jackson.databind.deser.BeanDeserializerBase.resolve(BeanDeserializerBase.java:539)\n\tat com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:294)\n\tat com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:244)\n\tat 
com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:142)\n\tat com.fasterxml.jackson.databind.DeserializationContext.findNonContextualValueDeserializer(DeserializationContext.java:644)\n\tat com.fasterxml.jackson.databind.deser.BeanDeserializerBase.resolve(BeanDeserializerBase.java:539)\n\tat com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:294)\n\tat com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:244)\n\tat com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:142)\n\tat com.fasterxml.jackson.databind.DeserializationContext.findContextualValueDeserializer(DeserializationContext.java:621)\n\tat com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.createContextual(CollectionDeserializer.java:188)\n\tat com.fasterxml.jackson.databind.deser.std.CollectionDeserializer.createContextual(CollectionDeserializer.java:28)\n\tat com.fasterxml.jackson.databind.DeserializationContext.handlePrimaryContextualization(DeserializationContext.java:836)\n\tat com.fasterxml.jackson.databind.deser.BeanDeserializerBase.resolve(BeanDeserializerBase.java:550)\n\tat com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:294)\n\tat com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:244)\n\tat com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:142)\n\tat com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:654)\n\tat com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:4956)\n\tat com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4826)\n\tat com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3809)\n\tat com.amazonaws.partitions.PartitionsLoader.loadPartitionFromStream(PartitionsLoader.java:92)\n\tat com.amazonaws.partitions.PartitionsLoader.build(PartitionsLoader.java:84)\n\tat com.amazonaws.regions.RegionMetadataFactory.create(RegionMetadataFactory.java:30)\n\tat com.amazonaws.regions.RegionUtils.initialize(RegionUtils.java:64)\n\tat com.amazonaws.regions.RegionUtils.getRegionMetadata(RegionUtils.java:52)\n\tat com.amazonaws.regions.RegionUtils.getRegion(RegionUtils.java:106)\n\tat com.amazonaws.client.builder.AwsClientBuilder.getRegionObject(AwsClientBuilder.java:256)\n\tat com.amazonaws.client.builder.AwsClientBuilder.withRegion(AwsClientBuilder.java:245)\n\tat org.elasticsearch.repositories.s3.S3Service$CustomWebIdentityTokenCredentialsProvider.<init>(S3Service.java:373)\n\tat org.elasticsearch.repositories.s3.S3Service.<init>(S3Service.java:98)\n\tat org.elasticsearch.repositories.s3.S3RepositoryPlugin.s3Service(S3RepositoryPlugin.java:115)\n\tat org.elasticsearch.repositories.s3.S3RepositoryPlugin.createComponents(S3RepositoryPlugin.java:109)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.node.Node.lambda$new$17(Node.java:759)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.plugins.PluginsService.lambda$flatMap$1(PluginsService.java:263)\n\tat java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273)\n\tat java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)\n\tat 
java.base/java.util.AbstractList$RandomAccessSpliterator.forEachRemaining(AbstractList.java:722)\n\tat java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)\n\tat java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)\n\tat java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:575)\n\tat java.base/java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)\n\tat java.base/java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:616)\n\tat java.base/java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:622)\n\tat java.base/java.util.stream.ReferencePipeline.toList(ReferencePipeline.java:627)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.node.Node.<init>(Node.java:775)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.node.Node.<init>(Node.java:344)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.bootstrap.Elasticsearch$2.<init>(Elasticsearch.java:236)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.bootstrap.Elasticsearch.initPhase3(Elasticsearch.java:236)\n\tat org.elasticsearch.server@8.11.1/org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:73)\n"}  
ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/elasticsearch-eck-elasticsearch.log  
ERROR: Elasticsearch exited unexpectedly, with exit code 1

In the manifest, the serviceAccount being used is the one created by the eck-operator, and I have annotated it with an IAM role that has the permission below ->

    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": "*"
        }
    ]
}

The goal is to restore all the indices from the snapshot in the desired S3 repo at once, roughly as sketched below.
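(Repository, bucket, and snapshot names in this sketch are placeholders for my actual values.)

PUT _snapshot/my_s3_repo
{
  "type": "s3",
  "settings": {
    "bucket": "my-snapshot-bucket"
  }
}

POST _snapshot/my_s3_repo/my_snapshot/_restore
{
  "indices": "*",
  "include_global_state": false
}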
Please help with this.

The error you're seeing is likely a security-policy issue in the JVM that Elasticsearch runs on: the stack trace shows the repository-s3 plugin hitting a java.security.AccessControlException while it builds its AWS client (S3Service$CustomWebIdentityTokenCredentialsProvider) during startup. To troubleshoot, make sure the service account used by the Elasticsearch pods is annotated with an IAM role that has the necessary S3 permissions, double-check the S3 client configuration in Elasticsearch, verify that the AWS web-identity credentials are actually injected into the pods, and review the JVM security policies applied to the plugin. Also check the Elasticsearch and Kubernetes pod logs for more clues. If the issue persists, consider asking the Elastic community or support for help, especially if you are using custom plugins or features.
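For reference, the usual IRSA setup is to annotate the ServiceAccount that the Elasticsearch pods reference with the role ARN. A minimal sketch, assuming the namespace and role ARN are replaced with your own values:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-operator        # must match podTemplate.spec.serviceAccountName in the Elasticsearch manifest
  namespace: elastic-system     # placeholder: must be the namespace the Elasticsearch pods run in
  annotations:
    # IAM role that carries the S3 policy shown above (placeholder ARN)
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/es-snapshot-role

With that annotation in place, EKS injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE into the pod, which is what the symlink-token init container in your manifest relies on. You can confirm they are present with something like kubectl exec elasticsearch-eck-elasticsearch-es-elk-0 -- env | grep AWS_ (add -n with your namespace if needed).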
