Our current production Elasticsearch cluster for log collection is manually managed and runs on AWS.
I'm creating the same cluster using ECK deployed with Helm under Terraform.
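For context, the operator itself is installed with something like the following; the release name, namespace, and everything else here are placeholders rather than my exact values:

# Install the ECK operator from the official Elastic Helm repository.
# Release name and namespace are placeholders.
resource "helm_release" "eck_operator" {
  name             = "elastic-operator"
  repository       = "https://helm.elastic.co"
  chart            = "eck-operator"
  namespace        = "elastic-system"
  create_namespace = true
}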
I was able to replicate all the features (S3 repository for snapshots, ingest pipelines, index templates, etc.) and deploy the cluster, but when I tried to update it (changing the ES version from 8.3.2 to 8.5.2) I got a NEW Elasticsearch cluster running 8.5.2, in what does not appear to be a rolling upgrade.
I can tell that it is a new cluster because the default 'elastic' superuser has a new password.
Also, when I check the Kubernetes pods immediately after the terraform apply with the updated ES version, the Kibana pod doesn't even exist (probably normal) and all the ES node pods are terminating simultaneously.
I'm not ingesting data into this new cluster at the moment, but I'm sure that if I were, I would get an ingest interruption and a red health status (or maybe not, since what I have looks like a completely new cluster...).
Most probably the problem is in my Elasticsearch manifest, but I couldn't pinpoint it.
Here is my ES manifest:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  # copy the specified node labels as pod annotations and use them as environment variables in the Pods;
  # spreads a NodeSet across the availability zones of a Kubernetes cluster. Used for AZ awareness
  annotations:
    eck.k8s.elastic.co/downward-node-labels: "topology.kubernetes.io/zone"
  name: ${cluster_name}
  namespace: ${namespace}
spec:
  version: ${version}
  volumeClaimDeletePolicy: DeleteOnScaledown
  # updateStrategy:
  #   changeBudget:
  #     maxSurge: 1
  #     maxUnavailable: 1
  # for monitoring see: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-stack-monitoring.html
  monitoring:
    metrics:
      elasticsearchRefs:
        - name: ${cluster_name}
    logs:
      elasticsearchRefs:
        - name: ${cluster_name}
  nodeSets:
    - name: logging-nodes
      count: ${nodes}
      config:
        # logger.org.elasticsearch: DEBUG
        node.roles: ["master", "data", "ingest", "ml", "transform", "remote_cluster_client"]
        # this allows ES to run on nodes even if their vm.max_map_count has not been increased, at a performance cost
        node.store.allow_mmap: false
        cluster:
          # name: "logging.elasticsearch" See: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-reserved-settings.html
          routing:
            rebalance.enable: "all"
            allocation:
              enable: "all"
              allow_rebalance: "always"
              node_concurrent_recoveries: ${node_concurrent_recoveries}
        # use the zone attribute from the node labels. Used for AZ awareness; double $ is used to escape during templating
        node.attr.zone: $${ZONE}
        cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
        gateway.expected_data_nodes: ${nodes}
        indices.recovery.max_bytes_per_sec: ${index_recovery_speed}
        # network.host: ["_ec2:publicDns_", "localhost"] See: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-reserved-settings.html
        # xpack.security.enabled: true See: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-reserved-settings.html
      podTemplate:
        metadata:
          namespace: ${namespace}
          labels:
            # additional labels for pods
            stack_name: ${stack_name}
            stack_repository: ${stack_repository}
        spec:
          volumes:
            - name: aws-iam-token-es
              projected:
                defaultMode: 420
                sources:
                  - serviceAccountToken:
                      audience: sts.amazonaws.com
                      expirationSeconds: 86400
                      path: aws-web-identity-token-file
          serviceAccountName: ${service_account}
          containers:
            - name: elasticsearch
              # specify resource limits and requests
              resources:
                limits:
                  memory: 4Gi
                  cpu: "1"
              volumeMounts:
                - mountPath: /usr/share/elasticsearch/config/repository-s3
                  name: aws-iam-token-es
                  readOnly: true
              env:
                # Make the topology.kubernetes.io/zone annotation available as an environment variable
                # and use it as a cluster routing allocation attribute.
                - name: AWS_ROLE_SESSION_NAME
                  value: elasticsearch-sts
                - name: ZONE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.annotations['topology.kubernetes.io/zone']
          # used for availability zone awareness
          topologySpreadConstraints:
            - maxSkew: 1
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: DoNotSchedule
              labelSelector:
                matchLabels:
                  elasticsearch.k8s.elastic.co/cluster-name: ${cluster_name}
                  elasticsearch.k8s.elastic.co/statefulset-name: ${cluster_name}-es-logging-nodes
      # request 15Gi of persistent data storage for pods in this topology element
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data # Do not change this name unless you set up a volume mount for the data path.
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 15Gi
            storageClassName: gp2
I can also post the Kibana manifest, but I don't think it is relevant.
To perform the upgrade, I just change the ${version} variable.
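For reference, the manifest is rendered and applied from Terraform roughly like this; the resource name, file path, and variable names are placeholders rather than my exact code (at this stage I was still using the gavinbunney/kubectl provider, which comes up below):

# Render the templated manifest and apply it with the gavinbunney/kubectl provider.
# File path and variable names are placeholders.
resource "kubectl_manifest" "elasticsearch_deploy" {
  yaml_body = templatefile("${path.module}/manifests/elasticsearch.yaml.tpl", {
    cluster_name               = var.cluster_name
    namespace                  = var.namespace
    version                    = var.es_version # bumping this from 8.3.2 to 8.5.2 is the only change for the upgrade
    nodes                      = var.nodes
    node_concurrent_recoveries = var.node_concurrent_recoveries
    index_recovery_speed       = var.index_recovery_speed
    stack_name                 = var.stack_name
    stack_repository           = var.stack_repository
    service_account            = var.service_account
  })
}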
Thanks for jumping back here.
Before seeing your post I was using gavinbunney/kubectl, but your post 'inspired me' to give the 'official' kubernetes_manifest another try.
Now, when I apply the Elasticsearch version change I get this error:
│ The API returned the following conflict: "Apply failed with 1 conflict: conflict with \"elastic-operator\" using elasticsearch.k8s.elastic.co/v1: .spec.nodeSets"
│
│ You can override this conflict by setting "force_conflicts" to true in the "field_manager" block.
I tried adding:
field_manager {
  force_conflicts = true
}
but then I got:
│ Error: Provider produced inconsistent result after apply
│
│ When applying changes to kubernetes_manifest.kibana_deploy, provider "provider[\"registry.terraform.io/hashicorp/kubernetes\"]" produced an unexpected new value: .object: wrong final value type: incorrect object attributes.
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
so I think this is a no-go.
But with force_conflicts removed, even though I was still getting an error, the plan phase of Terraform said it was going to update rather than replace the resources, so that was a step closer.
In the end the problem was that the ECK operator makes a lot of changes in the spec section, so if you add the whole "spec" to computed_fields those changes are ignored and the upgrade proceeds as intended.
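A sketch of what that ended up looking like (resource name, file path, and variable names are placeholders):

# Same rendered template, but applied with the official kubernetes_manifest resource.
# Adding "spec" to computed_fields tells the provider to ignore the changes the
# elastic-operator makes to .spec, so the plan updates the resource in place
# instead of conflicting with (or replacing) it.
resource "kubernetes_manifest" "elasticsearch_deploy" {
  manifest = yamldecode(templatefile("${path.module}/manifests/elasticsearch.yaml.tpl", {
    cluster_name               = var.cluster_name
    namespace                  = var.namespace
    version                    = var.es_version
    nodes                      = var.nodes
    node_concurrent_recoveries = var.node_concurrent_recoveries
    index_recovery_speed       = var.index_recovery_speed
    stack_name                 = var.stack_name
    stack_repository           = var.stack_repository
    service_account            = var.service_account
  }))

  # "metadata.labels" and "metadata.annotations" are the provider defaults;
  # "spec" is the addition that resolves the conflict with the operator.
  computed_fields = ["metadata.labels", "metadata.annotations", "spec"]
}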