Hi @DavidTurner,
I have gone through the docs you gave me, and the solution they suggest is to add data nodes or increase the disk size.
After increasing the number of data nodes, I ran into an issue: the data node runs for a while and then fails. When I describe the data node pod, I see "readiness probe failed". I tried 3 or 4 times and got the same issue every time, so I had to undo the change. Instead, I have now added extra storage to the data nodes (previously I had 190 GB), like this:
- name: data
  count: 2
  podTemplate:
    spec:
      nodeSelector:
        namespace: "elastic-system"
      tolerations:
      - key: "namespace"
        # operator: "Exists" always commented out when applied for the first time
        operator: "Equal"
        value: "elastic-system"
        effect: "NoSchedule"
      initContainers:
      - name: sysctl
        securityContext:
          privileged: true
          runAsUser: 0
        command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
      containers:
      - name: elasticsearch
        readinessProbe:
          exec:
            command:
            - bash
            - -c
            - /mnt/elastic-internal/scripts/readiness-probe-script.sh
          failureThreshold: 3
          initialDelaySeconds: 20
          periodSeconds: 12
          successThreshold: 1
          timeoutSeconds: 12
        env:
        - name: READINESS_PROBE_TIMEOUT
          value: "40"
        - name: ES_JAVA_OPTS
          value: -Xms2g -Xmx2g
        resources:
          requests:
            memory: "1Gi"
            cpu: "100m"
          limits:
            memory: "3000Mi"
  config:
    # On Elasticsearch versions before 7.9.0, replace the node.roles configuration with the following:
    # node.master: false
    # node.data: true
    # node.ingest: true
    # node.ml: true
    # node.transform: true
    node.roles: ["data", "ingest", "remote_cluster_client"]
    # node.roles: ["data", "ingest", "ml", "transform"]
    # node.remote_cluster_client: true
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 256Gi
      storageClassName: elk-azurefile-sc
As you can see, I have 2 data nodes, each with 256 GB of storage.
Every day I now receive more than 30 GB of data, which I know is very high, but my index is still being deleted every day.
This is a secured Elasticsearch cluster, and nobody other than me has the credentials.
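To put the storage numbers in perspective, here is a rough capacity estimate in Python. It is only a sketch under assumptions: that the 30 GB per day is primary data only, that each index keeps 1 replica (as the rep column below suggests), and that allocation of new shards to a node stops at the default low disk watermark of 85%. The function name `days_until_watermark` is made up for this illustration.

```python
# Rough estimate of how many days of data fit before the low disk watermark.
# All inputs are assumptions based on the numbers quoted in this post.

def days_until_watermark(nodes, disk_per_node_gb, daily_primary_gb,
                         replicas, watermark):
    """Return the number of days until total usage reaches the watermark."""
    usable_gb = nodes * disk_per_node_gb * watermark
    # Every primary shard is stored once more for each replica.
    daily_total_gb = daily_primary_gb * (1 + replicas)
    return usable_gb / daily_total_gb


days = days_until_watermark(
    nodes=2,               # two data nodes
    disk_per_node_gb=256,  # storage request per data node
    daily_primary_gb=30,   # assumed to be primary data only
    replicas=1,            # assumed from the rep column
    watermark=0.85,        # default low disk watermark
)
print(f"Roughly {days:.1f} days of data fit before the low watermark")
```

Under these assumptions the two nodes hold about a week of data, so filling the disks would take several days rather than one.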
Here is what I observe when I run this:
GET _cat/indices?v&s=health:desc,index&h=health,status,index,docs.count,pri,rep
health status index                                                        docs.count pri rep
red    open   filebeat-2024.03.13                                                       1   1
green  open   .internal.alerts-observability.apm.alerts-default-000001              0   1   1
green  open   .internal.alerts-observability.logs.alerts-default-000001             0   1   1
green  open   .internal.alerts-observability.metrics.alerts-default-000001          0   1   1
green  open   .internal.alerts-observability.slo.alerts-default-000001              0   1   1
green  open   .internal.alerts-observability.uptime.alerts-default-000001           0   1   1
green  open   .internal.alerts-security.alerts-default-000001                       0   1   1
green  open   .internal.alerts-stack.alerts-default-000001                          0   1   1
green  open   .kibana-observability-ai-assistant-conversations-000001               0   1   1
green  open   .kibana-observability-ai-assistant-kb-000001                          0   1   1
green  open   elastalert                                                          277   1   1
green  open   elastalert_error                                                   3021   1   1
green  open   elastalert_past                                                       0   1   1
green  open   elastalert_silence                                                  277   1   1
green  open   elastalert_status                                                   471   1   1
Regarding disk usage:
GET /_cat/allocation?v&s=disk.avail&h=node,disk.percent,disk.avail,disk.total,disk.used,disk.indices,shards&pretty
node                             disk.percent disk.avail disk.total disk.used disk.indices shards
UNASSIGNED                                                                                      4
elastic-search-cluster-es-data-0            2      250gb      256gb     5.9gb        6.4mb     33
elastic-search-cluster-es-data-1            2      250gb      256gb     5.9gb          7mb     33
Regarding cluster health:
GET _cluster/health
{
"cluster_name": "elastic-search-cluster",
"status": "red",
"timed_out": false,
"number_of_nodes": 3,
"number_of_data_nodes": 2,
"active_primary_shards": 33,
"active_shards": 66,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 4,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 94.28571428571428
}
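As a sanity check on the health output above, the reported percentage is consistent with 66 active shards out of 70 in total (66 active plus 4 unassigned). A minimal Python sketch, with the values copied from the response and a helper name (`active_shards_percent`) invented for this illustration:

```python
# Recompute active_shards_percent_as_number from the shard counts
# in the _cluster/health response above.

def active_shards_percent(active, unassigned, initializing=0):
    """Percentage of shards that are active, out of all known shards."""
    total = active + unassigned + initializing
    return 100.0 * active / total


percent = active_shards_percent(active=66, unassigned=4, initializing=0)
print(f"active shards: {percent:.2f}%")
```

The 4 unassigned shards are therefore the whole gap to 100%, and they match the UNASSIGNED row in the allocation output.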
Here is what happens:
As you can see, the filebeat-2024.03.13 index shows no documents; its data was deleted automatically.
The filebeat-2024.03.13 index you see above has only just been started again. This index takes up a lot of storage, so the Elasticsearch status turns red, and for that reason I have to delete the index.
So please help me solve this issue.

