Curl to a single-node Elasticsearch cluster times out from within the AKS cluster

I have a single-node Elasticsearch cluster deployed using a Kubernetes Deployment object. The Elasticsearch pod's port is exposed using a Kubernetes Service object.

K8s Deployment YAML:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: # assign appropriately; values are pim-dev, pim-qa, pim-uat and pim-prod
  name: elasticsearch
  labels:
    app: elasticsearch
spec:
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: "elastic"
                operator: In
                values:
                - "dev" # change as per environment: dev, qa, uat
      volumes:
        - name: volume
          persistentVolumeClaim:
            claimName: azure-managed-disk
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.9.2
        resources:
          requests:
            memory: "3Gi"
            cpu: "1000m"
          limits:
            memory: "4Gi"
            cpu: "1000m"
        ports:
        - containerPort: 9200
          name: http
        env:
        - name: discovery.type
          value: single-node
        - name: bootstrap.memory_lock
          value: "true"
        - name: network.host
          value: "0.0.0.0"
        - name: ES_JAVA_OPTS
          value: -Xms2g -Xmx2g
        volumeMounts:
        - mountPath: "usr/share/elasticsearch/data"
          name: volume
      initContainers:
      - name: init-myservice
        image: busybox:1.28
        command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
        securityContext:
          privileged: false
        volumeMounts:
        - name: volume
          mountPath: "/usr/share/elasticsearch/data"
      - name: increase-vm-max-map
        image: busybox
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
      - name: tcp-transmission-settings
        image: busybox
        command: ["sysctl", "-w", "net.ipv4.tcp_retries2=5"]
        securityContext:
          privileged: true
```

**Elasticsearch Kubernetes Service YAML:**

```
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  namespace:  #change as per environment
spec:
  selector:
    app: elasticsearch
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9200
```
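
For reference, the Service maps port 80 to the container's 9200, so the health endpoint should be reachable on plain port 80 from inside the cluster. A quick way to check this from a throwaway pod (a sketch, assuming the `pim-dev` namespace; adjust the service DNS name to your environment):

```
# Run a one-off busybox pod and hit the Service on port 80 (forwarded to 9200)
kubectl -n pim-dev run es-check --rm -it --restart=Never --image=busybox:1.28 -- \
  wget -qO- http://elasticsearch.pim-dev.svc.cluster.local:80/_cluster/health
```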

Cluster Health:

```
{
  "cluster_name" : "docker-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 6,
  "active_shards" : 6,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
```

Problem statement: While there are no issues booting up the single-node cluster, I am facing inconsistent access to the Elasticsearch instance from within my Kubernetes cluster.

I can curl http://elasticsearch:80, but the behavior is highly inconsistent: I keep getting timeouts after 10-15 successful tries.

How can I make the single-node cluster more reliable?
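
One way to narrow this down is to bypass the Service and cluster DNS entirely and query the pod IP directly on 9200; if that is consistently fast while the service name times out intermittently, the problem is in name resolution or service routing rather than in Elasticsearch itself. A sketch (pod and namespace names assumed, and 10.x.x.x is a placeholder for the actual pod IP):

```
# Find the Elasticsearch pod IP
kubectl -n pim-dev get pod -l app=elasticsearch -o wide

# From a throwaway pod, query the pod IP directly, bypassing the Service and kube-dns
kubectl -n pim-dev run pod-ip-check --rm -it --restart=Never --image=busybox:1.28 -- \
  wget -qO- http://10.x.x.x:9200/_cluster/health
```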

EDIT: More information: the single-node Elasticsearch cluster is almost always accessible from outside the Kubernetes cluster, but there seems to be a node-to-node timeout inside it.

The issue was not with Elasticsearch but with how our AKS was configured: a single subnet was routing to three AKS clusters, which was causing DNS resolution to fail.
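
For anyone hitting something similar, the intermittent DNS failures can be made visible by resolving the service name in a loop from inside the cluster (a sketch, assuming the `pim-dev` namespace):

```
# Resolve the service name repeatedly; sporadic failures point at cluster DNS/networking,
# not at Elasticsearch itself
kubectl -n pim-dev run dns-check --rm -it --restart=Never --image=busybox:1.28 -- \
  sh -c 'for i in $(seq 1 20); do nslookup elasticsearch.pim-dev.svc.cluster.local; sleep 1; done'
```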
