My company uses Elastic and I wanted to get more familiar with it, so I spun up a k3s cluster in my homelab. I've been trying to get it working for the last couple of weeks, but it keeps having issues.
My setup is as follows:
There are three machines in the cluster. Two of them have 64 cores, 15 TB of NVMe storage, and 512 GB of memory each, and those are the ones I'm trying to deploy the Elasticsearch nodes to. k3s is running on the machines and they are seen as a cluster. I installed the latest ECK operator, and when I apply the manifest below, the cluster builds without issue. I'm able to log in to Kibana, download http_ca.crt, and use that plus the password to create indices and load data with a Python script.
The issue:
I'm loading 100 million documents (about 100 GB) in batches of 5,000. Each batch usually takes between 2 and 6 seconds and it runs fine, until at some point the time per batch jumps to around 240 seconds; it then processes a couple more batches before the script gets a timeout error from Elasticsearch. Once this happens, if I restart the script it either times out again after a few minutes, or it loads a couple of batches that take hundreds of seconds each before timing out yet again. Kibana still works to a degree, and I can see that the nodes and health all show green. I say "to a degree" because after the initial timeout occurs, response times in the Elastic dashboard climb to tens of seconds, and sometimes I need to reload the page multiple times to get it to work.
The only way I've found to get it working again is to destroy the cluster and start over, which is obviously not practical. Sometimes the timeouts start after 60 minutes, other times it takes up to 3.5 hours, but eventually it always stops responding.
I have looked at the logs and googled basically every warn and error message in them, and I still can't decipher what is causing this to happen. I'm at a loss, and I really don't want to fall back to a single-node setup, considering I will ultimately need to import 10 TB of data into this cluster. I'm looking for help or guidance on what could be causing this issue.
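For context, this is roughly the shape of my loading script (trimmed down; the host, index name, password, and document source here are placeholders, and the real script reads data from files):

```python
import itertools
import time


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from an iterable."""
    it = iter(iterable)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch


def to_actions(docs, index):
    """Wrap raw documents in the action shape that helpers.bulk expects."""
    for doc in docs:
        yield {"_index": index, "_source": doc}


def load_all(password, index="my-index"):
    # Requires `pip install elasticsearch`; host, port, and index are placeholders.
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(
        "https://machine1:32123",            # NodePort exposed by the Service
        ca_certs="http_ca.crt",              # cert downloaded from the cluster
        basic_auth=("elastic", password),
        request_timeout=120,
    )
    docs = ({"field": i} for i in range(100_000_000))  # stand-in for real data
    for batch in chunked(docs, 5000):
        start = time.monotonic()
        helpers.bulk(es, to_actions(batch, index))
        print(f"batch took {time.monotonic() - start:.1f}s")
```

Each batch normally finishes in a few seconds until the slowdown described above kicks in.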
Thanks in advance for any help. My Kubernetes YAML file is below.
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-cluster
  namespace: elastic-stack
spec:
  version: 8.13.1
  nodeSets:
  - name: machine1-node-set
    count: 1
    config:
      node.store.allow_mmap: false
      xpack.monitoring.collection.enabled: true
      network.host: 0.0.0.0
      discovery.seed_hosts: ["elasticsearch-cluster-es-transport.elastic-stack.svc.cluster.local"]
      cluster.initial_master_nodes: ["elasticsearch-cluster-es-http"]
    podTemplate:
      metadata:
        labels:
          app: elasticsearch-cluster-es-node
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: kubernetes.io/hostname
                  operator: In
                  values:
                  - machine1
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: "12Gi"
              cpu: "8000m"
            limits:
              memory: "16Gi"
              cpu: "12000m"
  - name: machine2-node-set
    count: 2
    config:
      node.store.allow_mmap: false
      xpack.monitoring.collection.enabled: true
      network.host: 0.0.0.0
      discovery.seed_hosts: ["elasticsearch-cluster-es-transport.elastic-stack.svc.cluster.local"]
    podTemplate:
      metadata:
        labels:
          app: elasticsearch-cluster-es-node
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: kubernetes.io/hostname
                  operator: In
                  values:
                  - machine2
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: "12Gi"
              cpu: "8000m"
            limits:
              memory: "16Gi"
              cpu: "12000m"
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-cluster-es-http
  namespace: elastic-stack
  labels:
    common.k8s.elastic.co/type: elasticsearch
    elasticsearch.k8s.elastic.co/cluster-name: elasticsearch-cluster
spec:
  type: NodePort
  selector:
    app: elasticsearch-cluster-es-node
    common.k8s.elastic.co/type: elasticsearch
  ports:
  - name: https
    port: 9200
    protocol: TCP
    targetPort: 9200
    nodePort: 32123
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana
  namespace: elastic-stack
spec:
  version: 8.13.1
  count: 1
  elasticsearchRef:
    name: elasticsearch-cluster
  podTemplate:
    spec:
      containers:
      - name: kibana
        resources:
          requests:
            memory: "2Gi"
            cpu: "1000m"
          limits:
            memory: "4Gi"
            cpu: "4000m"
---
apiVersion: v1
kind: Service
metadata:
  name: kibana-kb-http
  namespace: elastic-stack
  labels:
    common.k8s.elastic.co/type: kibana
    kibana.k8s.elastic.co/name: kibana
spec:
  type: NodePort
  selector:
    common.k8s.elastic.co/type: kibana
    kibana.k8s.elastic.co/name: kibana
  ports:
  - name: https
    port: 5601
    protocol: TCP
    targetPort: 5601
    nodePort: 32700