Struggling to get Elasticsearch running on k8s

For the last year I've been running Elasticsearch on a single host as a POC of sorts. It runs great and keeps up with any load I throw at it, but it's out of disk space.

24 cores
64 GB RAM

Recently I received 5 shiny new Dell servers to use for Kubernetes. Each has 56 cores and 64 GB of RAM. I got k8s running and looked around for Dockerfiles for Elasticsearch. I found the 'official' images and some k8s config files that looked good enough.

Got 3 master nodes running
Got 5 data nodes running (rough sketch of the data-node manifest below)
Got 5 ingest nodes running
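
A trimmed-down sketch of what those data-node manifests look like; the names and image tag here are placeholders rather than my exact config, and the master/ingest StatefulSets differ only in the node.* flags:

```yaml
# Trimmed sketch of the data-node StatefulSet (names and image tag are illustrative).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch-data
spec:
  serviceName: elasticsearch-data
  replicas: 5
  selector:
    matchLabels:
      app: elasticsearch-data
  template:
    metadata:
      labels:
        app: elasticsearch-data
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:6.7.0
        env:
        # the official image maps dotted env vars onto elasticsearch.yml settings
        - name: node.master
          value: "false"
        - name: node.data
          value: "true"
        - name: node.ingest
          value: "false"
```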

I open the taps on my Logstash processes (pulling from Kafka) and problems surface quickly. The master and ingest nodes are pretty quiet, but the data nodes are GCing heavily. I tried various settings between 8 and 32 GB for Xmx/Xms, but the end result is the same: this system can't keep up with the load.
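
For the heap experiments I'm just passing ES_JAVA_OPTS through the container env, roughly like this (16g is one of the values I tried, not a recommendation):

```yaml
# Container-level excerpt: heap via ES_JAVA_OPTS, which the official image picks up.
# Keep Xms == Xmx, and leave the pod memory limit well above the heap.
env:
- name: ES_JAVA_OPTS
  value: "-Xms16g -Xmx16g"
```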

Connecting to the k8s containers, I see that none of them are using more than 150% CPU. My original system would commonly hit 50-100% per core.

The biggest difference I see between these data nodes and my original one-node system is that I was using G1GC, whereas the 'official' Docker images use UseConcMarkSweepGC. In the early days with my one-node setup I ran into OOM kills until I changed the GC setup.
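
If the collector turns out to be the culprit, I figure I could test G1 without rebuilding the image by mounting a replacement jvm.options over the default. (Appending -XX:+UseG1GC via ES_JAVA_OPTS doesn't work while the stock file still enables CMS; the JVM refuses to start on conflicting collectors.) A rough sketch, with example heap and G1 values only:

```yaml
# Sketch: replace the image's jvm.options from a ConfigMap instead of rebuilding.
# The stock jvm.options carries other flags (heap-dump path, AlwaysPreTouch,
# GC logging, ...) that are worth copying across as well.
apiVersion: v1
kind: ConfigMap
metadata:
  name: es-jvm-options
data:
  jvm.options: |
    -Xms16g
    -Xmx16g
    -XX:+UseG1GC
    -XX:G1ReservePercent=25
    -XX:InitiatingHeapOccupancyPercent=30
```

Then in the data-node pod spec, mount the single key over the default file:

```yaml
# Fragment of the container/pod spec
        volumeMounts:
        - name: jvm-options
          mountPath: /usr/share/elasticsearch/config/jvm.options
          subPath: jvm.options
      volumes:
      - name: jvm-options
        configMap:
          name: es-jvm-options
```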

Clearly the elastic.co people know their business. So why am I struggling so much with this setup? What params can I check to see what's misconfigured?

I'm no k8s expert, and it's not clear what your config actually is, but if you're using the official Helm chart then I think the following lines mean that each container is limited to 1 CPU by default:
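
(quoting roughly from the chart's default values.yaml; the exact numbers depend on the chart version)

```yaml
# Default resources block in the elasticsearch Helm chart, from memory:
resources:
  requests:
    cpu: "1000m"
    memory: "2Gi"
  limits:
    cpu: "1000m"
    memory: "2Gi"
```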

The config I'm using doesn't have those CPU limits. Just memory, and I've set it to 50Gi.

FWIW - I built my own Dockerfile and was able to get 6.7 running on k8s. Got GC down to 50-60 ms.

It will be interesting to see how this system holds up over time.
