Load unfairly distributed during large ingest

Hey friends, new guy here. :baby_bottle::innocent:

I have an ES cluster running on cheap spot instances: 1 master, 5 data nodes, 10 shards, and 1 replica per shard. I have a round-robin load balancer in front of my cluster.

During ingest, the load is heavily concentrated on two (small) nodes. I can't figure out why this is the case, as I have shards assigned to every node. Can anyone help me troubleshoot this?
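For reference, here's how I checked that shards are assigned to every node. This is a sketch: the node names below are invented, and in a real cluster the list would come from the `_cat/shards` API (assuming it's reachable on `localhost:9200`):

```shell
# Sample _cat/shards output (node names invented); on a real cluster:
#   curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state,node'
cat <<'EOF' > shards.txt
myindex 0 p STARTED data-1
myindex 0 r STARTED data-2
myindex 1 p STARTED data-3
myindex 1 r STARTED data-4
myindex 2 p STARTED data-5
myindex 2 r STARTED data-1
EOF
# Count shards per node -- a skewed count here would mean a skewed ingest load
awk '{print $5}' shards.txt | sort | uniq -c | sort -rn
```

In my case the counts looked roughly even, which is why the concentrated load was so confusing.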

Hard to tell from the picture. Are you just importing, or do you have an ingest processor or something? Those nodes look like they have more disk usage; I wonder if you have a hot spot created by something like parent/child.

Hey Nik, thanks for the reply.

I am just running an import, using a small Spark cluster to shove data into my ES cluster.

The two servers with high load do have less disk space. They were also the first data nodes to join the cluster, although I can't imagine why that matters. I don't have any custom routing.

I'm more of a data scientist than an infra guy, so I'm quite stumped!

In situations like this I tend to use the hot_threads API to have a look at the guts and see if there's a smoking gun. If you post a gist of the output, I can have a look.

Nik!

Thanks for pointing me in the right direction. I looked into hot_threads, went down a few other rabbit holes, and discovered that two of my shards were stuck "relocating." After I solved that issue, the cluster was able to ingest at 35k documents/sec, with a uniform load distribution.
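For anyone hitting the same thing: stuck shards show up with a non-`STARTED` state in `_cat/shards`. A sketch of spotting them (the sample data is invented; the real list would come from `curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state,node'`):

```shell
# Invented sample of _cat/shards output; on a real cluster:
#   curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state,node'
cat <<'EOF' > shards.txt
myindex 0 p STARTED    data-1
myindex 1 p RELOCATING data-2 -> 10.0.0.7 data-3
myindex 2 p RELOCATING data-4 -> 10.0.0.8 data-5
EOF
# Flag anything not STARTED -- these are the stuck shards
awk '$4 != "STARTED"' shards.txt
# One possible remedy for allocations that have failed too many times
# (not necessarily what was done in this thread):
#   curl -XPOST 'localhost:9200/_cluster/reroute?retry_failed=true'
```

Note that `retry_failed=true` is just one option; whether it applies depends on *why* the shards got stuck.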

Thanks for taking a moment to help the new guy.

Thanks for looking into hot_threads! Glad that solved your issue.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.