Ingest of data hammering one node only

We have a 14-data-node Elasticsearch cluster (version censored for embarrassment), plus 3 master nodes.
We have a bunch of machines sending logfile output into the cluster via Filebeat and Logstash.
Logstash is pointed at one of the master nodes to upload data.

I've noticed that pretty much only one node at a time gets hammered, along with its partner node for replication.
I thought data was supposed to be spread more or less evenly across all the nodes?
The shard allocation shows that shards for the latest index are distributed, but only one node gets hammered.

Are we doing something wrong here?
Suggestions for improving the situation would be appreciated.

How many shards are in the index? If you only have 1 primary shard and 1 replica, only 2 nodes will store those shards.
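
For example, you can check how the shards of an index are actually laid out with the cat shards API (the index name below is just a placeholder):

    # replace the index name with one of your real indices
    curl -s 'http://localhost:9200/_cat/shards/logstash-2019.06.01?v&h=index,shard,prirep,state,node'

If every primary and replica of the current index lands on the same couple of nodes, that would explain the pattern you're seeing.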

Does Filebeat send via Logstash or directly to Elasticsearch?

The current Logstash elasticsearch output will take a list of hosts and load balance across them. I would list all your data nodes there. I'd even run more than one Logstash if possible.
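
A minimal sketch of what that output block might look like, with made-up host names standing in for your data nodes:

    output {
      elasticsearch {
        # listing several hosts lets the output spread bulk requests across them
        hosts => ["http://data01:9200", "http://data02:9200", "http://data03:9200"]
        index => "logstash-%{+YYYY.MM.dd}"
      }
    }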

We run Logstash on each of our hot nodes, but each one points only at the local Elasticsearch. Our Beats and other traffic target all of our Logstash nodes (except for some special-purpose but low-traffic ones).
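
On the Beats side, a rough filebeat.yml output section for spreading traffic across several Logstash instances might look like this (host names are invented):

    output.logstash:
      # list all your Logstash nodes, not just one
      hosts: ["logstash01:5044", "logstash02:5044", "logstash03:5044"]
      # balance events across the listed hosts rather than sticking to the first
      loadbalance: true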

Our shard count is 6,
so we have 6 primary and 6 replica shards per index.

logfiles -> filebeat -> logstash -> elastic master node -> data nodes
I thought we were supposed to just pick one node and put it in the Logstash config.
Is it actually expected to list all the data nodes in there?
That seems to be what you're saying.
Oops...

Also, are we NOT supposed to feed data to the master nodes? Should we just point at the data nodes directly?

Technically there is an LB between Logstash and Elasticsearch.
I was thinking maybe we should tell the LB to round-robin across all the data nodes...
but if it's a single long-lived connection, maybe that won't work, and we'd need to rearchitect so Logstash talks directly to all the data nodes?

Elastic recommends "dedicated" master nodes. You seem to also be using them as ingest or coordinating nodes. It will probably work, but it's "not recommended".
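
Since your version is censored, this is only a sketch, but a dedicated master's elasticsearch.yml typically looks something like this (older releases use the boolean settings, newer ones use node.roles):

    # dedicated master, older releases
    node.master: true
    node.data: false
    node.ingest: false

    # dedicated master, newer releases
    node.roles: [ master ]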

Load balancers will often direct one client IP to a target host until that host becomes unavailable. I would let Logstash do the balancing across nodes instead.

I don't really understand why only 2 data nodes are hammered. You could try the hot threads API on the busy nodes to see what they're actually busy doing.
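
Something like this, assuming a hypothetical node name data07:

    # show what the busy node is actually spending CPU on
    curl -s 'http://localhost:9200/_nodes/data07/hot_threads'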

We just took "dedicated" to mean "not data nodes", i.e. "nodes that don't STORE data".

I"ve been googling around today, and there is precious little thats easy to find, about large scale (or even medium scale) deployments.
Most guides seem to either have a "test cluster all on one box" setup....
or, "3 nodes, all data AND master"...
or.. a big box around a bunch of master nodes, many many data nodes....
and NO DETAILS WHATSOEVER on the specifics of how the data gets in.
there's just an arrow to "the box", with zero details.
Sighhhh....

Try small steps. Maybe list all 3 "not data nodes" in logstash and see what that does, eliminating the load balancer.

The ingest process is a little mysterious. Usually a "bulk" batch of events is sent to a node. A hash of the document ID, modulo the number of primary shards, determines which shard will store each event, so the receiving node has to forward those items to (probably) all the data nodes owning a shard, and something replicates them (I think primary shard to replica).
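
The documented routing formula is roughly:

    # _routing defaults to the document's _id unless set explicitly
    shard_num = hash(_routing) % number_of_primary_shards

so with 6 primary shards a bulk request fans out across up to 6 primaries, and each primary then forwards the operation to its replica.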

Do a _cat/nodes to make sure all your nodes are listed with "i" (ingest).
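
For example:

    # node.role shows letters like d (data), i (ingest), m (master-eligible)
    curl -s 'http://localhost:9200/_cat/nodes?v&h=name,node.role,master'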

All data nodes are marked "d".
The other nodes are "-" (and flagged as master).
We have no nodes earmarked as ingest.

I thought those were optional, for the "additional processing before ingest" that the docs seem to describe.

Well, I went ahead and told the elasticsearch output on one of our Logstash VMs to spread across all the data nodes.
So far, so great!
And now that the master nodes aren't busy funneling data around, the response time of the cluster
(to queries, status calls, and such) will probably be better as well.
Thanks a lot for all your help
