I'm testing a Kafka Connect plugin that writes to Elasticsearch (the official one from Confluent, which does not use the official Elasticsearch client library but its own implementation), and I'm seeing a huge spike in CPU usage on the master node while doing this bulk indexing.
I have configured it to write to the hot nodes in our hot-warm architecture (so ES should route the requests arriving at a hot node to the nodes that hold the shards for the data), but what I don't understand is the huge CPU usage on the master node (we have 3, with only one active, of course).
Is this expected? I have changed the master nodes so that they are no longer ingest nodes and are master-only, but that had no effect.
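For reference, this is roughly how the master nodes are configured now (a sketch using the legacy 6.x/7.x-style elasticsearch.yml settings; newer versions express the same thing with `node.roles: [ master ]`):

```yaml
# Dedicated master: eligible for master election, but holds no data
# and does not run ingest pipelines.
node.master: true
node.data: false
node.ingest: false
```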
Is this something that I have misconfigured?
Could it be this, from the docs?
Adding too many coordinating only nodes to a cluster can increase the burden on the entire cluster because the elected master node must await acknowledgement of cluster state updates from every node! The benefit of coordinating only nodes should not be overstated — data nodes can happily serve the same purpose.
But I still don't see why the CPU usage is so high during this bulk indexing.
What is the size of your cluster? Are you using dynamic mappings? How many indices and shards are you actively indexing into?
It has 28 nodes: 3 master nodes, 18 hot nodes (8 CPU, 64 GB RAM, 1.4 SSD each), and 7 warm nodes.
Yes, we don't fully specify the type of each field, so we rely on the dynamic mapping feature.
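Dynamic mappings are a likely contributor here: each time a document introduces a new field, the data node sends a mapping update to the master, which must publish a new cluster state to every node in the cluster. One way to avoid that is to declare the fields up front in an index template and lock the mapping down. A sketch (the template name, index pattern, and field names below are made up for illustration; `_index_template` is the 7.8+ API, older versions use `_template`):

```
PUT _index_template/daily-logs
{
  "index_patterns": ["logs-*"],
  "template": {
    "mappings": {
      "dynamic": "strict",
      "properties": {
        "@timestamp": { "type": "date" },
        "message":    { "type": "text" }
      }
    }
  }
}
```

With `"dynamic": "strict"` documents containing unmapped fields are rejected; `"dynamic": false` would instead store them without indexing. Either setting stops per-field mapping updates from reaching the master during bulk indexing.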
We typically write to 18 daily indexes that map to around 250 shards. In the case I was testing we were pushing data covering 7 days, so we were writing to many more indexes in a short period of time: on the order of 7 × 18 = 126 indexes and 7 × 250 ≈ 1,750 shards, all of them allocated to the hot nodes.