Performance hit when multiple filebeats are sending to same ES

How do I check how many CPU cores ES has access to?

If you aren't using containers and Elasticsearch is just running on the server as a regular process, it should have access to all of the CPU cores.
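If you'd rather confirm this than assume it, the nodes info API (`GET _nodes/os`) reports `available_processors` and `allocated_processors` for each node. Below is a minimal Python sketch that pulls those two fields out of an already-fetched response. The helper name `cpu_summary` is my own, and the sample response is made up and trimmed for illustration, so substitute the real output from your cluster.

```python
import json


def cpu_summary(nodes_os_response):
    """Return {node_name: (available, allocated)} from a parsed `GET _nodes/os` body."""
    summary = {}
    for node in nodes_os_response.get("nodes", {}).values():
        os_info = node.get("os", {})
        summary[node.get("name", "unknown")] = (
            os_info.get("available_processors"),
            os_info.get("allocated_processors"),
        )
    return summary


# Illustrative response only (hypothetical node id, name, and core counts).
sample = json.loads("""
{
  "nodes": {
    "abc123": {
      "name": "es-node-1",
      "os": {"available_processors": 16, "allocated_processors": 16}
    }
  }
}
""")

print(cpu_summary(sample))
```

If `allocated_processors` comes back lower than the number of cores on the box, something (a container limit or a processors setting) is restricting what Elasticsearch uses.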

When I ran iostat, %iowait was 0.03 and kB_wrtn/s was quite stable at 137K.

Hmm, 137K kB/s is roughly 137 MB/s, which could be approaching the sustained write limit of your SATA SSDs, depending on the model. Would you be able to provide the average IOPS on the SSDs?

For example, at 4 shards, when there was only 1 filebeat, the indexing rate was 28K/s, but when there were 2 filebeats, the indexing rate of each filebeat dropped to 21K/s.

This is somewhat to be expected. You went from a total of 28K/s to 42K/s (2 x 21K/s), so overall throughput increased even though each Filebeat's individual rate dropped. When a single Filebeat is running at 28K/s, does it have a backlog or dropped events, or is it able to send and process all the data it receives?
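To make the arithmetic explicit, here is a small Python sketch using only the numbers quoted in this thread (28K/s with one Filebeat, 21K/s each with two). The function and variable names are mine.

```python
def aggregate_rate(per_beat_rate, beats):
    """Total events/s reaching Elasticsearch when every Filebeat sends at the same rate."""
    return per_beat_rate * beats


single = aggregate_rate(28_000, 1)   # one Filebeat at 28K/s
dual = aggregate_rate(21_000, 2)     # two Filebeats at 21K/s each

gain = (dual - single) / single                  # change in total throughput
per_beat_drop = (28_000 - 21_000) / 28_000       # change seen by each Filebeat

print(f"total: {single} -> {dual} events/s ({gain:.0%} more)")
print(f"per Filebeat: {per_beat_drop:.0%} lower")
```

So each Filebeat sees a lower rate, but the cluster as a whole is indexing more. The scaling is less than linear, which suggests a shared resource on the Elasticsearch side is starting to limit things.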

Should I just increase my index size so I can parallelize more with more shards (while not having so many that search performance degrades)?

You should try to "size" your index based on shard size rather than index size. If you are using Elasticsearch's ILM policies, the rollover action's max_primary_shard_size setting lets you do this.
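As a rough sketch of what that looks like, the Python below builds the JSON body you would send to `PUT _ilm/policy/<policy-name>`. The helper name `rollover_policy` is my own, and the `50gb` default is only a placeholder I picked for the example, not a recommendation for your data.

```python
import json


def rollover_policy(max_primary_shard_size="50gb"):
    """Build an ILM policy body whose hot phase rolls over on primary shard size.

    The default size is an assumed placeholder; tune it for your own workload.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_primary_shard_size": max_primary_shard_size
                        }
                    }
                }
            }
        }
    }


# Print the body so it can be pasted into Dev Tools or sent with any HTTP client.
print(json.dumps(rollover_policy(), indent=2))
```

With this in place the index rolls over once its largest primary shard reaches the threshold, regardless of how many shards the index has.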

I'm trying to understand the benefits of having multiple nodes in a cluster.

There are 2 main reasons to have multiple nodes in a cluster:

  1. High Availability - Depending on your use case, this may or may not matter
  2. Greater ability to distribute load
    • Given this thread, this is probably what you're more interested in.
    • If you have a single node, you are limited to the resources that node has, and at some point you will hit a bottleneck (CPU, memory, or disk). Given the specs of your systems, I suspect you will hit the disk bottleneck first.
    • By having multiple servers (and multiple shards in your index), you can more easily spread the resource load, improving overall performance (or in your case, event throughput).
    • Note: Unless you are using a coordinating node, I'd make sure that you set up the Filebeat Elasticsearch output with all of your Elasticsearch nodes listed, so that Filebeat can distribute its output across them in round-robin order.
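To illustrate the round-robin point in the note above, here is a toy Python model of how batches spread out when several hosts are listed. This is not Filebeat's actual implementation, only a sketch of the idea, and the host URLs are hypothetical. In a real setup those URLs would go in the `hosts` list under `output.elasticsearch` in filebeat.yml.

```python
from collections import Counter
from itertools import cycle


def distribute(batches, hosts):
    """Assign `batches` bulk requests to `hosts` in round-robin order; return counts per host."""
    counts = Counter()
    for _, host in zip(range(batches), cycle(hosts)):
        counts[host] += 1
    return dict(counts)


# Hypothetical node addresses, for illustration only.
hosts = [
    "http://es-node-1:9200",
    "http://es-node-2:9200",
    "http://es-node-3:9200",
]

print(distribute(9, hosts))
```

With only one host listed, every batch lands on the same node, and that node has to do all of the coordinating work for the bulk requests.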