Hi Steffen,
After my reply, I duplicated another 2 VMs to set everything up from scratch, then re-checked and analyzed following your steps. Interestingly, Logstash's throughput was approximately the same as Filebeat's (both at ~2.5k/s). So I started another Filebeat on a 3rd VM, and Logstash then generated ~4.5k/s.
However, ES was still only generating ~500/s. Then I began tweaking the number of shards, and boom! ES's throughput is now at ~4.5k/s (this is only after I increased the total number of shards from the default 5 to 7 via the template API). Load factor is at 13.x on the 12-core node.
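For anyone wanting to try the same kind of change, here is a minimal sketch in Python of setting the shard count through the legacy `_template` endpoint. The template name `logstash_shards`, the index pattern `logstash-*`, and the `http://localhost:9200` URL are placeholders for illustration, not my exact setup. The sketch also assumes an older ES release where the pattern key is `template`; newer releases use `index_patterns` instead.

```python
import json
import urllib.request


def build_template_body(index_pattern, shards):
    """Build a legacy index template body that sets the primary shard count."""
    return {
        # Assumption: older ES releases use "template" for the pattern;
        # 6.x and later expect "index_patterns" (a list) instead.
        "template": index_pattern,
        "settings": {"index": {"number_of_shards": shards}},
    }


def put_template(base_url, name, body):
    """Send the template with an HTTP PUT to <base_url>/_template/<name>."""
    req = urllib.request.Request(
        url=f"{base_url}/_template/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical pattern; 7 is the shard count mentioned above.
body = build_template_body("logstash-*", 7)
print(json.dumps(body, indent=2))

# To apply it against a node (placeholder URL and template name):
# put_template("http://localhost:9200", "logstash_shards", body)
```

Note that a template only applies to indices created after it is stored, so existing indices keep their old shard count until a new index is created.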
I'm very puzzled how the 2 additional index shards could have made such a big difference.
Nonetheless, thank you very much for your help.