The Elasticsearch version is 7.11.1.
Each warm data node is a VM with 16 CPU cores and 64 GB of memory, backed by 4 physical HDDs combined into a striped LVM volume.
The issue is that the CPU system time of a random VM often rises suddenly to 100%, and the VM then hangs until it leaves the cluster.
I used top and pidstat to confirm that the process consuming the CPU is elasticsearch, and "perf top" shows output like this:
71.51% [kernel] [k] __pv_queued_spin_lock_slowpath
1.75% [kernel] [k] _raw_spin_lock_irqsave
1.42% [kernel] [k] compact_checklock_irqsave.isra.24
or like this:
7.89% [kernel] [k] isolate_freepages_block
3.96% [kernel] [k] __pv_queued_spin_lock_slowpath
3.63% [kernel] [k] copy_user_enhanced_fast_string
1.75% [kernel] [k] __list_del_entry
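
Both profiles include symbols that look related to memory compaction (compact_checklock_irqsave, isolate_freepages_block), so one thing that could be checked is whether the kernel's compact_* counters in /proc/vmstat climb while a node is hanging. Below is a minimal Python sketch of that check. The helper names parse_vmstat and compaction_delta and the 10-second sampling interval are my own assumptions for illustration, not something taken from Elasticsearch or from the kernel tools.

```python
import time
from pathlib import Path


def parse_vmstat(text):
    """Parse 'name value' lines (the /proc/vmstat format) into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def compaction_delta(before, after):
    """Return how much each compact_* counter changed between two samples."""
    return {
        name: after[name] - before.get(name, 0)
        for name in sorted(after)
        if name.startswith("compact_")
    }


if __name__ == "__main__":
    vmstat = Path("/proc/vmstat")  # Linux only
    first = parse_vmstat(vmstat.read_text())
    time.sleep(10)  # sampling interval is an arbitrary choice
    second = parse_vmstat(vmstat.read_text())
    for name, delta in compaction_delta(first, second).items():
        print(f"{name}: +{delta}")
```

If counters such as compact_stall grow quickly during a hang and stay flat otherwise, that would point toward memory compaction rather than Elasticsearch itself, though this alone would not prove the cause.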
Is this a bug, or something else?