Maybe someone has faced a similar problem. Data nodes (a cluster of 4 data nodes) are periodically dying with the following dmesg messages:
[Fri Jul 7 02:19:34 2017] /home/kernel/COD/linux/mm/pgtable-generic.c:33: bad pmd ffff95e5b0039500(0000001b9ea009e2)
[Fri Jul 7 03:41:30 2017] BUG: Bad rss-counter state mm:ffff95e58d63be00 idx:1 val:512
[Fri Jul 7 03:41:30 2017] BUG: non-zero nr_ptes on freeing mm: 1
The system has 264 GB of RAM, and the elasticsearch process has 30 GB of memory assigned. We write on average 1.2 TB of index during the day. At first I suspected a kernel issue (the default kernel is 4.4), but after upgrading to kernel 4.10.0 the issue keeps occurring.
We have a similar cluster built with ES 1.7.5 on Ubuntu 15.04 (kernel 3.19.0) which receives a similar amount of writes/reads; that cluster does not have this problem.
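For reference, the memory assignment mentioned above looks roughly like this (a sketch, assuming the 30 GB refers to the JVM heap and a 5.x-style jvm.options file; on older versions the equivalent would be ES_HEAP_SIZE):

# /etc/elasticsearch/jvm.options (hypothetical excerpt)
-Xms30g
-Xmx30g

The rest of the RAM is left to the OS page cache, which matches the ~196 GB of cache visible in the MEM line of the atop output below.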
Below you can see the output from atop for the interval when the outage happened; note the weird CPU load average (avg1 430.99, avg5 378.26):
PRC | sys 8.79s | user 9m57s | #proc 492 | #trun 2 | #tslpi 511 | #tslpu 430 | #zombie 0 | clones 429 | #exit 431 |
CPU | sys 1% | user 100% | irq 0% | idle 3899% | wait 0% | steal 0% | guest 0% | curf 1.94GHz | curscal 88% |
CPL | avg1 430.99| avg5 378.26 | avg15 215.31 | csw 112322 | intr 742898 | numcpu 40 |
MEM | tot 251.8G | free 15.6G | cache 196.4G | dirty 0.9M | buff 420.3M | slab 5.2G |
SWP | tot 7.4G | free 7.4G | vmcom 32.8G | vmlim 133.4G |
PID MINFLT MAJFLT VSTEXT VSIZE RSIZE VGROW RGROW UID EUID MEM CMD
2924 0 0 2K 4.2T 92.6G 0K 0K elastics elastics 37% java
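If it helps, the same interval can be replayed from atop's raw logs with something like the following (a sketch; the log path and file name depend on the distro and atop's logging configuration, and the times are only an example window around the outage):

atop -r /var/log/atop/atop_20170707 -b 02:15 -e 02:30

Here -r replays a raw atop log file and -b/-e restrict the replay to the given begin/end times.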