Long GC pause on ES 1.7.0 with JDK 8u40


(Makeyang) #1

22042 2015-10-10T10:36:44.751+0800: 1954567.631: [GC (Allocation Failure) 1954567.631: [ParNew: 1606667K->41270K(1763584K), 1221.6222148 secs] 5895406K->4332232K(16581312K), 1221.6230128 secs] [Times: user=11780.20 sys=0.00, real=1221.44 secs]

Is this a JDK bug, or is there a way to avoid it?
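For anyone trying to pin down a pause like this, more detailed GC and safepoint logging usually helps. A minimal sketch using standard HotSpot flags on JDK 8 (the log path is just an example), added via ES_JAVA_OPTS before starting the node:

    export ES_JAVA_OPTS="$ES_JAVA_OPTS \
      -Xloggc:/var/log/elasticsearch/gc.log \
      -XX:+PrintGCDetails \
      -XX:+PrintGCDateStamps \
      -XX:+PrintGCApplicationStoppedTime \
      -XX:+PrintSafepointStatistics \
      -XX:PrintSafepointStatisticsCount=1"

PrintGCApplicationStoppedTime and the safepoint statistics show whether the stall is really GC work or time spent reaching and holding the safepoint.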


(Mark Walkom) #2

How much data is in your cluster, and how much heap?
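For reference, both numbers are easy to pull from the cat APIs (a sketch; default host and port assumed):

    curl -s 'localhost:9200/_cat/indices?v'   # per-index doc count and store size
    curl -s 'localhost:9200/_cat/nodes?v'     # per-node heap and RAM usage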


(Makeyang) #3

doc count: 4,695,567
index size: 4.7GB (primary shards + replicas)
index settings:
"number_of_shards": "10"
"number_of_replicas": "1"
node count: 4
heap committed: 30GB
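Those settings can be double-checked straight from the API (a sketch; the index name is hypothetical):

    curl -s 'localhost:9200/my_index/_settings?pretty'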


(Makeyang) #4

Could anyone please offer some suggestions?


(Makeyang) #5

Can anyone give any insight into this? It really bothers us a lot.


(Christian Dahlqvist) #6

A bit more information around the issue would be useful:

  • Did the long GC affect one or all nodes?
  • What does the workload look like?
  • What type of hardware are the nodes deployed on?
  • Is there anything else running on the host(s)?
  • Has swap been disabled? (a quick check is sketched below)
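On the swap question, a quick way to verify it on Linux with ES 1.x (a sketch; defaults assumed):

    swapon -s                # lists active swap devices; should print nothing
    curl -s 'localhost:9200/_nodes/process?pretty' | grep mlockall

    # and in elasticsearch.yml (ES 1.x setting name):
    # bootstrap.mlockall: true

mlockall: true in the node info means the heap is locked in RAM and cannot be swapped out.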

(Makeyang) #7

Which version of Elasticsearch and Java are you using?
[answer] As mentioned in the title: ES 1.7.0 and JDK 8u40.
Did the long GC affect one or all nodes?
[answer] Only one node.
What does the workload look like?
[answer] Writes at 500 TPS and searches at 300 TPS.
What type of hardware are the nodes deployed on?
[answer] 256GB RAM, 32-core Xeon 2.6GHz, 1TB spinning disk.
Is there anything else running on the host(s)?
[answer] Yes, there is another ES instance with the same memory config running on that machine.
Has swap been disabled?
[answer] Yes.
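One related thing worth checking: a 30GB heap sits just under the ~32GB compressed-oops cutoff, and losing compressed pointers wastes heap and increases GC pressure. A quick way to confirm on the same JDK 8 (standard HotSpot flags):

    java -Xmx30g -XX:+PrintFlagsFinal -version | grep -i UseCompressedOops
    # "true" means object pointers are still compressed at this heap size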


(Mark Walkom) #8

Are you using parent/child at all?


(Makeyang) #9

No, not at all.


(Makeyang) #11

Can anyone give any insight into this?


(Makeyang) #12

Can you give any insights into this issue? I know it is a tough issue, but you guys are the experts.


(Tin Le) #13

How many shards total for your cluster (primary and replica)?

Tin


(Makeyang) #14

26 primaries and 26 replicas, 52 shards in total.
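The actual layout per node can be confirmed with the cat API (a sketch; defaults assumed):

    curl -s 'localhost:9200/_cat/shards?v'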


(Tin Le) #15

You mentioned 300 TPS for search. What do your fielddata metrics look like?

You can get that from curl -s localhost:9200/_cluster/stats

It could be that your queries are using up the heap.
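To look at fielddata specifically rather than the whole cluster stats, the 1.x node stats and cat APIs break it down per node and per field (a sketch; defaults assumed):

    curl -s 'localhost:9200/_nodes/stats/indices/fielddata?fields=*&pretty'
    curl -s 'localhost:9200/_cat/fielddata?v'

If fielddata does turn out to be the culprit, it can be capped with the indices.fielddata.cache.size setting in elasticsearch.yml (the 30% value below is just an example):

    # elasticsearch.yml
    indices.fielddata.cache.size: 30%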

