Long GC pause on ES 1.7.0 with JDK 8u40

22042 2015-10-10T10:36:44.751+0800: 1954567.631: [GC (Allocation Failure) 1954567.631: [ParNew: 1606667K->41270K(1763584K), 1221.6222148 secs] 5895406K->4332232K(16581312K), 1221.6230128 secs] [Times: user=11780.20 sys=0.00, real=1221.44 secs]

Is this a JDK bug, or is there a way to avoid it?
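
For reference, here is a minimal sketch of the extra GC logging that could be enabled to capture more detail around such a pause (this assumes JDK 8 HotSpot flags and that the node picks up options from ES_JAVA_OPTS; the log path is just an example):

    # Sketch: verbose GC and stop-the-world logging for JDK 8 (log path is illustrative)
    export ES_JAVA_OPTS="$ES_JAVA_OPTS \
      -Xloggc:/var/log/elasticsearch/gc.log \
      -XX:+PrintGCDetails \
      -XX:+PrintGCDateStamps \
      -XX:+PrintGCApplicationStoppedTime"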

How much data is in your cluster, and how much heap?

doc count: 4,695,567
index size: 4.7 GB (primary shards + replicas)
index settings:
  "number_of_shards": "10"
  "number_of_replicas": "1"
node count: 4
heap committed: 30 GB
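
For anyone who wants to reproduce these numbers, a sketch of the stock APIs that report them (1.x REST paths; host and port are assumed to be localhost:9200):

    # doc count and index size per index
    curl -s 'localhost:9200/_cat/indices?v'
    # committed and used heap per node
    curl -s 'localhost:9200/_nodes/stats/jvm?pretty'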

Please help with any suggestions.

Can anyone give any insight about this? It really bothers us a lot.

A bit more information around the issue would be useful:

  • Did the long GC affect one or all nodes?
  • What does the workload look like?
  • What type of hardware are the nodes deployed on?
  • Is there anything else running on the host(s)?
  • Has swap been disabled?

Which version of Elasticsearch and Java are you using?
[answer] As mentioned in the title: ES 1.7.0 and JDK 8u40.
Did the long GC affect one or all nodes?
[answer] Only one node.
What does the workload look like?
[answer] Writes at 500 TPS and searches at 300 TPS.
What type of hardware are the nodes deployed on?
[answer] 256 GB RAM, 32-core Xeon 2.6 GHz, 1 TB spinning disk.
Is there anything else running on the host(s)?
[answer] Yes, there is another ES instance with the same memory config running on that machine.
Has swap been disabled?
[answer] Yes.
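
For completeness, a sketch of how the swap/memory-lock situation can be double-checked on the host (standard Linux commands; the second check assumes bootstrap.mlockall is set in elasticsearch.yml, which is the 1.x setting name):

    # confirm swap is really off on the host
    swapon -s        # or: free -m (the Swap row should show 0)
    # confirm the ES process has locked its memory
    curl -s 'localhost:9200/_nodes/process?pretty' | grep mlockall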

Are you using parent/child at all?

No, not at all.

Can anyone give any insight about this?

Can you give any insights about this issue? I know it is a tough issue, but you guys are the experts.

How many shards total for your cluster (primary and replica)?

Tin

26 primary and 26 replica, 52 shards total.

You mentioned 300 TPS for search. What do your fielddata metrics look like?

You can get that from curl -s localhost:9200/_cluster/stats

It could be that your queries are using up the heap.
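
A sketch of the kind of check meant here, assuming the node answers on localhost:9200 (standard 1.x APIs; exact output fields may vary):

    # fielddata heap used per node (add &fields=... for a per-field breakdown)
    curl -s 'localhost:9200/_cat/fielddata?v'
    # cluster-wide fielddata size and evictions
    curl -s 'localhost:9200/_cluster/stats?pretty' | grep -A 3 fielddata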