Terms aggregation on a frozen index causes the field data cache to grow continuously

ES Version: 7.1.1
JVM Heap: 4GB

Repro steps:
Create an index with a keyword field and insert several docs, freeze that index, then run a terms aggregation against the frozen index. The field data cache grows and seems never to be cleared.

	PUT foo1
	{
	  "settings": {
		"number_of_shards": 1,
		"number_of_replicas": 0
	  },
	  "mappings": {
		"properties": {
		  "education": {
			"type": "keyword"
		  }
		}
	  }
	}

	POST foo1/_doc
	{
	  "education": "master"
	}

	POST foo1/_doc
	{
	  "education": "bachelor"
	}

	POST foo1/_doc
	{
	  "education": "PhD"
	}

	POST foo1/_freeze

Run the terms aggregation repeatedly:

	GET foo1/_search?ignore_throttled=false
	{
	  "size": 0,
	  "aggs": {
		"max_ed": {
		  "terms": {
			"field": "education"
		  }
		}
	  }
	}

Check the field data cache:

    GET _cat/fielddata?v
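
To narrow this down to the field used in the aggregation, the requests below may help. This is only a sketch: it assumes the `foo1` index and `education` field from the repro, and uses the `fields` parameter of the cat fielddata and index stats APIs. The last request clears the field data cache for the index by hand, which should show whether the memory can be released at all.

	GET _cat/fielddata?v&fields=education

	GET foo1/_stats/fielddata?fields=education

	POST foo1/_cache/clear?fielddata=true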

After running the same terms aggregation for several hours, the ES process starts doing full GCs:

	[2020-03-30T06:13:43,433][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][24765] overhead, spent [3.4s] collecting in the last [4s]
	[2020-03-30T06:14:00,960][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][24782] overhead, spent [301ms] collecting in the last [1s]
	[2020-03-30T06:14:09,183][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][24790] overhead, spent [335ms] collecting in the last [1s]
	[2020-03-30T06:14:19,479][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][24798] overhead, spent [3s] collecting in the last [3.2s]

I would recommend always force-merging indices down to a single segment before you freeze them. Have you done this? If not, can you try it and see if it makes a difference?
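
A minimal sketch of that suggestion for the `foo1` index from the repro, assuming the index has to be unfrozen before it can be merged (I have not verified whether a force merge is accepted on a frozen index in 7.1.1):

	POST foo1/_unfreeze

	POST foo1/_forcemerge?max_num_segments=1

	GET _cat/segments/foo1?v

	POST foo1/_freeze

The `_cat/segments` call is only there to confirm that the shard is down to a single segment before the index is frozen again.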
