Why does query response time improve significantly after disabling indices.queries.cache.size?

xiaodid · January 30, 2023, 9:16am

We have an ES 7.9.16 cluster with 1 master node and 3 data nodes. Each data node has a 31 GB heap.

We created 2 indexes, each containing 2.5 billion docs. The query response time is about 1 second when querying those 2 indexes.

After disabling the query cache (setting indices.queries.cache.size to 0), the response time drops to 10 ms for the same DSL.

Could you please help us understand the root cause of this?
Is disabling caching a good optimization?

Thank you very much.
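
For anyone trying to reproduce the comparison, here is a minimal sketch of how the two settings could be measured more robustly than with a single request. It relies on the `took` field (server-side milliseconds) that Elasticsearch returns in every search response. The helper name `median_took_ms` and the `search` callable are hypothetical; `search` stands in for whatever client call sends the body to the cluster.

```python
import statistics
from typing import Callable, Dict, List


def median_took_ms(search: Callable[[Dict], Dict], body: Dict, runs: int = 20) -> float:
    """Send the same search body `runs` times and return the median `took` value.

    `search` is a hypothetical callable that sends `body` to the cluster and
    returns the parsed JSON response as a dict containing a `took` field.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples: List[int] = []
    for _ in range(runs):
        response = search(body)
        samples.append(response["took"])
    # The median is less sensitive than the mean to a slow first (cold) request.
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in for a real client call; the numbers are made up for illustration.
    fake_tooks = iter([1000, 12, 10, 9, 11])
    print(median_took_ms(lambda body: {"took": next(fake_tooks)}, {"query": {}}, runs=5))
```

Running the same measurement once with the cache enabled and once with it set to 0 would show whether the 1 second vs. 10 ms difference holds across repeated requests.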

warkolm · January 31, 2023, 4:00am

Welcome to our community!
Please note that this version is EOL and no longer supported; you should be looking to upgrade as a matter of urgency.

What is the mapping? What is the query? What is your Elasticsearch configuration?

Also - why do you have 1 master but 3 data nodes? That's not ideal for 7.X of Elasticsearch.

Liaowei · January 31, 2023, 6:55am

Hi Mark, thanks so much for answering our question. We have a bool query which consists of a filter context and a should clause. The filter context consists of a range query on the time field and a terms query, which together restrict the results to documents that match the terms and fall within the time range. The should clause consists of a multi_match query, which is used to match a subset of the documents that pass the filter. The query pattern looks like this:

"query": {
	"bool": {
		"filter": [
			{"range": {}},
			{"terms": {}}
		],
		"should": [
			{"multi_match": {}}
		]
	}
}

About the Elasticsearch configuration: we use the Elasticsearch defaults; the only change is that we disabled the query cache (indices.queries.cache.size set to 0).
About cluster roles: it's only a test environment, so we have cut down some nodes.
We are really confused by the test results. Thanks again for helping us.
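
One way to check whether the query cache is doing anything useful while it is enabled is the node stats API (`GET _nodes/stats/indices/query_cache`), which reports `hit_count` and `miss_count` per node. Below is a minimal sketch that assumes the response has already been fetched and parsed into a dict. The helper name `query_cache_hit_ratios` and the sample numbers are made up for illustration and are not taken from our cluster.

```python
from typing import Dict


def query_cache_hit_ratios(node_stats: Dict) -> Dict[str, float]:
    """Return the query cache hit ratio per node from a parsed node stats response."""
    ratios: Dict[str, float] = {}
    for node_id, node in node_stats.get("nodes", {}).items():
        cache = node["indices"]["query_cache"]
        lookups = cache["hit_count"] + cache["miss_count"]
        name = node.get("name", node_id)
        # A node with no lookups yet gets a ratio of 0.0 instead of a division error.
        ratios[name] = cache["hit_count"] / lookups if lookups else 0.0
    return ratios


if __name__ == "__main__":
    # Made-up sample in the shape of the node stats response.
    sample = {
        "nodes": {
            "abc123": {
                "name": "data-1",
                "indices": {"query_cache": {"hit_count": 25, "miss_count": 75}},
            }
        }
    }
    print(query_cache_hit_ratios(sample))
```

A very low ratio combined with a high eviction count would suggest the cache is being filled and discarded without being reused, but that is only a hypothesis to test, not a confirmed cause.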

warkolm · January 31, 2023, 7:22am

Does that mean you are using the default heap? You must have changed something to cluster the nodes, so it'd be good if you shared your config.

It'd also be good to understand the mappings. And while that's a great start on sharing the query, how many actual search terms are you adding into the terms and multi_match values?

Liaowei · January 31, 2023, 8:18am

The heap size is 31 GB on each data node and 16 GB on the master node, and we use G1 GC. Here are some of our configuration settings.

master node:
node.master: true
node.data: false
node.ingest: false
search.remote.connect: false
node.ml: false
bootstrap.memory_lock: true
cluster.remote.connect: true
cluster.max_shards_per_node: 20000
cluster.routing.allocation.node_initial_primaries_recoveries: 32
cluster.routing.allocation.node_concurrent_recoveries: 32
cluster.routing.allocation.cluster_concurrent_rebalance: 32
cluster.routing.allocation.disk.include_relocations: false

data nodes:
node.master: false
node.data: true
node.ingest: false
search.remote.connect: false
node.ml: false
bootstrap.memory_lock: true
cluster.remote.connect: true
cluster.max_shards_per_node: 20000
cluster.routing.allocation.node_initial_primaries_recoveries: 32
cluster.routing.allocation.node_concurrent_recoveries: 32
cluster.routing.allocation.cluster_concurrent_rebalance: 32
cluster.routing.allocation.disk.include_relocations: false
indices.queries.cache.size: 0%

The terms clause contains a single field whose value is an array; in our common cases the array length varies from 1 to 6.
The multi_match query covers 9 fields.

mapping:

"mapping": {
	"dynamic": "strict",
	"properties": {
		"accoNo": {
			"type": "keyword"
		},
		"typeName": {
			"analyzer": "standard",
			"type": "text"
		},
		"time": {
			"format": "yyyyMMddHHmmss",
			"type": "date"
		},
		"remark": {
			"analyzer": "standard",
			"type": "text"
		},
		"channl": {
			"analyzer": "standard",
			"type": "text"
		},
		"desc": {
			"analyzer": "standard",
			"type": "text"
		},
		"summName": {
			"analyzer": "standard",
			"type": "text"
		},
		"telNo": {
			"analyzer": "standard",
			"type": "text"
		},
		"opsName": {
			"analyzer": "standard",
			"type": "text"
		},
		"companyName": {
			"analyzer": "standard",
			"type": "text"
		},
		"tacctScrt": {
			"analyzer": "standard",
			"type": "text"
		},
		...
	}
}

DSL:

"query": {
	"bool": {
		"filter": [
			{
				"range": {
					"time": {
						"from": "20200101000000",
						"to": "20210101000000"
					}
				}
			},
			{
				"terms": {
					"accoNo": [
						"00000000000000001",
						"00000000000000002"
					]
				}
			}
		],
		"should": [
			{
				"multi_match": {
					"query": "some words",
					"fields": [
						"channl",
						"typeName",
						"desc",
						"summName",
						"telNo",
						"opsName",
						"companyName",
						"remark",
						"tacctScrt"
					],
					"type": "phrase",
					"operator": "or",
					"slop": 3
				}
			}
		]
	}
}
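
To avoid hand-typing mistakes in the request body, the same query can be assembled programmatically and serialized. This is a minimal sketch: the helper name `build_query` and its parameters are hypothetical, while the field names and values are the ones from the mapping and DSL above.

```python
import json
from typing import Dict, List


def build_query(account_numbers: List[str], time_from: str, time_to: str,
                text: str, fields: List[str], slop: int = 3) -> Dict:
    """Build the bool query: range + terms in filter context, multi_match in should."""
    return {
        "query": {
            "bool": {
                # Filter clauses do not contribute to the score.
                "filter": [
                    {"range": {"time": {"from": time_from, "to": time_to}}},
                    {"terms": {"accoNo": account_numbers}},
                ],
                # The phrase multi_match only affects scoring of the filtered docs.
                "should": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": fields,
                            "type": "phrase",
                            "operator": "or",
                            "slop": slop,
                        }
                    }
                ],
            }
        }
    }


if __name__ == "__main__":
    body = build_query(
        ["00000000000000001", "00000000000000002"],
        "20200101000000",
        "20210101000000",
        "some words",
        ["channl", "typeName", "desc", "summName", "telNo",
         "opsName", "companyName", "remark", "tacctScrt"],
    )
    print(json.dumps(body, indent=2))
```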
