I get the following query time pattern, regardless of queries used:
1st search: ~2s
2nd search: ~2s
3rd search: < 1s
4th search: < 1s
...
Xth search: < 1s
My questions are:
Why does it speed up on the third query? (I suspect this is from some kind of OS- or ES-level caching in the background.)
Is there some way I can trigger this caching preemptively?
This question is just out of curiosity, I am perfectly happy with the current performance.
Which question are you answering? Can you please elaborate? Thank you in advance. Also, it seems it's not specifically the third query: I retried the setup and this time it sped up on the fourth query.
Sorry, question 1. If your client is balancing requests across replicas, any caching on your primary shard will not benefit a follow-up query routed to a replica.
To send each user back to the same replica each time, and so increase the chance of hitting a warm cache, use their session ID as the `preference` parameter on the search request.
Looking at the docs there, it seems randomization, not round-robining, is the default replica selection policy, which may explain some of the inconsistency you're seeing.
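As a minimal sketch of the suggestion above: the `preference` query parameter is a real Elasticsearch search option, but the base URL, index name, and session ID below are hypothetical placeholders. Any stable per-user string pins that user's searches to the same set of shard copies.

```python
def search_url(base: str, index: str, session_id: str) -> str:
    """Build a search URL that pins the request to one set of shard copies.

    Sending the same `preference` value with every request makes
    Elasticsearch route that user's searches to the same replicas,
    so repeat searches are more likely to hit a warm cache.
    """
    return f"{base}/{index}/_search?preference={session_id}"

# Hypothetical cluster address, index name, and session ID:
url = search_url("http://localhost:9200", "my-index", "sess-1")
```

The same effect is available in the official clients by passing `preference` as a search parameter instead of building the URL by hand.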
If I understand correctly, the first search will always be slower than the others? The second search is not guaranteed to hit the same node as the first, and searches improve by the third and all subsequent tries because I have three nodes (which would mean the chance of hitting a warm cache is 83% by the third try and 100% by the fourth)?
That makes a lot of sense, thank you. For question 2 then, is there a way to 'warm up' the caches before the user searches?
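One common approach to question 2 is to fire a few representative queries once at startup, so the OS page cache and node-level caches are populated before real users search. The sketch below assumes hypothetical warm-up query bodies and takes a pluggable `search` callable (e.g. a lambda wrapping your client's search call) rather than a live cluster; running several rounds helps because random replica selection means one pass may not touch every shard copy.

```python
# Hypothetical warm-up queries; replace with queries typical of your users.
WARMUP_QUERIES = [
    {"query": {"match_all": {}}},
    {"query": {"match": {"title": "common term"}}},
]

def warm_up(search, queries=WARMUP_QUERIES, rounds=3):
    """Run each warm-up query `rounds` times; return the request count.

    `search` is any callable that sends a query body to the cluster,
    e.g. lambda body: client.search(index="my-index", body=body).
    Multiple rounds raise the odds that every replica gets warmed,
    since the default replica selection is randomized.
    """
    issued = 0
    for _ in range(rounds):
        for query in queries:
            search(query)
            issued += 1
    return issued
```

For example, `warm_up(lambda body: client.search(index="my-index", body=body))` could be called from your application's startup hook.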