Elasticsearch version 7.6.0, with 1 million docs, 1.76 GiB in size.
I found that a query with only a simple term filter in it is slow (about 2 s to execute).
The query body is as below.
When I set profile to true (of course I switched city_id to avoid the cache effect), it said only tens of milliseconds were spent on the query, while the actual took is about 2-5 seconds.
Since I set _source to false, there is no need to fetch the source from disk, so where is the rest of the time spent? What can I do to make such a query quicker?
Our disk is a normal mechanical one. But I've also tried it on SSD, and the improvement is small. The scoring happens in the index, and as I don't retrieve the source field it should not involve the disk, should it?
As far as I know, some search engines recall around 500 docs for the next, finer ranking step, and the whole pipeline finishes in 1 second. So there is not much time budget for the first recall step. Besides, the time spent is already high even though I only use the simplest term filter. The delay will become unbearable if I add more complicated filter conditions.
I wonder what others do when building multi-step ranking, such as ES along with a learning-to-rank model.
Do you mean that before we offer the service to others we should find and run enough query cases to warm ES up?
How much cache should we allocate for about 2 GB of docs, if we want to cover almost all query cases in cache?
Another thing
I guess I might have misunderstood some concept, since I was confused by the huge difference between the time the profile shows and the real one (10 ms vs 2 s).
Although I know that the profile is calculated by sampling, the gap is unreasonable.
If you don't want the very first user of your service to pay the price, most likely yes.
It's not exactly caching all queries, but letting the OS cache files that are frequently read.
Although there is also some cache for filters. Both matter.
Just run some tests yourself. You'll probably see whether or not you really need to implement a warmer. My guess is that it is normally not needed.
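A test like that could be as simple as timing the same request several times and comparing the first (cold) call with the later (warm) ones. Below is a minimal sketch of such a harness; `run_query` is a hypothetical callable that you supply (for example, a function that sends your search request), and nothing in it is specific to any Elasticsearch client.

```python
import statistics
import time


def time_calls(run_query, n=10):
    """Call run_query() n times and return each wall-clock latency in milliseconds.

    run_query is a hypothetical zero-argument callable provided by the caller.
    """
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        run_query()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def cold_vs_warm(latencies):
    """Separate the first (cold) call from the median of the remaining (warm) calls."""
    if len(latencies) < 2:
        raise ValueError("need at least two measurements")
    return {
        "cold_ms": latencies[0],
        "warm_median_ms": statistics.median(latencies[1:]),
    }
```

If the cold figure is much larger than the warm median, the cost is most likely in reading files that are not yet in a cache, and a warm-up step may be worth considering.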
I don't think that profile is sampling. But if you are running the query with "profile": true after you ran a similar query, then the cache is probably playing a role.
If you want to have a similar response time, you should:
I have tried many times, carefully avoiding the cache effect by switching the id every time.
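For reference, the id switching looks roughly like the sketch below. The body layout is an assumption made for illustration (a `bool` query with a single `term` filter on `city_id`, `_source` off, `profile` on), not a copy of the exact request, and the helper names are hypothetical.

```python
def term_filter_body(city_id):
    """Build a profiled search body with one term filter (assumed layout)."""
    return {
        "profile": True,    # ask for per-shard timing details
        "_source": False,   # do not return the stored document source
        "query": {"bool": {"filter": [{"term": {"city_id": city_id}}]}},
    }


def bodies_for(city_ids):
    """Yield one body per distinct city_id so that no filter value is repeated."""
    seen = set()
    for city_id in city_ids:
        if city_id in seen:
            continue
        seen.add(city_id)
        yield term_filter_body(city_id)
```

Note that changing the filter value only sidesteps caches keyed on the query itself. The OS file cache works at the file level, so it can still be warm or cold regardless of which id is used.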
Here is a typical snapshot from Kibana
It says that the time spent is only 3.5 milliseconds. As we know, that's far less than what it really takes.
I wonder what the profile time really represents?
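One way to see how much of the request the profile covers is to add up the profiled times and compare the total with `took`. As far as I understand the 7.x Profile API documentation, the profile section does not measure network latency, the fetch phase, time spent in queues, or the merging of shard responses on the coordinating node, so a large remainder would point to one of those. The sketch below assumes the usual response layout (`profile.shards[].searches[]` with `query`, `rewrite_time` and `collector` entries); the function names are hypothetical.

```python
def profiled_nanos(response):
    """Sum top-level query, rewrite and collector times over all shards.

    Shards run in parallel, so this sum is an upper bound on the
    wall-clock time that the profiled work contributed.
    """
    total = 0
    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            total += search.get("rewrite_time", 0)
            # Top-level entries already include the time of their children.
            total += sum(q.get("time_in_nanos", 0) for q in search.get("query", []))
            total += sum(c.get("time_in_nanos", 0) for c in search.get("collector", []))
    return total


def unprofiled_millis(response):
    """Part of `took` (in ms) that the profile section does not account for."""
    return response["took"] - profiled_nanos(response) / 1_000_000
```

Calling `unprofiled_millis` on the parsed JSON of a profiled search gives a rough lower bound on the time spent outside the profiled query and collector work.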