@Aurel_Drejta Thanks for following up here. I subscribed via email so I could lurk. I'm not surprised that disabling caching improved the quality of your benchmarks. One thing I think you should watch out for though is the file system cache. My understanding is that Elasticsearch passively takes advantage of the operating system file cache. So even if you're disabling any active caching that Elasticsearch is doing with the request JSON as the key, it probably won't disable the file system caching, using (I presume) each individual file system path as key. Someone from Elastic could probably step in to confirm whether this is still affecting your benchmark. To be honest, even if it is still affecting it, I don't know how one would account for that. As far as I know, file system caching at the OS level can't be disabled.
Well, disabling caching didn't really improve anything in my case. It just gave me actual results when testing the queries in different configurations, without relying on caching.
If I run the same query twice with caching enabled, the second run returns almost immediately (200-300 ms).
That is why caching led me to the wrong assumption that highlighting was the culprit, when in fact it wasn't.
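For anyone benchmarking the same way, here is a minimal sketch of building a search request that skips caching. It assumes the cache in question is the shard request cache, which can be turned off per request with the request_cache URL parameter. The index name, field name and helper function are made up for illustration, and this does nothing about the OS file system cache mentioned above.

```python
import json
from urllib.parse import urlencode


def build_uncached_search(index, query):
    """Return (path, body) for a search that asks Elasticsearch to skip
    the shard request cache for this one request."""
    # request_cache=false is a per-request URL parameter on the search API.
    params = urlencode({"request_cache": "false"})
    path = f"/{index}/_search?{params}"
    body = json.dumps({"query": query})
    return path, body


# Hypothetical index and field names, used only for illustration.
path, body = build_uncached_search("my-index", {"match": {"content": "example"}})
print(path)
print(body)
```

The path and body can then be sent with whatever HTTP client you already use for your benchmark runs.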
Here is another useful tip when facing these kinds of problems.
For each shard that you have in an index, Elasticsearch spawns a thread to search that shard.
That means that if I have 4 vCPUs in a node, I can only search 4 shards in parallel.
See here
So if you have a monster index with 5 primary shards on a node with many more vCPUs (16-32), you aren't actually using those vCPUs, since no more than 5 threads will be spawned to search the shards of that index.
Increasing the number of primary shards will make better use of those vCPUs, since Elasticsearch will spawn a thread for each shard and the shards will be searched in parallel.
So that is how more vCPUs help.
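To put numbers on that, here is a small sketch of the arithmetic under the simple model described above (one thread per shard, one query at a time). It is only an illustration: it ignores replicas, concurrent queries and how Elasticsearch actually sizes its search thread pool, and the function names are my own.

```python
def parallel_shard_searches(primary_shards, vcpus):
    """Upper bound on how many shards of one index a single query can
    search at the same time, under the one-thread-per-shard model."""
    return min(primary_shards, vcpus)


def idle_vcpus(primary_shards, vcpus):
    """vCPUs left unused by a single query against that index."""
    return vcpus - parallel_shard_searches(primary_shards, vcpus)


# (primary shards, vCPUs) combinations chosen for illustration.
for shards, cpus in [(5, 16), (5, 32), (16, 16), (8, 4)]:
    busy = parallel_shard_searches(shards, cpus)
    print(f"{shards} shards on {cpus} vCPUs: {busy} in parallel, "
          f"{idle_vcpus(shards, cpus)} vCPUs idle")
```

With 5 primary shards the bound stays at 5 no matter how many vCPUs you add, which is the situation described above.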
How a faster CPU helps is that the search in each of those shards will be faster.
So if a 2.0 GHz CPU takes 800 ms to search a shard, a 3.0 GHz CPU will search that shard much faster (~200-300 ms).
How more RAM seems to help is that Elasticsearch takes some RAM for every new index and every new shard that is created (how much they take, I don't really know).
This guide on shard size helped me to better manage CPU and RAM requirements.
And as for highlighting, there are cases when it will actually slow down searches.
Two things seem to help with this:
The first is setting "index_options": "offsets" on the field you are going to highlight, or "term_vector": "with_positions_offsets" (both of these make your index consume more space).
And the second is using the fast vector highlighter.
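Here is a rough sketch of what those two settings look like as request bodies, built as Python dicts. It assumes a typeless mapping (Elasticsearch 7.x or later) and a text field. The field name and helper functions are made up. As far as I understand, the fast vector highlighter (highlight type "fvh") needs the term_vector variant of the mapping.

```python
import json


def highlight_mapping(field, use_term_vectors=False):
    """Index-creation body that stores offsets for the highlighted field."""
    field_def = {"type": "text"}
    if use_term_vectors:
        # Assumption: this variant is the one required for "fvh".
        field_def["term_vector"] = "with_positions_offsets"
    else:
        field_def["index_options"] = "offsets"
    return {"mappings": {"properties": {field: field_def}}}


def fvh_search(field, text):
    """Search body that asks for the fast vector highlighter on one field."""
    return {
        "query": {"match": {field: text}},
        "highlight": {"fields": {field: {"type": "fvh"}}},
    }


# "content" is a hypothetical field name.
print(json.dumps(highlight_mapping("content", use_term_vectors=True), indent=2))
print(json.dumps(fvh_search("content", "example"), indent=2))
```

The mapping has to be in place when the index is created (or the data reindexed), since changing these options does not rewrite documents that are already indexed.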