Have you tested on AWS where you are running?
Given that the EBS storage I am running is rated just above 3000 IOPS, and the 5000 IOPS SSD from Azure was only being utilized at around 6% maximum, I doubt the AWS EBS SSD would use more than 50% of its IOPS even though the IOPS are nearly halved between the two, since both are network-based storage. Both are 1TB SSDs, by the way.
That's why I did not stress test the performance yet (I will eventually).
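If you want a rough picture of what the disks are doing in the meantime, the node stats API exposes some filesystem I/O counters (on Linux at least); something along these lines should work, with filter_path only trimming the response:
GET _nodes/stats/fs?filter_path=nodes.*.fs.io_stats.total
As far as I understand, those counters are cumulative, so comparing two snapshots taken some time apart gives a feel for the actual operation rate.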
That sounds reasonable, but it was hard to tell from the screenshots as I am on mobile. Do you see any evidence of long or frequent GC in the logs?
I'd agree that your shards are far too big, but you should be fine for RAM in that situation. If you're going from 15 shards to 30 shards, that will easily fit in 48GB of RAM, that's over 1GB per shard! That should decrease your search time to the 100-300ms mark. As for whether better CPU will help, I'd recommend checking the load/CPU usage of the nodes. If they're running high, then increase CPU, but you're not going to run out of RAM for a long time.
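For a quick check, something like this should show CPU and load per node (the column list here is just one I find handy):
GET _cat/nodes?v&h=name,cpu,load_1m,load_5m,heap.percent,ram.percent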
Hmm haven't checked that out. Thanks for the insight.
Since I have to allocate half of the RAM to the JVM, is that still enough in your opinion? Out of the 48 GB of RAM, 24 GB will go to the JVM heap and the other 24 GB to the file system cache.
And CPU does seem to be the main issue at the moment, since in my earlier testing it would hover between 60% and 80%. But I don't know if I should increase CPU speed or CPU core count. Do they affect performance in the same way? My assumption is probably not, but I don't really know that much about how CPU tasks are handled by Elasticsearch.
Yes, that's still plenty of RAM in my opinion; you can always check by looking at the JVM graph in Kibana. If it's just slowly increasing and then dropping sharply, you've got too much RAM; it should be hovering around the same level. As each shard is a Lucene instance, increasing the number of shards increases the number of instances, so my assumption would be that more cores beat faster cores, as the work can be parallelized more efficiently, although I don't have any metrics to back up that assertion.
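If you'd rather check outside Kibana, the node stats API exposes the same heap number; a minimal sketch, with filter_path just trimming the response:
GET _nodes/stats/jvm?filter_path=nodes.*.name,nodes.*.jvm.mem.heap_used_percent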
Yes, that's a classic case of too much RAM. As you can see, the memory usage is just slowly creeping up until a GC run kicks in to clear the memory. Although this is from a much smaller cluster, this would be a more typical graph:
It's not really a huge problem, but consider temporarily lowering the amount of RAM allocated to the JVM, as the Lucene shards would probably make better use of that RAM via the file system cache instead. That should also shorten those GC spikes, since there will be less heap to clear.
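If you do try lowering the heap, that's a node-level setting rather than something you can change through the API. On recent versions you can drop a small file into config/jvm.options.d (on older ones, edit config/jvm.options directly) and restart the node; the 16g below is purely an illustration, not a recommendation:
# config/jvm.options.d/heap.options -- keep Xms and Xmx equal
# 16g is just an example value; size the heap for your own workload
-Xms16g
-Xmx16g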
It seems that the culprit is the highlighting I am running for my queries.
If I run the same query without highlighting, the results are instant (30-200ms).
Any ideas on how to speed up queries with highlighting?
The query is something like this:
{
"query":{
"bool":{
"should":[
{
"match":{
"content":{
"query":"The simplest financial instrument is undoubtedly the contract that ties a lender (investor)to a borrower (company).",
"minimum_should_match":"60%"
}
}
}
]
}
},
"size":40,
"_source":{
"excludes":[
"content"
]
},
"highlight":{
"fields":{
"content": {
"fragment_size": 3312
"number_of_fragments":5,
"highlight_query":{
"bool":{
"should":[
{
"match":{
"content":{
"query":"The simplest financial instrument is undoubtedly the contract that ties a lender (investor)to a borrower (company).",
"fuzziness":"AUTO"
}
}
}
]
}
}
}
}
}
}
I've not used highlighters before, but looking at that query there's one thing I notice: you're actually running the query twice. The highlight_query property is only needed when the query you want to highlight differs from the main query. Just having
{
"query":{
"bool":{
"should":[
{
"match":{
"content":{
"query":"The simplest financial instrument is undoubtedly the contract that ties a lender (investor)to a borrower (company).",
"minimum_should_match":"60%"
}
}
}
]
}
},
"size":40,
"_source":{
"excludes":[
"content"
]
},
"highlight":{
"fields":{
"content": {
"fragment_size": 3312
"number_of_fragments":5
}
}
}
}
should still highlight using the original query, but be a lot faster.
Yes, the time has decreased, but I forgot to mention (and illustrate in the query I gave) that the query does differ a bit for highlighting.
Some words like "the", "is", "a", "to" get removed from the highlighting, which is a requirement in our case.
But thanks for the feedback.
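For reference, the highlight part ends up looking roughly like this: the main query is unchanged, and the highlight_query runs the same text through the built-in stop analyzer so those words get dropped from highlighting. This is just a sketch, our actual highlight query is a bit different:
"highlight": {
  "fields": {
    "content": {
      "fragment_size": 3312,
      "number_of_fragments": 5,
      "highlight_query": {
        "match": {
          "content": {
            "query": "The simplest financial instrument is undoubtedly the contract that ties a lender (investor)to a borrower (company).",
            "analyzer": "stop"
          }
        }
      }
    }
  }
}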
I found something that might help in this thread.
They mentioned that the fast vector highlighter will roughly halve search times at the cost of more disk space being used (not an issue in my case). Sounds promising.
Okay, so the fast vector highlighter is orders of magnitude faster than the plain or unified highlighters, but it does not produce exactly the same results.
However, this discussion has drifted to a somewhat different topic, so I will summarize some of my findings:
- More memory is needed as the number of shards and the amount of data grow.
- Faster CPUs (higher GHz) help when you are using highlighting (not by much in my case, but your mileage may vary).
- More CPU cores help with overall search speed (not counting highlighting).
If anyone has other insights on these (or even contrary ones), feel free to reply.
Okay so...
One of my assumptions was incorrect.
Highlighting is not the culprit here. I thought it was because, when I was testing the queries, I first ran them with highlighting and then without it.
Turns out the query was being cached, so the second run was returned immediately from the cache.
In these cases it seems the best bang for the buck is to reindex with more primary shards rather than actually increase the hardware (although since you are increasing the number of primary shards, you are going to need enough RAM to support them).
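A rough sketch of what I mean (the index names and the shard count here are just placeholders): create a new index with more primary shards, then reindex into it:
PUT my_index_v2
{
  "settings": {
    "number_of_shards": 30,
    "number_of_replicas": 1
  }
}
POST _reindex
{
  "source": { "index": "my_index" },
  "dest": { "index": "my_index_v2" }
}
Once the reindex finishes, you can point an alias (or your application) at the new index.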
And if you are testing queries with slight variations to see which one performs better, don't forget to add request_cache=false, since this will give you the result without hitting the cache.
An example:
POST yourIndex/_search?request_cache=false
{
"query":{
"bool":{
"should":[
{
"match":{
"content":{
"query":"The simplest financial instrument is undoubtedly the contract that ties a lender (investor)to a borrower (company).",
"minimum_should_match":"60%"
}
}
}
]
}
},
"size":40,
"_source":{
"excludes":[
"content"
]
},
"highlight":{
"fields":{
"content": {
"fragment_size": 3312
"number_of_fragments":5
}
}
}
}
@Aurel_Drejta Thanks for following up here. I subscribed via email so I could lurk. I'm not surprised that disabling caching improved the quality of your benchmarks. One thing I think you should watch out for though is the file system cache. My understanding is that Elasticsearch passively takes advantage of the operating system file cache. So even if you're disabling any active caching that Elasticsearch is doing with the request JSON as the key, it probably won't disable the file system caching, using (I presume) each individual file system path as key. Someone from Elastic could probably step in to confirm whether this is still affecting your benchmark. To be honest, even if it is still affecting it, I don't know how one would account for that. As far as I know, file system caching at the OS level can't be disabled.
Well, disabling caching didn't really improve anything in my case. It's just that I got an actual result from testing the queries in different configurations without relying on caching.
If I run the same query twice and caching is not disabled, the second run is returned immediately (200-300 ms).
That's why the caching led me to my wrong assumption that highlighting was the culprit when in fact it wasn't.
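Another option, if you'd rather leave caching on in general, is to clear the request cache between test runs (the index name here is just a placeholder):
POST yourIndex/_cache/clear?request=true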
Here is another useful tip when facing these kinds of problems.
For each shard that you have in an index, Elasticsearch spawns a thread to search that shard.
That means that if I have 4 vCPUs in a node, I can only search 4 shards in parallel.
See here
So if you have a monster index with 5 primary shards on a node with many more vCPUs (16-32), you aren't actually using those vCPUs, since no more than 5 threads will be spawned to search the shards of that index.
Increasing the number of primary shards will better utilize those vCPUs, since Elasticsearch will spawn a thread per shard and the shards will be searched in parallel.
So that is how more vCPUs help.
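If you want to see this in action, the search thread pool stats show how many search threads are active per node and whether anything is queueing up:
GET _cat/thread_pool/search?v&h=node_name,size,active,queue,rejected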
Faster CPUs help by making the search within each shard faster. So if a 2.0GHz CPU takes 800ms to search a shard, a 3.0GHz CPU will be much faster at searching that same shard (~200-300 ms).
More RAM seems to help because, for every new index and every new shard that is created, Elasticsearch takes up some RAM (how much space they take I don't really know).
This guide on shard size helped me to better manage CPU and RAM requirements.
And as for highlighting, there are cases where it will actually slow down searches.
Two things seem to help with this:
The first is setting "index_options": "offsets" on the field you are going to highlight, or "term_vector": "with_positions_offsets" (both of these make your index consume more space).
The second is using the fast vector highlighter.
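To make that concrete, here is a sketch of what I mean (field and index names are placeholders, and the mapping format assumes a recent, typeless version). As far as I understand, the fast vector highlighter needs term vectors with positions and offsets on the field, and you then ask for it with "type": "fvh" in the search request. Note that term_vector can't be changed on an existing field, so this needs a new index and a reindex:
PUT yourIndex
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "term_vector": "with_positions_offsets"
      }
    }
  }
}
GET yourIndex/_search
{
  "query": {
    "match": { "content": "financial instrument" }
  },
  "highlight": {
    "fields": {
      "content": {
        "type": "fvh",
        "fragment_size": 3312,
        "number_of_fragments": 5
      }
    }
  }
}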