Higher CPU vs higher memory: which helps in which cases?

Hi everyone,

In which cases do more CPU cores and/or a faster CPU benefit the cluster more than more memory?

I am currently running 3 nodes, each with 8 CPUs and 16 GB of RAM.

I will first scale the cluster by setting up 3 dedicated master nodes, since I will probably add more data nodes to the cluster.

I'm trying to figure out which of these (CPU or RAM) I should be increasing in my cluster, and whether I should make my current machines bigger or add new ones.

My searches are mainly Google/Bing-style searches, where I query like a search engine to find the most relevant documents and highlight the relevant parts.

The cluster is search-heavy, not indexing-heavy.

I've read that the higher the number of shards, the more memory the nodes need in order to hold the shard information. But does CPU have to be increased as well in that case?
Do more CPU cores mean that more shards can be searched in parallel? Or would faster CPUs be better?

I currently have 3 indices with 5 primary shards each, and each index is around 280 GB in size.
My main idea is to split the indices into more shards (10 or 20), which would probably increase RAM usage. But would the CPU requirements also grow, since more shards have to be searched? And in that case, would more CPU cores or a faster CPU be better?
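(For anyone who wants to check their own shard sizes before deciding, something like the following cat shards request should list them largest first; the index pattern is just a placeholder for your own indices.)

GET _cat/shards/my-index-*?v&h=index,shard,prirep,store&s=store:desc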

The searches are quite slow at the moment, hovering anywhere between 1 s and 20 s depending on the sentence I submit. I know that with my current hardware I'm not getting close to Google and Bing response times (and probably won't be anytime soon), but I would like to figure out which direction to focus on at the moment (add more nodes, invest in bigger nodes, split shards, increase CPU, increase RAM).

(I'm not looking for a definitive answer, I just want to have a discussion about this topic, as I can't seem to find any guides on it.)

In the end I will test out multiple configurations (increasing the number of nodes with the same spec vs. making a single node bigger) for both CPU and RAM, and will post the results I find here.

Thank you in advance


As your data set is larger than what you can fit into the operating system page cache, it is not unlikely that storage performance is the bottleneck rather than CPU or RAM. What type of storage are you using?

It would also help to know how many concurrent queries you need to support, as you need to find a balance between tuning for latency and tuning for query throughput.


The storage is AWS EBS gp2 SSD. Yes, I have read that network performance can sometimes be a bottleneck, which is why locally attached storage is recommended over network storage, but through testing I have found that storage performance is not the issue in this case (though increasing it would probably help).

The cluster won't be used by many people at the moment, so around 5-10 queries per second is probably what the cluster is handling (and is what I am aiming for right now).

How did you verify that storage performance is not the bottleneck? gp2 EBS volumes get IOPS proportional to their size and start out quite low.

Before deploying to AWS I had a chance to test the same deployment in Azure with the same node config (1024 GB Premium SSD) and monitored the disk performance; these were the results.

Should I be checking anything else in this case?

Have you tested on AWS where you are running?

The EBS storage I am running is rated just above 3000 IOPS, while the 5000 IOPS SSD from Azure was only being utilized at around 6% maximum. Even with the available IOPS nearly halved between the two, I doubt the AWS EBS SSD would use more than 50% of its IOPS, since both are network-based storage. Both are 1 TB SSDs, by the way.

That's why I have not stress tested the storage performance yet (I will eventually).

That sounds reasonable, but it was hard to tell from the screenshots as I am on mobile. Do you see any evidence of long or frequent GC in the logs?
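If it's easier than digging through the logs, something like this node stats call should show the GC counts and total collection times per node (just a quick check, assuming a reasonably recent Elasticsearch version):

GET _nodes/stats/jvm?filter_path=nodes.*.name,nodes.*.jvm.gc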

I'd agree that your shards are far too big, but you should be fine for RAM in that situation. If you're going from 15 shards to 30 shards, that will easily fit in 48 GB of RAM; that's over 1 GB per shard! That should bring your search times down to the 100-300 ms mark. As for whether better CPU will help, I'd recommend checking the load/CPU usage of the nodes. If they're running high, then increase CPU, but you're not going to run out of RAM for a long time.
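A quick way to eyeball that is the cat nodes API; something along these lines should show CPU, load average and heap usage per node:

GET _cat/nodes?v&h=name,cpu,load_1m,load_5m,heap.percent,ram.percent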

Hmm, I haven't checked that out. Thanks for the insight.

Since I have to allocate half of the RAM to the JVM, is that still enough in your opinion? Out of the 48 GB of RAM, 24 GB will go to the filesystem cache.

And CPU does seem to be the main issue at the moment, since in my earlier testing it would hover around 60-80%. But I don't know whether I should increase CPU speed or CPU core count. Do they affect performance in the same way? My assumption is probably not, but I don't really know much about how CPU tasks are handled by Elasticsearch.

Yes, that's still plenty of RAM in my opinion; you can always check by looking at the JVM graph in Kibana. If it's just slowly increasing and then dropping sharply, you've got too much RAM allocated; it should be hovering around the same level. As each shard is a Lucene instance, increasing the number of shards increases the number of instances, so my assumption would be that more cores are better than faster cores, as they'll be able to parallelize more efficiently, although I don't have any metrics to back up that assertion.

I'm guessing, based on what you said, that this is too much RAM, right?

Yes, that's a classic case of too much RAM. As you can see, the memory usage just slowly creeps up until a GC run clears the memory. Although this is from a much smaller cluster, this would be a more typical graph:

It's not really a huge problem, but consider temporarily lowering the amount of RAM available to the JVM, as the Lucene shards (via the filesystem cache) would probably make better use of that RAM instead. That should also reduce the length of those GC spikes, since there is less heap to clear.

It seems that the culprit is the highlighting that I am running for my queries.

If I run the same query without highlighting, the results are near-instant (30-200 ms).

Any ideas on how to speed up highlighting queries?

The query is something like this:

{
    "query":{
        "bool":{
            "should":[
                {
                    "match":{
                        "content":{
                            "query":"The simplest financial instrument is undoubtedly the contract that ties a lender (investor)to a borrower (company).",
                            "minimum_should_match":"60%"
                        }
                        
                    }
                }
            ]    
        }
    },
    "size":40,
    "_source":{
        "excludes":[
            "content"
        ]
    },
    "highlight":{
        "fields":{
            "content": {
                "fragment_size": 3312 
                "number_of_fragments":5,
                "highlight_query":{
                    "bool":{
                        "should":[
                          {
                            "match":{
                                "content":{
                                    "query":"The simplest financial instrument is undoubtedly the contract that ties a lender (investor)to a borrower (company).",
                                    "fuzziness":"AUTO"
                                }
                            }
                          }
                        ]
                    }
                }
             }
        }
    }
}

I've not used highlighters before, but looking at that query there's one thing I notice: you're actually running the query twice. The highlight_query property is only needed when your highlight query differs from the original query. Just having

{
    "query":{
        "bool":{
            "should":[
                {
                    "match":{
                        "content":{
                            "query":"The simplest financial instrument is undoubtedly the contract that ties a lender (investor)to a borrower (company).",
                            "minimum_should_match":"60%"
                        }
                        
                    }
                }
            ]    
        }
    },
    "size":40,
    "_source":{
        "excludes":[
            "content"
        ]
    },
    "highlight":{
        "fields":{
            "content": {
                "fragment_size": 3312 
                "number_of_fragments":5
            }
        }
    }
}

should still highlight using the original query, but be a lot faster.

Yes, the time has decreased, but I forgot to mention (and illustrate in the query I gave) that the query does differ a bit for highlighting.
Some words like "the", "is", "a", and "to" get removed from the highlighting query, which is a requirement in our case.
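One way to avoid maintaining a second query string for that could be to strip those words at query time with the built-in stop analyzer in the highlight_query. This is just a sketch of that idea (the stop analyzer removes the standard English stopword list), not exactly what we run:

{
    "query":{
        "match":{
            "content":{
                "query":"The simplest financial instrument is undoubtedly the contract that ties a lender (investor)to a borrower (company).",
                "minimum_should_match":"60%"
            }
        }
    },
    "size":40,
    "_source":{
        "excludes":[
            "content"
        ]
    },
    "highlight":{
        "fields":{
            "content":{
                "fragment_size":3312,
                "number_of_fragments":5,
                "highlight_query":{
                    "match":{
                        "content":{
                            "query":"The simplest financial instrument is undoubtedly the contract that ties a lender (investor)to a borrower (company).",
                            "analyzer":"stop"
                        }
                    }
                }
            }
        }
    }
}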

But thanks for the feedback.

I found something that might help in this thread.

They mention that the fast vector highlighter can speed up searches by around 2x at the cost of more disk space (not an issue in my case). Sounds promising :smiley:

Okay, so the fast vector highlighter is orders of magnitude faster than the plain or unified highlighter, but it does not produce exactly the same results.
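For anyone else trying this, the gist is that the fast vector highlighter needs term vectors with positions and offsets stored on the field, which means reindexing with a mapping roughly like this (index and field names are placeholders):

PUT my-index
{
    "mappings":{
        "properties":{
            "content":{
                "type":"text",
                "term_vector":"with_positions_offsets"
            }
        }
    }
}

and then the highlight section of the search request asks for the fvh type:

"highlight":{
    "fields":{
        "content":{
            "type":"fvh",
            "fragment_size":3312,
            "number_of_fragments":5
        }
    }
}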

However, this discussion has turned into a completely different topic, so I will summarize some of my findings:

  • More memory is needed as the number of shards and the amount of data grow.
  • A faster CPU (GHz) helps when you are using highlighting (not by much in my case, but your mileage may vary).
  • More CPU cores help with overall search speeds (highlighting aside).

If anyone has other insights on these (or even contrary ones), feel free to reply.

Okay, so...

One of my assumptions was incorrect.

Highlighting is not the culprit here. I thought it was because, when I was testing the queries, I first ran them with highlighting and then without it.
It turns out the query was being cached, so the second run was returned immediately from the cache.

In these cases it seems the best bang for the buck is to reindex with more primary shards rather than to increase the hardware, although since you are increasing the number of primary shards you will need enough RAM to support them.
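Roughly what I mean by that, with placeholder index names and shard counts (the new index would also need the real mappings, and for a 280 GB index you would probably run the reindex with wait_for_completion=false):

PUT my-index-v2
{
    "settings":{
        "number_of_shards":10,
        "number_of_replicas":1
    }
}

POST _reindex
{
    "source":{
        "index":"my-index"
    },
    "dest":{
        "index":"my-index-v2"
    }
}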

And if you are testing queries with slight variations to see which one performs better, don't forget to add request_cache=false, since this will give you the result without hitting the request cache.

An example:

POST yourIndex/_search?request_cache=false
{
    "query":{
        "bool":{
            "should":[
                {
                    "match":{
                        "content":{
                            "query":"The simplest financial instrument is undoubtedly the contract that ties a lender (investor)to a borrower (company).",
                            "minimum_should_match":"60%"
                        }
                        
                    }
                }
            ]    
        }
    },
    "size":40,
    "_source":{
        "excludes":[
            "content"
        ]
    },
    "highlight":{
        "fields":{
            "content": {
                "fragment_size": 3312 
                "number_of_fragments":5
            }
        }
    }
}