SSD and one replica vs HDD and more replicas

We monitor the internet and have aggregated a large collection of
documents. We currently have 1 165 908 189 documents, which we want to
migrate to Elasticsearch, and we would like to ask you a few questions.

The question concerns the drives for Elasticsearch. The system currently uses
SSDs, but they are four times more expensive than HDDs.
Which solution would you recommend: SSDs with one replica per
shard, or HDDs with 3 replicas per shard?

How did you come to the conclusion you needed one of those two?

A single node will only ever have a single copy of a specific shard. Are you comparing having a smaller number of nodes backed by SSD vs 3 times as many nodes backed by HDD or are you discussing storage per node?

I can buy 16 servers with 1 TB (2 x 500 GB) of SSD or 16 servers with 4 TB (2 x 2 TB) of HDD. One index holds the documents for one month and can be 200-300 GB. For the SSD configuration, the layout can be:

The index has 8 shards; each server holds a single shard, and each shard has one replica. Servers holding replicas are shown in bold boxes.

Funny simple drawing below :slight_smile:

With the HDD servers I can have more replicas, but there may be cases in which a primary shard and a replica of another shard of the same index end up on the same machine.
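
To make the placement constraint concrete, here is a minimal Python sketch (not from the original posts) that computes the smallest number of shard copies each node must hold and builds a matching index settings body. `index.number_of_shards`, `index.number_of_replicas` and `index.routing.allocation.total_shards_per_node` are real Elasticsearch index settings; the function names are hypothetical, and applying the 16-server figure to both options is an assumption.

```python
import math


def min_copies_per_node(shards: int, replicas: int, nodes: int) -> int:
    """Smallest per-node shard count that can hold every copy of one index."""
    total_copies = shards * (1 + replicas)  # primaries plus all replicas
    return math.ceil(total_copies / nodes)


def index_settings(shards: int, replicas: int, nodes: int) -> dict:
    """Settings body that spreads one index as evenly as the node count allows."""
    return {
        "settings": {
            "index.number_of_shards": shards,
            "index.number_of_replicas": replicas,
            # Hard per-node limit for this index (assumed to be desired here).
            "index.routing.allocation.total_shards_per_node":
                min_copies_per_node(shards, replicas, nodes),
        }
    }


# Assumption: 16 nodes in both the SSD and the HDD option.
ssd = index_settings(shards=8, replicas=1, nodes=16)
hdd = index_settings(shards=8, replicas=3, nodes=16)
```

With 8 shards and one replica there are 16 copies, so 16 nodes can hold one each. With three replicas there are 32 copies, so every node has to hold at least two shards of the same index, although Elasticsearch still never places two copies of the same shard on one node. Note that `total_shards_per_node` is a hard limit, so a value this tight may leave shards unassigned when a node is down.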

Can you tell us a bit more about your use case? How much CPU and RAM will each node have?

Intel® Core™ i7-6700
Technology
RAM 32 GB DDR4 RAM
Hard Drive 2 x 500 GB SATA SSD

or

Intel® Core™ i7-6700
Technology
RAM 64 GB DDR4 RAM
Hard Drive 2 x 2 TB SATA HDD

What about the use case? Is it a search use case? Is it therefore likely to be query heavy? What type of load will it be under? What are the latency requirements?

We have 24 monthly indexes of 200-300 GB each. A typical query is, for example, "find the documents that contain the word milk." Such a query can return 15 000 documents per month. We will also want to compute statistics over 12 months, but that is the heavy case. The system will serve 300 customers. The standard query covers the last month and returns 300-5000 documents. We use 30 machines.
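
As a rough capacity check for the two hardware options, the arithmetic can be sketched in Python. This is not from the original posts and rests on several assumptions: the 16-server figure from the earlier post, both drives in each server used for data without mirroring, the 200-300 GB figure referring to primary data only, and 85% of each disk being usable (the default low disk watermark in Elasticsearch). The function names are hypothetical.

```python
def required_gb(indexes: int, index_gb: int, replicas: int) -> int:
    """Total disk needed for every copy of every monthly index."""
    return indexes * index_gb * (1 + replicas)


def usable_gb(nodes: int, node_gb: int, usable_percent: int = 85) -> int:
    """Cluster capacity below the assumed disk watermark."""
    return nodes * node_gb * usable_percent // 100


# (label, replicas per shard, raw GB per node), all assumed from the thread.
options = (("SSD", 1, 1000), ("HDD", 3, 4000))

for label, replicas, node_gb in options:
    for index_gb in (200, 300):
        need = required_gb(24, index_gb, replicas)
        have = usable_gb(16, node_gb)
        print(f"{label}, {index_gb} GB per index: "
              f"need {need} GB, usable {have} GB, fits={need <= have}")
```

Under these assumptions the SSD option with one replica fits at 200 GB per index (9 600 GB needed against 13 600 GB usable) but not at 300 GB (14 400 GB needed), while the HDD option with three replicas fits in both cases. More machines or fewer retained months would change the picture.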

I would suspect SSDs with fewer replicas would give better performance, but you really need to benchmark in order to be sure as it will depend on the data as well as type and nature of queries.
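
For the benchmarking itself, a minimal latency harness could look like the Python sketch below. It is only an illustration: `run_query` is a hypothetical callable that you would replace with your real search request against each hardware configuration, and the run counts are arbitrary.

```python
import math
import statistics
import time
from typing import Callable, Dict


def benchmark(run_query: Callable[[], object],
              runs: int = 50, warmup: int = 5) -> Dict[str, float]:
    """Time repeated calls to run_query and summarize latency in milliseconds."""
    for _ in range(warmup):  # warm caches before measuring
        run_query()
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        latencies.append((time.perf_counter() - start) * 1000.0)
    latencies.sort()
    p95_index = max(0, math.ceil(0.95 * runs) - 1)  # nearest-rank percentile
    return {
        "runs": runs,
        "median_ms": statistics.median(latencies),
        "p95_ms": latencies[p95_index],
        "max_ms": latencies[-1],
    }


# Stand-in workload; replace with a function that sends your real query.
result = benchmark(lambda: sum(range(10_000)), runs=20, warmup=2)
print(result)
```

Running the same set of representative queries (the one-month search and the 12-month statistics) against both configurations, ideally with concurrent clients, gives numbers that can be compared directly.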
