Cannot find an FAQ - is there one?

Apologies if the mandatory "tag" makes no sense, but I have no idea what the three options meant - so I chose the one in the middle.

Is there an FAQ where noobs like myself can get most of the answers that we need, without bothering others over RTFM type questions?

Specifically, I am after guidelines for hardware requirements once volume of data, number of searches, number of updates etc. are factored in. I appreciate that this is "That Depends", but was hoping there might be a crude guideline in an FAQ somewhere.

Something like: how many GB of data, how big each record is, how many queries per minute, acceptable delay period, how many records updated per day - and so on.

I'm trying to see if a Gen8 Proliant 560 will "handle it easily" or "not a chance". Rather than annoy lots of people with daft questions, I was hoping to find an FAQ - but nothing found.

Any "sizing guidelines / videos / discussion" thread worth looking at please?

Thank you.
Guy

It would help if you described the use case and the requirements. A traditional search use case is generally sized and configured quite differently from a log/metrics use case.

Ok, thank you Christian. I just didn't want to bother folks if there was already a resource on here for usage guidelines.

The project is for an experimental research database which will end up indexing around 2K from the END of about 150 million web pages - so very roughly 300GB of Jsoup extracted data.

The pages will be updated once every five weeks (so approximately 2K x 3,000 rescrapes per minute). In addition, the analyzing software (running on another server) is likely to be making about 5 search requests per second when "flat out".

Searches will be very similar to a regular "keyword" type of search phrase with a general mix of single keyword and multiple keyword searches. Fuzzy logic will be switched on.

One second response time is acceptable, and ideally at least 48 results (assuming there are that many "hits"). This could be reduced if needed to a minimum of 18 results.

The server we have spare is a Proliant 560 Gen 8, 256GB RAM, quad 4657 Xeon - giving a system total of 48 cores / 96 threads.

We are just trying to see if this is a case of "don't even think about it, not a chance" or "should handle it, but will run hot" or "should do it easily".

I'm very happy to give more information if needed, but in summary:

300 GB of data - 150 million pages of approx 2K each
Pages refreshed over a 5 week cycle - so 3,000 re-scrapes/min or 50 re-scrapes/sec
Max search load - 5 searches per second, ideally with 48 results, 18 will do.
Dedicated Proliant 560 Gen8 256GB ram, 48 cores, RAID 10 SSD set up
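
For anyone wanting to sanity-check the figures above, here is a rough back-of-envelope sketch in Python. It assumes "2K" means 2,000 bytes and that the five-week cycle is exactly 35 days with re-scrapes spread evenly. The function name `sizing` is made up for illustration, and the result covers raw extracted data only, not the on-disk index size.

```python
# Back-of-envelope check of the summary figures.
# Assumptions (not from any official sizing guide):
#   - "2K" is taken as 2,000 bytes per page
#   - the refresh cycle is exactly 5 weeks = 35 days, spread evenly
PAGES = 150_000_000
BYTES_PER_PAGE = 2_000
CYCLE_DAYS = 5 * 7


def sizing(pages=PAGES, bytes_per_page=BYTES_PER_PAGE, cycle_days=CYCLE_DAYS):
    """Return (raw data in GB, updates per minute, updates per second)."""
    total_gb = pages * bytes_per_page / 1e9
    per_min = pages / (cycle_days * 24 * 60)
    per_sec = per_min / 60
    return total_gb, per_min, per_sec


total_gb, per_min, per_sec = sizing()
print(f"raw data: {total_gb:.0f} GB")
print(f"updates:  {per_min:.0f}/min, {per_sec:.1f}/sec")
```

This gives 300 GB of raw data and just under 3,000 updates per minute (about 50 per second), which matches the rounded numbers in the summary.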

If you could offer an opinion please Christian, that would be much appreciated.

Kind Regards
Guy

Hi @GuyMark we have some guidance on sizing here: Benchmarking and sizing your Elasticsearch cluster for logs and metrics | Elastic Blog

In general, I would say that this server should be able to handle the load (depending on its on-disk storage), but Elasticsearch is designed for, and works best with, multiple nodes/servers rather than a single one.
