Intel Xeon D/Blobs in ES/Slave replica

Hello,

I hope I'm in the right category. If not, please move this topic to the right one.

I'm currently writing an indexing mechanism for about 10,000,000 pictures belonging to a friend of mine who's a professional photographer. Once this is done, he'll also get a new storage system, because the existing one can no longer keep up with the amount of data.

We're planning to use a SAS HBA with spinning disks for the pictures and a bunch of SSDs for cache and Elasticsearch. Memory will be between 64 and 128 GB. I guess memory can only be replaced by more memory :wink:

The only thing I'm not sure about is the processor. As he wants to get a small chassis for the server itself (disks are stored in another chassis), we looked at Supermicro's embedded servers but they all come with the Xeon D.

AFAIK those processors are "better" Intel Atom chips? They usually come with 8 cores, but I'm not sure if they're the "number crunchers" I need.

Has anyone ever run ES on one of those processors?
Does anyone have benchmark information for them?
Any recommendations on the processor to use?
Which is better: more cores or higher speed per core? (I'd rather go for more cores than higher speeds.)

Another problem I have right now is that I want to store the preview image as a base64 blob in ES. The size is around 6-10 KB per image after conversion to base64.

As it is usually a bad idea to store blobs inside a database system, I wonder how that applies to ES. Will it kill performance, or is it perfectly fine to store the information inside?
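For a sense of scale, here is a rough back-of-the-envelope estimate of the total preview volume. It is only a sketch: it assumes 1 KB = 1,000 bytes and that every picture gets exactly one preview, neither of which is stated above. The function names are made up for illustration.

```python
import base64

NUM_PICTURES = 10_000_000      # picture count from the post
LOW_KB, HIGH_KB = 6, 10        # base64 size per preview from the post


def base64_length(raw_bytes: int) -> int:
    """Length of padded base64 output for `raw_bytes` bytes of input."""
    return 4 * ((raw_bytes + 2) // 3)


def total_gb(per_image_kb: int, count: int = NUM_PICTURES) -> float:
    """Total base64 volume in GB, assuming 1 KB = 1,000 bytes (an assumption)."""
    return per_image_kb * 1000 * count / 1e9


if __name__ == "__main__":
    # Sanity check of the 4/3 expansion against the standard library.
    assert base64_length(4500) == len(base64.b64encode(bytes(4500)))
    print(f"{total_gb(LOW_KB):.0f} to {total_gb(HIGH_KB):.0f} GB of base64 text")
```

Under those assumptions the previews alone come to roughly 60 to 100 GB of base64 text before any index overhead, which matters when the index is meant to be copied to a notebook.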

The idea here is to have the indexed information including the preview picture available from inside ES and then sync the index to his notebook so that he's able to search there when he's on the road.

Is there a way to make the indices on the notebook read-only? I couldn't find it in the docs, but maybe I've just been blind.

If someone could point me in the right direction on these topics, it'd be great.

Thanks a lot!

KR,

Oliver

This is not true. Xeon D is a system-on-chip design, meant to save space and energy in cloud server and web server environments, comparable to an ARM SoC. Intel launched the CPU after being pressured by Facebook, to keep ARM out of Intel data centers: http://www.nextplatform.com/2016/03/14/xeon-d-shows-arm-can-beat-intel/

They are OK for Elasticsearch-like applications. You will not get the maximum possible performance, but you will get a very good performance-per-watt ratio. It's not clear from your description, but you should consider more than one server if you want to scale out.

Not me.

Use it.

More cores.

6-10 KB base64 strings are OK for Elasticsearch. Just disable indexing on the base64 field. It can kill performance if you want to process the JSON source docs at large scale, but not on the Elasticsearch side. The performance degradation is mostly on the network and on the client side, where the base64 strings must be handled. I suggest creating a separate thumbnail index, keeping search data and stored data apart.
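A minimal sketch of that split, in Python and without any client library. The index names, field names and the `split_document` helper are hypothetical, and the exact mapping syntax depends on the Elasticsearch version (this assumes a version with the `keyword` type). The `binary` field type takes a base64 string and is not searchable, which is one way to keep the blob out of the index structures.

```python
import json

# Hypothetical index names, chosen only for this example.
SEARCH_INDEX = "pictures"
THUMBNAIL_INDEX = "thumbnails"

# Searchable metadata only; the field names are assumptions.
search_mapping = {
    "properties": {
        "filename": {"type": "keyword"},
        "taken_at": {"type": "date"},
        "tags": {"type": "keyword"},
    }
}

# The preview lives in its own index as a non-searchable binary field.
thumbnail_mapping = {
    "properties": {
        "preview": {"type": "binary"},
    }
}


def split_document(doc: dict, blob_field: str = "preview"):
    """Split one picture record into (search doc, thumbnail doc).

    Both parts are meant to be indexed under the same document ID so the
    thumbnail can be fetched by ID after a search hit.
    """
    search_doc = {k: v for k, v in doc.items() if k != blob_field}
    thumbnail_doc = {blob_field: doc[blob_field]}
    return search_doc, thumbnail_doc


if __name__ == "__main__":
    record = {
        "filename": "IMG_0001.jpg",
        "taken_at": "2016-03-14",
        "tags": ["portrait"],
        "preview": "aGVsbG8=",  # placeholder base64 payload
    }
    search_doc, thumbnail_doc = split_document(record)
    print(SEARCH_INDEX, json.dumps(search_doc))
    print(THUMBNAIL_INDEX, json.dumps(thumbnail_doc))
```

With this layout, search responses stay small, and the client fetches a thumbnail by ID only for the hits it actually displays.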

No.

There's no such thing as a slave either; not sure what you meant there?