Storage questions: SSD, shared vs local

Hi,

We're using Elasticsearch here to store logs and search through them.
Within a few months we will need to handle about 30K inserts per
second.

We expect to get dedicated storage for each server (which will be a
virtual machine) via iSCSI. It's still unclear whether it will be SSD
or not. I have a few questions regarding storage:

  • Does ES tend to wear out SSDs quickly? I mean, besides the sheer
    amount of data to be written, is there anything else we should take
    into account?
  • What storage (gateway) settings should we use? I've seen that the
    default "local" gateway is always recommended, but I've also read
    that writes are not async with the local gateway. Would we get better
    insert rates if we used a shared gateway? (e.g. put all the disks in
    a Hadoop cluster, or just in a RAID array, and export them via NFS)

Best regards,
Radu

On Tue, Feb 21, 2012 at 9:34 AM, Radu Gheorghe radu0gheorghe@gmail.com wrote:

> Within a few months we will need to handle about 30K inserts per
> second.

Just FYI, the size of the documents is also a big factor in understanding
the performance requirements.

> I've seen that the default "local" is always recommended, but at the
> same time I've read that writes are not async with local Gateway.
> Would we get better insert rates if we would use a shared gateway?

"Writes are not async with local gateway": this is not correct. Using a
shared gateway would not increase the insert rate. With a shared gateway,
data is still written to the local disks; the local copy just isn't
persistent and gets deleted when the node restarts, etc.

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype

Thanks for your reply!

OK, so the local gateway is the way to go, I guess.

My documents are 1KB on average, and I'm taking that into account when
calculating space requirements. I guess the achievable insert rate is a
case of "test and see". Or is there more to it?
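A rough sizing sketch for the figures in this thread (30K inserts per second, 1KB average documents), covering both the space question and the SSD-wear question. The unit convention (1 KB = 1024 bytes), round-the-clock ingest at the peak rate, and the WRITE_AMPLIFICATION factor are assumptions, not numbers from the thread. The factor is a hypothetical placeholder for the extra bytes that Lucene segment merges (which rewrite existing data) and the transaction log add on top of the raw documents; the real value has to be measured on your own cluster.

```python
# Back-of-the-envelope sizing for 30,000 inserts/second at ~1 KB per document.
# Assumptions (not from the thread): 1 KB = 1024 bytes, ingest runs 24h/day
# at the peak rate, and WRITE_AMPLIFICATION is a hypothetical placeholder for
# extra disk writes from segment merges and the transaction log.

INSERTS_PER_SEC = 30_000
AVG_DOC_BYTES = 1024
WRITE_AMPLIFICATION = 3.0  # hypothetical, for illustration only

SECONDS_PER_DAY = 24 * 60 * 60


def ingest_bytes_per_sec(inserts_per_sec: int, doc_bytes: int) -> int:
    """Raw document bytes arriving per second."""
    return inserts_per_sec * doc_bytes


def daily_raw_bytes(inserts_per_sec: int, doc_bytes: int) -> int:
    """Raw document bytes per day, before index overhead or replicas."""
    return ingest_bytes_per_sec(inserts_per_sec, doc_bytes) * SECONDS_PER_DAY


def daily_disk_writes(raw_bytes: int, amplification: float) -> float:
    """Estimated bytes physically written per day, e.g. for an SSD wear budget."""
    return raw_bytes * amplification


if __name__ == "__main__":
    mib = 1024 ** 2
    tib = 1024 ** 4
    rate = ingest_bytes_per_sec(INSERTS_PER_SEC, AVG_DOC_BYTES)
    raw = daily_raw_bytes(INSERTS_PER_SEC, AVG_DOC_BYTES)
    writes = daily_disk_writes(raw, WRITE_AMPLIFICATION)
    print(f"ingest rate:      {rate / mib:.1f} MiB/s")
    print(f"raw data per day: {raw / tib:.2f} TiB")
    print(f"disk writes/day:  {writes / tib:.2f} TiB "
          f"(amplification {WRITE_AMPLIFICATION})")
```

Under these assumptions the raw ingest works out to roughly 29 MiB/s, or about 2.4 TiB of documents per day, before index overhead. With Elasticsearch's default of one replica per shard, each document is indexed twice across the cluster, so cluster-wide write volume roughly doubles. The measured merge overhead, not the raw document rate, is what decides how fast an SSD's rated write endurance gets used up.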