Request: Deployment guidelines/suggestions/experience

Hi,

In my deployment, my system will access ES through the REST API. A typical document (storing file information) will have the following fields:
user id,
file name,
author,
date,
type (extension),
thumbnails (10KB max, as attachment or simple base64 string)
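
For illustration, a document with the fields above could be assembled like this before being sent to the REST API. This is a minimal sketch: the JSON field names and the `build_file_document` helper are assumptions made for the example, not names from the thread, and the thumbnail is stored as a plain base64 string (the simpler of the two options mentioned).

```python
import base64
import json
import os

MAX_THUMBNAIL_BYTES = 10 * 1024  # "10KB max" from the field list


def build_file_document(user_id, file_name, author, date, thumbnail=b""):
    """Build one file-information document (field names are hypothetical)."""
    if len(thumbnail) > MAX_THUMBNAIL_BYTES:
        raise ValueError("thumbnail exceeds 10 KB")
    # Derive the type from the file extension, lower-cased, without the dot.
    extension = os.path.splitext(file_name)[1].lstrip(".").lower()
    return {
        "user_id": user_id,
        "file_name": file_name,
        "author": author,
        "date": date,
        "type": extension,
        # Raw bytes are not valid JSON, so the thumbnail is base64-encoded.
        "thumbnail": base64.b64encode(thumbnail).decode("ascii"),
    }


doc = build_file_document("u42", "holiday.JPG", "Aditya", "2011-03-05", b"\x89PNG")
body = json.dumps(doc)  # request body for an index call over HTTP
print(body)
```

The index name and URL path the body is sent to depend on the ES version in use, so they are left out here.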

The hardware I am planning to use is two machines (for fault tolerance), each with 2x Xeon 2.4GHz, 8 GB RAM, and a 250GB HDD, running Ubuntu 10.04 64-bit server. (I read a message from Shay saying that 10.10 proved better because of one bug; I will look into that further.)

I am interested in running queries for a specific user's files, grouped by type (images, videos, etc.).
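
A query of this kind could look roughly like the sketch below. It assumes a recent Query DSL (`bool`/`filter` plus a `terms` aggregation), which postdates this thread, so the syntax for the version discussed here may differ. The field names `user_id` and `type`, the `IMAGE_TYPES` grouping, and the `build_type_query` helper are all hypothetical, and both fields are assumed to be mapped as exact-match values.

```python
import json

# Hypothetical mapping from a group name ("images") to file extensions.
IMAGE_TYPES = ["jpg", "jpeg", "png", "gif"]


def build_type_query(user_id, extensions, size=10):
    """Search body: one user's files restricted to a set of extensions."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Filters do not affect scoring; they only narrow the result set.
                "filter": [
                    {"term": {"user_id": user_id}},
                    {"terms": {"type": extensions}},
                ]
            }
        },
        # Count the matching files per extension.
        "aggs": {"files_per_type": {"terms": {"field": "type"}}},
    }


search_body = build_type_query("u42", IMAGE_TYPES)
print(json.dumps(search_body, indent=2))
```

Filtering on the user id first keeps each query scoped to one user's documents rather than the whole index.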

Expected usage:

  1. 10k-100k users
  2. Each with 20K-50K file information documents.
    Will the hardware configuration mentioned above be sufficient? (Any past experience would be good to know.)
  3. Are any changes to code/configuration needed to make sure that the index is stored locally, with a periodic backup to S3?
  4. Any suggestions on the optimal amount of memory? (I can purchase 16 GB in place of 8 GB with an initial discount from the service provider.)

Any pointers will be a BIG help. :slight_smile:

Thank you.

Best Regards,
Aditya

If you are not running on AWS, then use the local gateway, and periodically back up each node's data dir to S3 (though the backup will include the replica data).
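
A periodic data dir backup of this kind could be scripted with an external tool along the lines below. This is only a sketch: `archive_data_dir`, `backup`, and the `upload` callable are hypothetical names, and it assumes the data dir is not being written to while it is archived (otherwise the copy may be inconsistent). The upload step is left to the caller, for example a thin wrapper around an S3 client.

```python
import os
import tarfile
import time


def archive_data_dir(data_dir, dest_dir, node_name="node1"):
    """Pack a node's data dir into a timestamped .tar.gz under dest_dir."""
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime())
    archive_path = os.path.join(dest_dir, f"{node_name}-{stamp}.tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store paths relative to the data dir's own name, not the full path.
        tar.add(data_dir, arcname=os.path.basename(os.path.normpath(data_dir)))
    return archive_path


def backup(data_dir, dest_dir, upload):
    """Archive the data dir, then hand the archive to upload(path, key)."""
    path = archive_data_dir(data_dir, dest_dir)
    # `upload` is supplied by the caller; with boto3 it could wrap
    # boto3.client("s3").upload_file(path, bucket, key) (an assumption here).
    upload(path, os.path.basename(path))
    return path
```

Running `backup` from cron on each node gives the periodic schedule. `dest_dir` should sit outside the data dir so the archive does not include itself.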

Other than that, it sounds like the setup should be good. But, you will need to run capacity tests to make sure.
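
As a starting point for such capacity tests, the figures in the question allow a rough back-of-envelope estimate. The sketch below only multiplies out the stated ranges and treats every document as carrying a full 10KB thumbnail, so it is an upper bound on thumbnail data, not a prediction of index size. The helper names are made up for the example.

```python
KIB = 1024
TIB = 1024 ** 4


def base64_length(n_bytes):
    """Length of the padded base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)


def estimate(users, docs_per_user, thumbnail_bytes=10 * KIB):
    """Document count and worst-case thumbnail volume for one scenario."""
    docs = users * docs_per_user
    return {
        "documents": docs,
        "thumbnail_bytes_raw": docs * thumbnail_bytes,
        "thumbnail_bytes_base64": docs * base64_length(thumbnail_bytes),
    }


low = estimate(10_000, 20_000)     # 10k users x 20K documents each
high = estimate(100_000, 50_000)   # 100k users x 50K documents each

for label, e in (("low", low), ("high", high)):
    print(label, e["documents"], "documents, up to",
          round(e["thumbnail_bytes_raw"] / TIB, 2), "TiB of raw thumbnails")
```

Even the low end is 200 million documents, and at the full 10KB per thumbnail the raw thumbnail data alone would exceed a 250GB disk, before replicas and index overhead. Measuring the real average thumbnail size, and whether thumbnails need to live in the index at all, would be a useful first capacity test.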

The Ubuntu 10.04 versus 10.10 issue applies to AWS.

-shay.banon
(In reply to aditya.kulkarni's message of Saturday, March 5, 2011 at 6:38 PM.)

Thanks, Shay, for the answer and for clearing up my doubt about Ubuntu.

Yes, I am not on AWS, although I want to use S3 for backup. Do I need to set up this backup with an external tool, or can I use an ES capability for it?

Best Regards,
Aditya