For my deployment, my system will access ES through the REST API. A typical document (storing file information) will be as follows:
user id,
file name,
author,
date,
type (extension),
thumbnails (10KB max, as attachment or simple base64 string)
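A sketch of such a document as the JSON body one would send over the REST API; the field names, index path, and sample values here are illustrative assumptions, not a fixed schema:

```python
import base64
import json

# Placeholder image bytes standing in for a real thumbnail (<= 10 KB).
thumbnail_bytes = b"\x89PNG\r\n\x1a\n"

# Hypothetical file-information document; field names are assumptions.
doc = {
    "user_id": "u12345",
    "file_name": "vacation.jpg",
    "author": "aditya",
    "date": "2011-03-05",
    "type": "jpg",  # extension
    "thumbnail": base64.b64encode(thumbnail_bytes).decode("ascii"),
}

# This body would be PUT/POSTed to e.g. /files/file/<id> over HTTP.
body = json.dumps(doc)
print(body)
```

Storing the thumbnail as a base64 string keeps the document plain JSON; the attachment-type alternative would need the mapper-attachments plugin.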
The hardware I am planning to use is two machines (two for fault tolerance), each with 2x Xeon 2.4 GHz, 8 GB RAM, and a 250 GB HDD, running Ubuntu 10.04 64-bit server. (I read one message from Shay that 10.10 proved better for one bug; I will drill into that more.)
I will be interested in running queries for a specific user's files grouped by type, e.g. images, videos, etc.
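A sketch of such a query body using the filtered-query DSL; the field names ("user_id", "type"), the extension groupings, and the index path are my assumptions for illustration:

```python
import json

# Hypothetical query: all "image" files belonging to one user.
# Field names and the extension list are assumptions, not a fixed schema.
query = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {
                "and": [
                    {"term": {"user_id": "u12345"}},
                    {"terms": {"type": ["jpg", "png", "gif"]}},  # "images" group
                ]
            },
        }
    }
}

# This would be POSTed to e.g. /files/file/_search over HTTP.
print(json.dumps(query))
```

Using filters (rather than scoring queries) for the user id and extension list lets ES cache them, which should help when the same user/type combinations repeat.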
Expected usage:
10k-100k users
each with 20K-50K file information documents.
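A quick back-of-envelope on these numbers, assuming (worst case) every document carries a full 10 KB thumbnail; purely illustrative, but even the low end is well beyond a single 250 GB disk for the thumbnail data alone, which argues for capacity testing before committing to the hardware:

```python
# Document counts and thumbnail storage implied by the ranges above,
# under the worst-case assumption of a full 10 KB thumbnail per document.
KB = 1024

docs_low = 10_000 * 20_000     # 200 million documents
docs_high = 100_000 * 50_000   # 5 billion documents

thumb_low_tib = docs_low * 10 * KB / 1024**4    # thumbnail bytes, in TiB
thumb_high_tib = docs_high * 10 * KB / 1024**4

print(docs_low, docs_high, round(thumb_low_tib, 1), round(thumb_high_tib, 1))
```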
Will the hardware configuration mentioned be sufficient infrastructure? (Any past experience would be good to know.)
Are any code/configuration changes needed to make sure the index is stored locally, with periodic backups to S3?
Any suggestions on optimal memory? (I can purchase 16 GB of memory instead of 8 GB with an initial discount from the service provider.)
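On memory, a common rule of thumb is to give the ES JVM roughly half the machine's RAM and leave the rest for the OS filesystem cache. In the 0.x startup scripts the heap was set via variables along these lines (names taken from the stock bin/elasticsearch.in.sh; verify against your version):

```
# Sketch for an 8 GB machine -- ~half the RAM as heap, rest for the FS cache.
ES_MIN_MEM=4g
ES_MAX_MEM=4g
# With 16 GB, 8g would be the analogous setting.
```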
If you are not running on AWS, then use the local gateway, and you can periodically back up the nodes' data dirs to S3 (though the backup will include the replica data).
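The periodic backup described above could be sketched as archiving each node's data directory and then shipping the archive to S3; the data path and the upload step here are illustrative assumptions, not a tested procedure:

```python
import os
import shutil
import tempfile
import time

# Assumed data path -- check the path.data setting on your nodes.
DATA_DIR = "/var/lib/elasticsearch/data"

def snapshot(data_dir, dest_prefix):
    """Archive a node's local data directory as a timestamped .tar.gz."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    return shutil.make_archive(f"{dest_prefix}-{stamp}", "gztar", data_dir)

# Demo against a throwaway directory; a real run would use DATA_DIR and
# then upload the archive to S3 with a client such as s3cmd or boto.
# Note the archive will also include replica shard data.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "segment.bin"), "wb") as f:
    f.write(b"\x00" * 16)
archive = snapshot(demo_dir, os.path.join(tempfile.gettempdir(), "es-data"))
print(archive)
```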
Other than that, it sounds like the setup should be good. But you will need to run capacity tests to make sure.
The Ubuntu 10.04 vs. 10.10 point applies to AWS.
-shay.banon
On Saturday, March 5, 2011 at 6:38 PM, aditya.kulkarni wrote: