I have been doing some extensive research on the work here with
Elasticsearch and am planning to use this indexing technology as the
operational datastore for our new enterprise architecture. We are
redesigning from scratch and have chosen to use an index for a
near-real-time operational data store instead of an RDBMS or other
index technologies like Solr and Endeca.
In initiating this project, I was wondering if you would advise us on a
few things:
- What is recommended as far as the particular number of nodes to
start out with? How many cores? How much RAM per node? Is there a
matrix or set of guidelines for determining these things?
- Ignoring any monetary limitations, would several small nodes in a
cluster be preferable to a few large nodes, or vice versa?
- Would the recommendation be to use the in-memory storage option or
the filesystem storage option? It seems in-memory would perform better,
but if the rate of growth is fast enough it may cause some issues long
term (meaning more nodes in order to keep up with growth vs. increased
file storage to keep up with growth).
- If we determine we would like assistance in a more personal way,
what is the recommended way to go about finding someone with
Elasticsearch experience who would be willing to help out?
Since we are starting from scratch, it's hard to determine the exact
details of the data that will be indexed at this point. But we are
basing our other stack components on the assumption that we will be
indexing hundreds of millions of records at a steady growth of
10,000/day, and will need to anticipate 1,000 query requests/sec and
50 insert/update requests/sec.
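Since the workload numbers are concrete, here is a back-of-envelope sketch of what they imply. The per-record size and the "roughly 300M" starting point are hypothetical placeholders I picked for illustration (the message only says "hundreds of millions"); actual sizes depend entirely on the mappings and analyzers chosen.

```python
# Back-of-envelope sizing sketch using the workload figures stated above.
# ASSUMED_* values are placeholders, not measurements.

STARTING_RECORDS = 300_000_000     # assumed midpoint of "hundreds of millions"
GROWTH_PER_DAY = 10_000            # stated net growth
WRITES_PER_SEC = 50                # stated insert/update rate
ASSUMED_BYTES_PER_RECORD = 1_024   # hypothetical 1 KiB indexed size per record

def records_after(days: int) -> int:
    """Total record count after `days` of steady growth."""
    return STARTING_RECORDS + GROWTH_PER_DAY * days

def index_size_gib(records: int) -> float:
    """Rough index size in GiB at the assumed per-record size."""
    return records * ASSUMED_BYTES_PER_RECORD / 2**30

print(records_after(365))                      # 303650000 after one year
print(round(index_size_gib(records_after(365)), 1))

# Note: 50 writes/sec is ~4.3M write requests/day, while net growth is
# only 10,000/day -- so the vast majority of writes would be updates.
print(WRITES_PER_SEC * 86_400)                 # 4320000 write requests/day
```

One takeaway from the sketch: at these assumptions the data grows by well under 1% per year, so read throughput (the 1,000 queries/sec) rather than raw growth is likely the dominant sizing pressure.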
If more info is needed, please let me know, and if there are any
concerns about using this technology this way, feel free to voice
those as well.