Use cases - Production examples: data, queries, cluster hardware and configuration, and statistics

Hi everyone,

I am considering moving one or several Elasticsearch clusters to production.
Although Elasticsearch's documentation and community are great, I am
quite surprised not to find any complete use-case story stretching from
application needs and data considerations to hardware ones.
Indeed, I understand why "what/how much hardware / configuration /
sharding" questions are systematically answered with "it depends"
followed by "test".
But then, with so many Elasticsearch users, what about a few complete
descriptions, from the data use case to the cluster's internals, along with
a few performance and node stats?

So here are my questions before moving to production:

Are there any complete use cases around? Could you share some? By
complete I mean including at least some of the following:

  1. Application needs and scope
  2. Indexing data: data volume, document mappings,
    number of documents / size of indexes
  3. Searching data: different applications, queries, use
    of facets - filters - aggregations, concurrent indexing
  4. Cluster hardware: machines' hardware (RAM, disks/SSD -
    DAS-JBOD/SAN/NAS), JVM heap / OS cache, number of machines, back-office network
  5. Cluster configuration: one or several indexes, sharding,
    replication, master nodes, data nodes, use of over-sharding at start-up,
    use of re-indexing
  6. Benchmarks: query response times, QPS, with or without concurrent
    indexing, memory heap sweet spot, node stats
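
To make item 6 a bit more concrete, this is roughly the kind of summary I have in mind. It is only a minimal Python sketch, assuming you already have per-query response times in milliseconds and the total wall-clock duration of the run. The function name `summarize_latencies` and the nearest-rank percentile method are my own choices for illustration, and nothing here is Elasticsearch-specific.

```python
import math


def summarize_latencies(latencies_ms, wall_clock_s):
    """Summarize one benchmark run: QPS plus a few latency percentiles."""
    if not latencies_ms or wall_clock_s <= 0:
        raise ValueError("need at least one latency and a positive duration")
    ordered = sorted(latencies_ms)

    def percentile(p):
        # Nearest-rank method: the smallest sample such that at least
        # p percent of all samples are less than or equal to it.
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    return {
        "queries": len(ordered),
        "qps": len(ordered) / wall_clock_s,
        "p50_ms": percentile(50),
        "p95_ms": percentile(95),
        "p99_ms": percentile(99),
        "max_ms": ordered[-1],
    }


if __name__ == "__main__":
    # Made-up numbers, only to show the shape of the output.
    sample = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    print(summarize_latencies(sample, wall_clock_s=2.0))
```

Reporting such a summary twice, once with and once without concurrent indexing, would already make a shared use case much easier to compare with others.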

For those interested, here are the best (though incomplete) of the very few
examples I've stumbled upon so far:

  • JBOD / SAN storage discussion in "To Raid or not to Raid": !searchin/elasticsearch/hardware/elasticsearch/HSj2fZGdU1Y/4mFCBTCb-JcJ

  • Usual heap considerations in a real case:

Do not forget Elasticsearch's awesome docs on considerations for moving to
production:

