I've been using ES in production since 0.17.6, with clusters of up to 64
virtual machines and 20 TB of data (including 3 replicas). We're now thinking
about pushing things a bit further, and I wondered if people here had
similar experience / needs to ours.
Our current index holds 1.1 billion unique documents, 8 TB of data (including 1
replica) on 37 physical machines (32 data nodes, 3 master nodes and 2 nodes
dedicated to HTTP requests) with ES 1.3 (an upgrade to 1.5 is already planned).
We're indexing about 2,500 new documents per second and everything's fine so far.
Our goal is to index (and search) about 30 billion more documents (the
backdata), plus about 200 million new documents each month.
Our company provides analytics dashboards to its clients, who mostly
browse their data at a monthly granularity, so we route documents by
month. Each shard holds between 200 and 250 GB. The index is made of 128
shards, which covers about 10 years of data at 1 month per shard.
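For illustration, the month-based routing described above could be implemented by deriving a routing key from each document's timestamp, so that all documents from the same month land on the same shard. This is a minimal sketch; the index name, field, and helper are hypothetical, not taken from our actual code:

```python
from datetime import datetime

def monthly_routing(doc_timestamp):
    # Derive a "YYYY-MM" routing key: documents from the same month
    # hash to the same shard, so a monthly dashboard query only
    # touches one shard instead of all 128.
    return doc_timestamp.strftime("%Y-%m")

# With the official elasticsearch-py client, this key would be passed
# as the `routing` parameter on both index and search calls, e.g.:
#   es.index(index="analytics", body=doc,
#            routing=monthly_routing(doc_ts))
#   es.search(index="analytics", body=query,
#             routing="2015-04")
print(monthly_routing(datetime(2015, 4, 17)))
```

The same routing value must then be supplied at query time, otherwise ES falls back to fanning the query out to every shard.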
Considering what we already have, we should reach 240 TB of data (and
counting) with a single replica once all our backdata is indexed.
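As a quick sanity check on that projection, extrapolating from the current per-document footprint (8 TB for 1.1 billion documents, replica included) gives a figure in the same ballpark, before counting the ~200 million new documents per month:

```python
# Back-of-envelope capacity projection (assumes the backdata has
# roughly the same per-document footprint as the current index).
current_docs = 1.1e9
current_tb = 8.0                               # includes 1 replica
tb_per_billion = current_tb / (current_docs / 1e9)   # ~7.3 TB / billion docs

backdata_docs = 30e9
projected_tb = (current_docs + backdata_docs) / 1e9 * tb_per_billion
print(round(projected_tb))                     # roughly 226 TB before monthly growth
```

A few months of 200-million-document growth on top of that lands near the 240 TB figure.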
So, my questions:
Does anyone here have a similar use case / amount of data?
Is ES the right technology for realtime, lightning-fast queries (filtered
queries and high-cardinality aggregations) on such an amount of data?
What are the traps to avoid? Is it better to add lots of medium machines
(12-core Xeon E5-1650 v2, 64 GB RAM, 1.8 TB 15k SAS drives) or a few huge
machines with terabytes of RAM, terabytes of SSD and multiple ES processes?
Any feedback on a similar situation is much appreciated.
Have a nice day,