First, kudos for this great product!
I'm working on incorporating an ES cluster for indexing our data. We're a
young startup with limited resources, so I'm trying to figure out a minimal
setup that will hold up.
Our data: we index around 15M documents per month, each around 1-2 KB, a
short text with several additional metadata fields, some in nested
objects. That works out to roughly 25 GB of data per month (15M docs at
~1.7 KB each). Data keeps flowing in constantly at a rate of several docs
per second. Traffic is still low, so no more than a few queries per second.
According to recommendations I found in the forum, this seems like the
right setup:
Creating an index per month, aliasing them all under one name with a
filter by date, so I can query across whatever date range I need (a rough
sketch of what I mean follows below).
Each index with 1 shard and 1 replica.
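
Here's a minimal sketch of the setup as I understand it; the index name
docs-2016-01, the alias name docs, and the timestamp field are all just
placeholders I made up:

```
# Create the monthly index with 1 primary shard and 1 replica
PUT /docs-2016-01
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}

# Add it to the shared alias, filtered to that month's date range
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "docs-2016-01",
        "alias": "docs",
        "filter": {
          "range": {
            "timestamp": {
              "gte": "2016-01-01",
              "lt": "2016-02-01"
            }
          }
        }
      }
    }
  ]
}

# Searches then go through the alias; each index's filter is ANDed in
GET /docs/_search
{
  "query": {
    "range": {
      "timestamp": { "gte": "2016-01-15", "lte": "2016-02-15" }
    }
  }
}
```

The idea (if I got it right) is that new documents are written directly to
the current month's index, while all reads go through the alias.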
We're planning to start with a 2-node setup (which means all the primary
shards can sit on one node, with the second node holding the replicas as
backup).
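
Once both nodes are up, I figure I can sanity-check the shard placement
with the cat API (the docs-* pattern is again just my placeholder):

```
# List each shard, whether it's a primary (p) or replica (r), and its node
GET /_cat/shards/docs-*?v
```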
My only concern is memory. How are indices handled in this scenario? Will
old indices that aren't queried much still impact the memory needed? And if
they are queried at some point (e.g. when someone searches old data), can
that cause out-of-memory errors like those I see reported here? How much
memory should a node have for this kind of data and index setup?
Any pointers will be much appreciated.