Just to give some background: I'm currently running a Logstash setup, so my Elasticsearch (ES) version is 0.18.7. Everything is housed on one large AWS EC2 instance (http://aws.amazon.com/ec2/instance-types/) with only 300 GB of EBS volume (http://aws.amazon.com/ebs/). I'm going to switch to S3 storage (http://aws.amazon.com/s3/) once I scale.
There are 30 rolling indices in my ES instance; each index represents one day of records, and I combine three servers' worth of logs into each daily index. I'm chalking up 150 GB to 210 GB in this ES instance, with each index at least 5 GB.
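To illustrate the layout, the indices follow Logstash's default daily naming scheme (the exact names and sizes below are just examples), and a 30-day query has to fan out across all of them:

    # Daily rolling indices, assuming Logstash's default
    # logstash-YYYY.MM.dd naming (names/dates are illustrative):
    #   logstash-2013.05.01   (~5-7 GB)
    #   logstash-2013.05.02   (~5-7 GB)
    #   ...and so on, 30 indices in total.
    #
    # A 30-day query hits every one of them, e.g. via a
    # comma-separated index list (or a wildcard):
    curl -XGET 'http://localhost:9200/logstash-2013.05.01,logstash-2013.05.02/_search?pretty'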
I've been asked to scale this project to house more servers' logs, but I'm not sure which route to take. I already have problems running per-minute facet queries in real time, because just querying the data and putting it into a graph eats too much RAM, yet I'd like to scale to the point where I can build a per-minute date histogram over 30 days in a single graph.
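Roughly the kind of query behind each graph, a date_histogram facet with a per-minute interval (the @timestamp field is Logstash's default, and the index name is illustrative):

    curl -XPOST 'http://localhost:9200/logstash-2013.05.01/_search?pretty' -d '{
      "size": 0,
      "query": { "match_all": {} },
      "facets": {
        "per_minute": {
          "date_histogram": {
            "field": "@timestamp",
            "interval": "minute"
          }
        }
      }
    }'

Multiply that by 30 indices and it's a lot of facet buckets to pull back at once.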
So which route should I take: many small ES instances, or one single beefy AWS/ES instance for my use case? I need it to be as cost-efficient as possible.