Multitier Storage with ES

I'm currently evaluating ELK for log file storage and analysis.
Since we may acquire up to 50 GB per day, we need to use different storage paths
according to the age of the indices.

So the idea I'd like to ask for feedback on is this:

I have two nodes, each of which will run three instances of Elasticsearch.
On every node there will be a fast, a medium, and a slow storage tier mounted in the filesystem,
and each Elasticsearch instance will be started with the corresponding storage path:
fast is flash-based for recent indices, medium is SAS HDD, and slow is SATA HDD for archival.

## Node 1, 3 instances

```
bin/elasticsearch --node.vm vm1 --node.perf fast
bin/elasticsearch --node.vm vm1 --node.perf medium
bin/elasticsearch --node.vm vm1 --node.perf slow
```

## Node 2, 3 instances

```
bin/elasticsearch --node.vm vm2 --node.perf fast
bin/elasticsearch --node.vm vm2 --node.perf medium
bin/elasticsearch --node.vm vm2 --node.perf slow
```

### VM-awareness

```
cluster.routing.allocation.awareness.attributes: vm
```
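Instead of passing everything on the command line, each instance could carry its attributes in its own config file. A minimal sketch of the `elasticsearch.yml` for one instance, where the cluster name, node name, and data path are assumptions:

```yaml
# Sketch for the "fast" instance on vm1. The attribute names match the
# startup flags above; cluster name, node name, and data path are assumptions.
cluster.name: logging
node.name: vm1-fast
node.vm: vm1
node.perf: fast
path.data: /data/fast

# With vm-awareness enabled, ES tries to place a shard's primary and
# replica on instances with different values of the "vm" attribute,
# i.e. on different physical nodes.
cluster.routing.allocation.awareness.attributes: vm
```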

Using either an external script or Curator, I'd want to tune the settings for any index
older than n days:

### Fast indices

```
PUT test/_settings
{
  "index.routing.allocation.exclude.perf": "slow,medium"
}
```

### Medium indices

```
PUT test/_settings
{
  "index.routing.allocation.exclude.perf": "slow,fast"
}
```

### Slow indices

```
PUT test/_settings
{
  "index.routing.allocation.exclude.perf": "fast,medium"
}
```

If I understand it correctly, ES would then migrate those indices to the instances that use
the configured storage tier.
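As a concrete example of the external-script route, the demotion step could be sketched like this: compute the name of the daily index from its age, then update its allocation filter. The `logstash-YYYY.MM.DD` naming, the endpoint, and the `perf` attribute are assumptions here, and the actual curl call is left commented out so the sketch can be dry-run:

```shell
#!/bin/sh
# Sketch: demote the daily index that is $DAYS old to the medium tier
# by excluding the fast and slow tiers. Index naming (logstash-YYYY.MM.DD),
# the endpoint, and the "perf" attribute are assumptions.
ES="${ES:-http://localhost:9200}"
DAYS="${1:-7}"

# GNU date; on BSD/macOS use: date -v -"$DAYS"d +%Y.%m.%d
cutoff=$(date -d "$DAYS days ago" +%Y.%m.%d)
index="logstash-$cutoff"

echo "demoting $index to medium tier"
# curl -XPUT "$ES/$index/_settings" -d '{
#   "index.routing.allocation.exclude.perf": "fast,slow"
# }'
```

Run from cron once a day with the appropriate age thresholds, one invocation per tier transition.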

Is this a totally wrong idea, or could you give any recommendations on it?

We wouldn't recommend running more than one instance of ES on the same machine. Your desired setup sounds very much like the hot/warm architecture we would advocate; see this blog post for details:
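In a nutshell, that pattern tags each node with a single attribute (the post uses `box_type`) and routes indices by requiring it; the specifics below, including the example index name, are a hedged sketch rather than a quote from the post:

```
# Tag each node with its tier at startup:
bin/elasticsearch --node.box_type hot    # SSD-backed node
bin/elasticsearch --node.box_type warm   # HDD-backed node

# New indices are created on hot nodes (e.g. via an index template);
# aging ones are moved by switching the requirement:
PUT logstash-2015.01.01/_settings
{
  "index.routing.allocation.require.box_type": "warm"
}
```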


@abeyad Thank you very much for your feedback and the link :slight_smile: I think I will follow the blog more closely from now on :wink:
I will try the hot/warm architecture with dedicated nodes on my playground.

One caveat: you can run multiple instances if you have the resources, primarily CPU and RAM.

After thinking it through, more machines make more sense. We could have more paths to the storage available if we manage to have the VMs running in different blade centers, so we won't eat all the I/O bandwidth and IOPS in a single enclosure with a handful of VMs :wink: