Background: I am using Elasticsearch with Logstash to do some log analysis.
My use-case is write-heavy, and I have configured ES accordingly. After
experimenting with different setups, I am considering the following
configuration, which separates log processing from the ES cluster:
1x Logstash server
2x ES servers (1x master, 1x data-only), each with:
- 17 GB memory
- a single ES node with 9 GB of heap allocated
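For reference, the node roles would be set roughly like this (assuming the standard elasticsearch.yml role settings, with the heap set via the ES_HEAP_SIZE environment variable; node names are hypothetical):

```yaml
# elasticsearch.yml on the dedicated master
node.master: true
node.data: false

# elasticsearch.yml on the data-only node
node.master: false
node.data: true
```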
This should be plenty of memory for the relatively small dataset I am
starting with, and will expand as needed. However, I have the following questions:
It is my understanding that, ideally, we want one primary shard per index
per node (plus one replica per primary, assuming the number of replicas is
set to 1), meaning that in this setup I would set the number of shards per
index to 2. Each index is, as of now, relatively small
(~500MB), so two shards should be fine. However, as we scale the project,
the indexes will grow, and we will eventually want to split them into more
shards. On the hardware side, the ES servers are relatively lightweight.
As we scale, we have the option to simply beef up the hardware. Finally,
my understanding is that increasing the number of shards/index down the
line requires a full reindexing of the data, which I would like to avoid.
It seems to me that I would be better off setting shards/index to 4, in
anticipation of future scaling. Are there costs to this that I am missing?
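To make the arithmetic concrete, here is a small sketch of how the per-node shard count works out under each option (plain Python, no cluster needed; it assumes the usual model of primaries × (1 + replicas) shards spread evenly over the nodes that hold data):

```python
# Rough shard-count arithmetic for the options discussed above.
# Assumes total shards per index = primaries * (1 + replicas),
# spread evenly across the nodes holding data (a simplification).

def shards_per_node(primaries, replicas, data_nodes):
    """Average number of shards of one index hosted by each data node."""
    total = primaries * (1 + replicas)
    return total / data_nodes

# Option A: 2 primaries, 1 replica, 2 nodes -> 2 shards per node
print(shards_per_node(2, 1, 2))  # 2.0

# Option B: 4 primaries, 1 replica, 2 nodes -> 4 shards per node now...
print(shards_per_node(4, 1, 2))  # 4.0

# ...but back to 2 per node once we scale out to 4 nodes, with no reindex
print(shards_per_node(4, 1, 4))  # 2.0
```

So the near-term cost of over-sharding is simply more (smaller) shards per node, each with its own overhead, in exchange for not reindexing later.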
What about starting off with a single ES node on a beefier server? Should I
be concerned about availability with a single-node cluster (no replicas)?
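On the replica point, my understanding is that ES never allocates a replica on the same node as its primary, so on a single node every replica stays unassigned and the cluster sits at yellow. A simplified model of that rule (the function is hypothetical, just to illustrate):

```python
# Why a single-node cluster ends up "yellow" when replicas are configured:
# a replica is never placed on the same node as its primary, so with one
# node every replica remains unassigned.

def unassigned_replicas(primaries, replicas, nodes):
    """Replica shards that cannot be allocated (simplified model)."""
    # Each primary can place at most (nodes - 1) of its replicas on
    # other nodes; the rest stay unassigned.
    placeable = min(replicas, nodes - 1)
    return primaries * (replicas - placeable)

print(unassigned_replicas(4, 1, 1))  # 4 -> cluster health: yellow
print(unassigned_replicas(4, 1, 2))  # 0 -> green
```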
Thanks for reading
You received this message because you are subscribed to the Google Groups "elasticsearch" group.