My initial suggestion would be to set your templates to 3 shards, 1
replica. With three data nodes, that's 6 shards per index (3 primaries + 3
replicas), or 2 shards per index per node. At 5 indexes/day, that's 10
shards per day per node; over 30 days, that's 300 shards per node, 900
across the 3 nodes. I don't know of any 'cutoff' per se, and 900 may be a
bit much for a ~10g instance, but I've run 1500+ shards on 16g instances.
I set my shards/replicas via a template that matches my auto-index naming,
which starts with the year (a "20*" pattern), though you can do it via your
YML config as well.
{
  "template" : "20*",
  "settings" : {
    "index.number_of_shards" : 18,
    "index.number_of_replicas" : 1,
    "index.auto_expand_replicas" : false
  },
  "mappings" : {
    "_default_" : {
      "_source" : { "compress" : false },
      "properties" : {
        "priority" : { "type" : "string", "index" : "not_analyzed" },
        "facility" : { "type" : "string", "index" : "not_analyzed" },
        ...and so on.
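For the 3 shard / 1 replica suggestion above, the same template API call would look roughly like this (untested sketch -- the template name and the "*-*" pattern are just placeholders for whatever matches your daily index names):

  curl -XPUT 'localhost:9200/_template/daily_logs' -d '
  {
    "template" : "*-*",
    "settings" : {
      "index.number_of_shards" : 3,
      "index.number_of_replicas" : 1
    }
  }'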
The default, without any settings, is 5 shards/1 replica per index, which
wouldn't distribute evenly across 3 data nodes. It will balance out over
multiple days though. That's not necessarily a bad thing, as more CPUs can
search faster, but the more shards, the more RAM used, etc.
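If you end up on 1.0.x, I believe the _cat API can show you where the shards actually land per node, something like:

  curl 'localhost:9200/_cat/shards?v'
  curl 'localhost:9200/_cat/allocation?v'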
I currently have one dedicated master node and one dedicated search node.
In a prod environment, I'd have a small group of virtual masters (3-5?),
but probably still only the one virtual search node (we do far more indexing
than searching). Depending on how much searching you do, you may not need a
dedicated search node: you can just hit any node on 9200, or run a dedicated
search/master combo, or... really, there are lots of ways. This is where I'm
weak though; I'm not sure how to estimate needs, as I don't have my environment
mapped out!
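Roughly, the node roles are just a couple of lines in each node's elasticsearch.yml -- a sketch of what I mean, not my exact config:

  # on a dedicated master (holds/elects cluster state, no data):
  node.master: true
  node.data: false

  # on a dedicated data node:
  node.master: false
  node.data: true

  # on a dedicated search/client node (point Kibana and queries here):
  node.master: false
  node.data: false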
Are some of your indexes much larger than others per day? If so, note that I
believe nodes are balanced by shard count, not by shard disk usage -- so a
much smaller shard counts the same for ES 'capacity planning' as a larger
one. Unless this changed recently in 1.0.x?
-Zachary
On Tuesday, March 4, 2014 9:51:47 AM UTC-6, Eric wrote:
Zach,
Thanks for the information. For my POC, I have two 10 gig VMs and I'm
keeping 7 days of logs with no issues, but a month is a fairly large jump and
I could see where it may pose an issue.
As far as the 150 indexes go, I'm not sure on the shards per index or replicas.
That is the part of the ES setup I'm weakest on. I'm not exactly sure
how I should set up the ES cluster as far as shards, replicas, master
node, data node, search node, etc.
I fully agree on going from logstash directly to ES. I have 1 logstash instance
right now tailing 5 files and feeding directly into ES, and I've enjoyed
not having another application to worry about.
Eric
On Tuesday, March 4, 2014 10:32:26 AM UTC-5, Zachary Lammers wrote:
Based on my experience, I think you may run into OOM issues trying to
keep a month of logs with ~10gb of RAM per server.
Say, for instance, 5 indexes a day for 30 days = 150 indexes. How many
shards per index/replicas?
I ran some tests with 8GB assigned to each of my 20 ES data nodes, and after
~7 days of a single index per day of all log data, my cluster would crash due
to data nodes going OOM. I know I can't compare perfectly, and I'm somewhat
new to ES myself, but as soon as I removed the 'older' servers with smaller
RAM from the cluster and gave ES 16GB on each data node, I haven't
gone OOM since. I was working with higher data rates, but I'm not sure the
volume mattered as much as my shard count per index per node.
For reference, my current lab config is 36 data nodes, running a single
index per day (18 shards/1 replica), and I can index nearly 40,000 events per
second at the beginning of the day, closer to 30,000 per second near the end
of the day when the index is much larger. I used to run 36 shards/1 replica,
but I wanted the shards per index per node to be minimal, as I'd really like
to keep 60 days (except I'm running out of disk space on my old servers
first!). To pipe the data in, I'm running 45 separate logstash instances,
each monitoring a single FIFO that I have scripts simply catting data into.
Each LS instance joins the ES cluster directly (no redis/etc.; I've had too
many issues not going direct to ES). I recently started over after holding
steady at 25B log events over ~12 days (but I ran out of disk, so I had to
delete old indexes). I tried updating to LS 1.4b2/ES 1.0.1, but it failed
miserably -- LS 1.4b2 was extremely, extremely slow at indexing -- so I'm
still on LS 1.3.3 and ES 0.90.9.
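Each LS instance is just a small config along these lines (a sketch, not my exact config -- the FIFO path and cluster name are placeholders, and I'm leaving out my filters):

  input {
    pipe {
      command => "cat /data/fifos/logs.fifo"   # the FIFO this instance watches
    }
  }
  output {
    elasticsearch {
      cluster => "es-lab"   # node protocol: LS joins the ES cluster directly
    }
  }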
As for the master question, I can't answer. I'm only running one right now
for this lab cluster, which I know is not recommended, but I have zero idea
how many I should truly have. Like I said, I'm new to this.
-Zachary
On Tuesday, March 4, 2014 9:11:59 AM UTC-6, Eric Luellen wrote:
Hello,
I've been working on a POC of Logstash/Elasticsearch/Kibana for about 2
months now, everything has worked out pretty well, and we are ready to
move it to production. Before building out the infrastructure, I want to
make sure my shard/node/index setup is correct, as that is the main part
I'm still a bit fuzzy on. Overall my setup is this:
Servers
Networking Gear                                       syslog-ng server
End Points        ------> Load Balancer ------>       syslog-ng server ------> Logs stored in 5 flat files on SAN storage
Security Devices                                      syslog-ng server
Etc.
I have logstash running on one of the syslog-ng servers, and it is basically
reading the input of the 5 different files and sending them to Elasticsearch.
Within Elasticsearch, I am creating 5 different indexes a day so I can
do granular user access control within Kibana:
unix-$date
windows-$date
networking-$date
security-$date
endpoint-$date
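Roughly, the logstash config looks like this (paths are placeholders and I've left out the filters):

  input {
    file { path => "/san/logs/unix.log"    type => "unix" }
    file { path => "/san/logs/windows.log" type => "windows" }
    # ...and the same for networking, security, and endpoint
  }
  output {
    elasticsearch {
      index => "%{type}-%{+YYYY.MM.dd}"   # yields unix-2014.03.04, windows-2014.03.04, etc.
    }
  }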
My plan is to have 3 Elasticsearch servers with ~10 gigs of RAM each. For my
POC I have 2 and it's working fine at 2,000 events/second. My main concern is
setting up the Elasticsearch servers so they are as efficient as possible.
With my 5 different indexes a day, and a plan to keep ~1 month of logs within
ES, are 3 servers enough? Should I have 1 master node and have the other 2 be
basic setups that handle data and searching? Also, will 1 replica be
sufficient for this setup, or should I do 2 to be safe? In my POC, I've had a
few issues where I ran out of memory or something weird happened and I lost
data for a while, so I want to limit that as much as possible. We'll also have
quite a few users potentially querying the system, so I didn't know if I
should set up a dedicated search node on one of these servers.
Besides the ES cluster, I think everything else should be fine. I have
had a few concerns about logstash keeping up with the volume of entries
coming into syslog-ng, but I haven't seen much in the way of load balancing
logstash or verifying whether it's able to keep up. I've spot-checked the
files quite a bit and everything seems to be correct, but if there is a
better way to do this, I'm all ears.
I'm going to have my Kibana instance installed on the master ES node,
which shouldn't be a big deal. I've played with the idea of putting the ES
servers on the syslog-ng servers and just having a separate NIC for the ES
traffic, but I didn't want to bog down those servers a whole lot.
Any thoughts or recommendations would be greatly appreciated.
Thanks,
Eric