Hey guys,
I have a lot of servers sending data to ES, and I need to improve its performance. From what I understand, I need to add a new data node?
The number of active shards reported by _cluster/health is slowly dropping, and there are now 80+ unassigned shards. It's only a matter of time until it dies.
So am I correct that adding a new data node will fix this? Here is my cluster health:
{
  "cluster_name" : "eTech_cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 5161,
  "active_shards" : 5161,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 91,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 98.26732673267327
}
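For reference, the counters in this health output relate by simple arithmetic. The Python sketch below assumes the percentage is active shards divided by all shards (active + initializing + unassigned), which reproduces the 98.267% shown. It also makes explicit that none of the 5161 active shards are replicas and that every one of them sits on the single data node.

```python
import json

# Shard counters copied from the cluster health posted above (abridged).
HEALTH = """{
  "number_of_data_nodes": 1,
  "active_primary_shards": 5161,
  "active_shards": 5161,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 91
}"""


def shard_summary(health):
    """Derive a few figures from a _cluster/health response."""
    # All shards in the cluster state, whether assigned to a node or not.
    total = (health["active_shards"]
             + health["initializing_shards"]
             + health["unassigned_shards"])
    return {
        "total_shards": total,
        "active_percent": 100.0 * health["active_shards"] / total,
        # Active shards that are not primaries are replica copies.
        "active_replicas": health["active_shards"] - health["active_primary_shards"],
        "shards_per_data_node": health["active_shards"] / health["number_of_data_nodes"],
    }


summary = shard_summary(json.loads(HEALTH))
for key, value in summary.items():
    print(f"{key}: {value}")
```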
Adding another data node is a good idea as temporary relief, but even with two nodes you have way, way, way too many shards. I suspect you can reduce the total number of shards by combining indexes and/or reducing the number of shards per index, but without more information it's impossible to give specific advice. After such an operation you may well do fine with a single node again.
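To show how much the index layout matters, here is a rough Python sketch that counts primary shards for one time-based index series. The 365-day retention and the 30-day "month" are illustrative assumptions, not figures from this cluster, and replicas are ignored.

```python
import math


def primary_shards(retention_days, shards_per_index, period_days=1):
    """Primary shards held by one time-based index series.

    period_days=1 models daily indexes (e.g. winlogbeat-YYYY.MM.DD);
    period_days=30 approximates monthly indexes.
    """
    # A partially filled period still needs its own index, so round up.
    indexes = math.ceil(retention_days / period_days)
    return indexes * shards_per_index


layouts = {
    "daily, 5 shards": primary_shards(365, 5),
    "daily, 1 shard": primary_shards(365, 1),
    "~monthly, 1 shard": primary_shards(365, 1, period_days=30),
}
for name, count in layouts.items():
    print(f"{name:>18}: {count} shards per year of retention")
```

With those assumptions the three layouts need 1825, 365 and 13 shards respectively for the same data.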
We have roughly 200 servers that will be sending Winlogbeat logs to ES. What info would you need to give better advice? Also, is ES capable of supporting 200 Windows servers sending logs?
The number of servers sending logs is irrelevant. What matters is the total number of events, how they're distributed over time, and if there are reasons to use separate index series.
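A rough way to turn event volume into a shard count, as a Python sketch. The function name, the average event size and the target shard size are all assumptions chosen for illustration; plug in your own measurements (for example docs.count and store.size from _cat/indices). Note that the number of servers never appears, only the total volume.

```python
import math


def suggested_primary_shards(events_per_day, avg_event_bytes,
                             days_per_index, target_shard_gb):
    """Rough primary-shard count for one index, sized by expected data volume.

    target_shard_gb is an assumed per-shard size budget, not an ES setting.
    """
    index_bytes = events_per_day * avg_event_bytes * days_per_index
    target_bytes = target_shard_gb * 1024 ** 3
    # At least one shard; round up so no shard exceeds the target size.
    return max(1, math.ceil(index_bytes / target_bytes))


# Hypothetical inputs: 1M events/day, ~1 KB each, monthly index, 30 GB per shard.
print(suggested_primary_shards(1_000_000, 1024, 30, 30))
```

Under these assumptions a million events a day fits comfortably in a single shard per monthly index; ten times that volume would call for 10.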
I'm interested in what kind of indexes you have and how come you have over 5000 shards. Do you have a single index series (e.g. logstash-YYYY.MM.DD) or multiple? How many shards per index? What's the typical size (in bytes) of an index? How many days do you keep indexes?
Running http://192.168.60.90:9200/_cat/indices?v
produces hundreds, possibly thousands, of lines like these:
health status index                 pri rep docs.count docs.deleted store.size pri.store.size
green  open   winlogbeat-2014.10.29   5   0         34            0     79.4kb         79.4kb
green  open   winlogbeat-2014.10.26   5   0         57            0     98.4kb         98.4kb
green  open   winlogbeat-2014.10.25   5   0        265            0    381.5kb        381.5kb
green  open   winlogbeat-2014.10.28   5   0         38            0    103.4kb        103.4kb
green  open   winlogbeat-2014.10.27   5   0         52            0    127.9kb        127.9kb
green  open   winlogbeat-2014.10.22   5   0         42            0     88.4kb         88.4kb
green  open   winlogbeat-2014.10.21   5   0         34            0    100.2kb        100.2kb
green  open   winlogbeat-2014.10.24   5   0         74            0    160.5kb        160.5kb
green  open   winlogbeat-2014.10.23   5   0         46            0    124.5kb        124.5kb
green  open   winlogbeat-2014.10.20   5   0         37            0     80.4kb         80.4kb
green  open   winlogbeat-2014.09.30   5   0         45            0      114kb          114kb
green  open   winlogbeat-2015.04.09   5   0        356            0      390kb          390kb
green  open   winlogbeat-2015.04.08   5   0        476            0    264.5kb        264.5kb
green  open   winlogbeat-2015.04.07   5   0        334            0    284.5kb        284.5kb
green  open   winlogbeat-2015.04.06   5   0        195            0    264.6kb        264.6kb
green  open   winlogbeat-2014.10.19   5   0         35            0    108.3kb        108.3kb
green  open   winlogbeat-2015.04.05   5   0        194            0    293.3kb        293.3kb
green  open   winlogbeat-2014.10.18   5   0         38            0     90.8kb         90.8kb
green  open   winlogbeat-2015.04.04   5   0        216            0    297.9kb        297.9kb
green  open   winlogbeat-2015.04.03   5   0        202            0    256.6kb        256.6kb
green  open   winlogbeat-2014.10.15   5   0         75            0    153.3kb        153.3kb
green  open   winlogbeat-2014.10.14   5   0         38            0     87.8kb         87.8kb
green  open   winlogbeat-2014.10.17   5   0         56            0    134.2kb        134.2kb
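Rather than eyeballing hundreds of rows, the _cat/indices output can be summarized per index series, which answers the "how many series, how many shards, how many days" questions directly. A minimal Python sketch, assuming plain-text output with the header row (?v) and every column present on every row (closed indexes can leave columns empty, which this does not handle):

```python
import re
from collections import defaultdict

# Three sample rows from the _cat/indices?v output above.
CAT_INDICES = """\
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open winlogbeat-2014.10.29 5 0 34 0 79.4kb 79.4kb
green open winlogbeat-2015.04.09 5 0 356 0 390kb 390kb
green open winlogbeat-2014.09.30 5 0 45 0 114kb 114kb
"""

# Daily index names look like <series>-YYYY.MM.DD.
DAILY = re.compile(r"^(?P<series>.+)-(?P<date>\d{4}\.\d{2}\.\d{2})$")


def summarize(cat_output):
    """Group _cat/indices rows by series: index count, shard count, date range."""
    lines = cat_output.strip().splitlines()
    header = lines[0].split()
    groups = defaultdict(lambda: {"indexes": 0, "shards": 0, "dates": []})
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        match = DAILY.match(row["index"])
        entry = groups[match.group("series") if match else row["index"]]
        entry["indexes"] += 1
        # Total shard copies = primaries * (1 + replicas).
        entry["shards"] += int(row["pri"]) * (1 + int(row["rep"]))
        if match:
            entry["dates"].append(match.group("date"))
    # YYYY.MM.DD strings sort chronologically, so min/max give the date range.
    return {
        name: {"indexes": e["indexes"], "shards": e["shards"],
               "first": min(e["dates"], default=None),
               "last": max(e["dates"], default=None)}
        for name, e in groups.items()
    }


print(summarize(CAT_INDICES))
```

For the three sample rows it reports one series, winlogbeat, with 3 indexes and 15 shards, dated 2014.09.30 to 2015.04.09. The _cat APIs also accept ?format=json, which avoids splitting whitespace-separated columns.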
Is this what you were asking?
Okay, so you have at least one daily index series (winlogbeat-YYYY.MM.DD) with five shards per index, and you've been going at it for a couple of years. Do you have any index series other than winlogbeat-xxx? Two years of daily winlogbeat-xxx indexes with five shards each still doesn't add up to more than 3650 shards.
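The arithmetic behind that estimate, as a quick Python check using the shard count from the cluster health posted earlier in the thread:

```python
SHARDS_PER_INDEX = 5    # "pri" column in _cat/indices
ACTIVE_SHARDS = 5161    # "active_shards" in _cluster/health

# Two years of one daily series at five shards per index.
two_years = 2 * 365 * SHARDS_PER_INDEX

# Days of daily indexes a single five-shard series would need to reach the active count.
days_needed = ACTIVE_SHARDS / SHARDS_PER_INDEX

print(two_years, days_needed, round(days_needed / 365, 1))
```

A single five-shard daily series would need about 1032 days (roughly 2.8 years) of indexes to account for 5161 shards.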
The first and most important step would be to cut the number of shards per index down to one. Whether you're sending directly to ES from Winlogbeat or going through Logstash, the key is to modify the index template being used. I also suggest reducing the number of indexes by using monthly indexes instead of daily ones, but I'm not sure that's configurable if Winlogbeat sends directly to ES.
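One way to apply the one-shard setting without editing the stock Winlogbeat template is a second, small legacy index template with a higher order, since Elasticsearch merges matching legacy templates and lets higher-order ones override lower-order ones. The Python sketch below only builds the request body. The template name winlogbeat-shards is hypothetical, the order of 10 is an assumption (it just needs to exceed the Winlogbeat template's own order, which GET _template/winlogbeat* shows), and the index_patterns key assumes Elasticsearch 6.x or later (5.x uses a single "template" string instead). Templates only apply when an index is created, so existing indexes keep their five shards.

```python
import json


def shard_override_template(pattern="winlogbeat-*", order=10):
    """Build the body of a legacy index template that only overrides shard settings.

    Assumes ES 6.x+ legacy templates ("index_patterns"); on 5.x replace
    "index_patterns": [pattern] with "template": pattern.
    """
    return {
        "index_patterns": [pattern],
        # Must be higher than the Winlogbeat template's order so these
        # settings win when both templates match a newly created index.
        "order": order,
        "settings": {
            "number_of_shards": 1,
            # A single-node cluster cannot allocate replicas anyway.
            "number_of_replicas": 0,
        },
    }


# Body to send with: PUT _template/winlogbeat-shards  (hypothetical name)
print(json.dumps(shard_override_template(), indent=2))
```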
Sorry for the long reply, I've been busy.
We've had the ES system up for about a month, and I have no idea why there are indexes with 2014 dates. Could this be because a server or two has the wrong system date, causing it to print 2014?
The current setup sends Winlogbeat output straight into ES. Would Logstash help with performance? The template file I'm using is completely default; are there any changes I should make to it?
When I run this command:
PUT /winlogbeat-*/_settings
{
  "settings": {
    "number_of_shards" : 1,
    "number_of_replicas" : 0
  }
}
I get this return:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "can't change the number of shards for an index"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "can't change the number of shards for an index"
  },
  "status": 400
}
However, I did add this line to the template file: "number_of_shards": 1
If I were to update the template file on all the servers running Winlogbeat, would this achieve the same goal?
Edit: removing the number_of_shards line and leaving only number_of_replicas works and returns "acknowledged": true.
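That error is expected: number_of_shards is a static index setting, fixed when an index is created, while number_of_replicas is dynamic and can be changed on existing indexes, which is why the edited request succeeds. A small Python sketch that splits a settings body along that line; STATIC_SETTINGS is a deliberately partial, illustrative list, not the full set Elasticsearch defines.

```python
from typing import Dict, Tuple

# Illustrative, partial list of static index settings (fixed at index creation).
STATIC_SETTINGS = {"number_of_shards", "codec"}


def split_settings(settings: Dict) -> Tuple[Dict, Dict]:
    """Split flat index settings into (updatable via _settings, creation-only)."""
    dynamic, static = {}, {}
    for key, value in settings.items():
        # Accept both "number_of_shards" and "index.number_of_shards".
        bare = key[len("index."):] if key.startswith("index.") else key
        target = static if bare in STATIC_SETTINGS else dynamic
        target[key] = value
    return dynamic, static


updatable, creation_only = split_settings({"number_of_shards": 1, "number_of_replicas": 0})
print("send to PUT /winlogbeat-*/_settings:", updatable)
print("put in the index template instead:", creation_only)
```

Creation-only settings such as number_of_shards belong in the index template, where they take effect for indexes created from then on.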