Question about sizing


(Tim Desrochers) #1

I am using ES in production, but I am getting a lot of 429 errors when logstash attempts to put data into elasticsearch.

My current setup is as follows:
4 logstash nodes pulling data off a redis cluster. These nodes perform some mutates, GeoIP lookups, translates, and some other basic things. The output section from them looks like:

output {
##########    BRO Outputs -> ES Cluster    ##########
  if [type] == "BRO" {
    if [sensor] == "host6" {
      elasticsearch {
        hosts => [ "x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200" ]
        manage_template => false
        index => "data-5-%{+YYYY.MM.dd}"
      }
    }
    if [sensor] == "host4" {
      elasticsearch {
        hosts => [ "x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200" ]
        manage_template => false
        index => "data-3-%{+YYYY.MM.dd}"
      }
    }
  }

########    SiLK Outputs -> ES Cluster   #########
  if [type] == "silk" {
    if [sensor_id] in [3,4] {
      elasticsearch {
        hosts => [ "x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200" ]
        index => "data-1-%{+YYYY.MM.dd}"
      }
    }
    if [sensor_id] in [1,2] {
      elasticsearch {
        hosts => [ "x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200" ]
        index => "data-2-%{+YYYY.MM.dd}"
      }
    }
  }

##########    Topbeat Outputs -> ES Cluster   #########
  if "topbeat" in [tags] {
    if [host] == "host1" {
      elasticsearch {
        hosts => [ "x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200" ]
        index => "topbeat-%{+YYYY.MM.dd}"
      }
    }
    if [host] == "host2" {
      elasticsearch {
        hosts => [ "x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200" ]
        index => "topbeat-%{+YYYY.MM.dd}"
      }
    }
  }

##########    Sensor Metric Outputs -> ES Cluster   ##########
  if "sensor-metrics" in [tags] {
    elasticsearch {
      hosts => [ "x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200" ]
      index => "sensor-metrics-%{+YYYY.MM.dd}"
    }
  }


##########    Redis Metrics -> Mgt ES Cluster    ##########
  if "redis-metrics" in [tags] {
    elasticsearch {
      hosts => [ "x.x.x.x:9200","x.x.x.x:9200" ]
      index => "cluster-metrics-%{+YYYY.MM.dd}"
    }
  }


###########    Indexer Metrics -> Mgt ES Cluster   ##########
  if "ls-indexer-metrics" in [tags] {
    elasticsearch {
      hosts => [ "x.x.x.x:9200","x.x.x.x:9200" ]
      index => "cluster-metrics-%{+YYYY.MM.dd}"
    }
  }
}

As you can see from the elasticsearch outputs, this goes into a 10-node cluster. All the elasticsearch nodes have 1 TB HDDs, 16-core processors, and 32 GB of RAM. My metrics filter reports that my sensors are sending 3000 records per second into the redis cluster, and the indexers are sending the same rate into the elasticsearch cluster. From reading the docs I wouldn't think this amount of data would cause problems and 429 errors at ingestion, but in my situation it does. Besides scaling out (which I can do, but I don't think I am sending too much data per second into the cluster), what can be done to get rid of the error?

Logstash is version 2.2.2
Elasticsearch is version 2.2.0

My config is:
node.master: false
node.data: true
index.number_of_shards: 46 (I am doing this so I can grow to a 45-node cluster at full scale)
index.number_of_replicas: 3
bootstrap.mlockall: true
discovery.zen.ping.unicast.hosts: [shared A record] (I have 3 masters on a single A record)
marvel.agent.exporters: Marvel stuff

If it is my shards causing this, what can I do to ensure I can scale efficiently? I was told I should have n+1 shards, with n being the number of elasticsearch nodes I will have in my cluster.

Thanks for the help


(Boaz Leskes) #2

Error code 429 signals that the cluster is overloaded and is not able to keep up with the incoming requests. In your case I suspect it has to do with the high number of shards (46) and replicas (3): 46 primaries with 3 replicas each is 184 shards per daily index. With only 10 nodes to carry this, I suspect they are just overloaded. This should be visible in either CPU or IO utilization.

Since you are using logstash, I suspect you are also using daily indices (or similar), which means you don't need to start with so many shards. You can change your number of shards for each day's new index by adding an index template. This chapter is a good read for this use case: https://www.elastic.co/guide/en/elasticsearch/guide/master/time-based.html#index-per-timeframe
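To make the template approach concrete, here is a minimal sketch of registering an index template for the daily indices. The template name (`daily-data`), the index pattern, and the shard/replica counts are illustrative assumptions only; pick values that match your own data volume and node count:

```shell
# Hedged sketch for ES 2.x: any index whose name matches "data-*" will be
# created with these settings. Shard/replica counts here are placeholders.
curl -XPUT 'http://localhost:9200/_template/daily-data' -d '{
  "template": "data-*",
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}'
```

Because Logstash creates a fresh index each day, the new settings take effect at the next daily rollover without touching existing indices.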


(Tim Desrochers) #3

Great, I will make the changes ASAP. Question: can I change it in elasticsearch.yml, or do I need to do it in an index template?

Thanks


(Boaz Leskes) #4

We don't recommend having these settings in the yml file. Better to use a template, which makes sure the setting is the same on all nodes. Note, though, that a template is only applied when a new index is created. Existing indices are not affected by it...
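Since templates only apply at index creation, one way to confirm the change took effect is to compare the stored template with the settings of a newly created daily index. The template and index names below are assumptions for illustration:

```shell
# Illustrative checks: show the registered template, then the live settings
# of one day's index created after the template was added.
curl -XGET 'http://localhost:9200/_template/daily-data?pretty'
curl -XGET 'http://localhost:9200/data-1-2016.04.01/_settings?pretty'
```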


(Tim Desrochers) #5

Perfect, thanks. I will comment out the section in the yml file and update my template.

Thanks for the assist. I appreciate it a lot

