Question about sizing


(Tim Desrochers) #1

I am using ES in production, but I am getting a lot of 429 errors when logstash attempts to put data into elasticsearch.

My current setup is as follows:
4 logstash nodes pulling data off a redis cluster. These nodes perform some mutates, GeoIP lookups, translates, and some other basic things. The output section from them looks like:

output {
##########    BRO Outputs -> ES Cluster    ##########
  if [type] == "BRO" {
    if [sensor] == "host6" {
      elasticsearch {
        hosts => [ "x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200" ]
        manage_template => false
        index => "data-5-%{+YYYY.MM.dd}"
      }
    }
    if [sensor] == "host4" {
      elasticsearch {
        hosts => [ "x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200" ]
        manage_template => false
        index => "data-3-%{+YYYY.MM.dd}"
      }
    }
  }

########    SiLK Outputs -> ES Cluster   #########
  if [type] == "silk" {
    if [sensor_id] in [3,4] {
      elasticsearch {
        hosts => [ "x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200" ]
        index => "data-1-%{+YYYY.MM.dd}"
      }
    }
    if [sensor_id] in [1,2] {
      elasticsearch {
        hosts => [ "x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200" ]
        index => "data-2-%{+YYYY.MM.dd}"
      }
    }
  }

##########    Topbeat Outputs -> ES Cluster   #########
  if "topbeat" in [tags] {
    if [host] == "host1" {
      elasticsearch {
        hosts => [ "x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200" ]
        index => "topbeat-%{+YYYY.MM.dd}"
      }
    }
    if [host] == "host2" {
      elasticsearch {
        hosts => [ "x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200" ]
        index => "topbeat-%{+YYYY.MM.dd}"
      }
    }
  }

##########    Sensor Metric Outputs -> ES Cluster   ##########
  if "sensor-metrics" in [tags] {
    elasticsearch {
      hosts => [ "x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200","x.x.x.x:9200" ]
      index => "sensor-metrics-%{+YYYY.MM.dd}"
    }
  }


##########    Redis Metrics -> Mgt ES Cluster    ##########
  if "redis-metrics" in [tags] {
    elasticsearch {
      hosts => [ "x.x.x.x:9200","x.x.x.x:9200" ]
      index => "cluster-metrics-%{+YYYY.MM.dd}"
    }
  }


###########    Indexer Metrics -> Mgt ES Cluster   ##########
  if "ls-indexer-metrics" in [tags] {
    elasticsearch {
      hosts => [ "x.x.x.x:9200","x.x.x.x:9200" ]
      index => "cluster-metrics-%{+YYYY.MM.dd}"
    }
  }
}

As you can see from the elasticsearch outputs, this goes into a 10-node cluster. All the elasticsearch nodes have 1 TB HDDs, 16-core processors, and 32 GB of RAM. My metrics filter reports that my sensors are sending 3000 records per second into the redis cluster, and the indexers are sending the same rate into the elasticsearch cluster. From reading the docs I wouldn't think this amount of data would cause problems and 429 errors at ingestion, but in my situation it does. Besides scaling out (which I can do, but I don't think I am sending too much data per second into the cluster), what can be done to get rid of the error?

Logstash is version 2.2.2
Elasticsearch is version 2.2.0

My config is:
node.master: false
node.data: true
index.number_of_shards: 46 (I am doing this so I can grow to a 45-node cluster at full scale)
index.number_of_replicas: 3
bootstrap.mlockall: true
discovery.zen.ping.unicast.hosts: [shared A record] (I have 3 masters on a single A record)
marvel.agent.exporters: Marvel stuff

If it is my shards causing this, what can I do to ensure I can scale efficiently? I was told I should have n+1 shards, with n being the number of elasticsearch nodes I will have in my cluster.

Thanks for the help


(Boaz Leskes) #2

Error code 429 signals that the cluster is overloaded and is not able to keep up with the incoming requests. In your case I suspect it has to do with the high number of shards (46) and replicas (3): 46 primaries with 3 replicas each is 184 shards per daily index. With only 10 nodes to carry this, I suspect they are just overloaded. This should be visible in either CPU or IO utilization.

Since you are using logstash, I suspect you are also using daily indices (or similar), which means you don't need to start with so many shards. You can change your number of shards for each day's new index by adding an index template. This chapter is a good read for this use case: https://www.elastic.co/guide/en/elasticsearch/guide/master/time-based.html#index-per-timeframe
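To make the template approach concrete, here is a minimal sketch of registering an index template for the daily indices. The template name (`daily-data`), the index pattern, and the shard/replica counts are illustrative assumptions only; pick values that match your own data volume and node count:

```shell
# Hedged sketch for ES 2.x: any index whose name matches "data-*" will be
# created with these settings. Shard/replica counts here are placeholders.
curl -XPUT 'http://localhost:9200/_template/daily-data' -d '{
  "template": "data-*",
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}'
```

Because Logstash creates a fresh index each day, the new settings take effect at the next daily rollover without touching existing indices.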


(Tim Desrochers) #3

Great, I will make the changes ASAP. Question: can I change it in elasticsearch.yml, or do I need to do it in an index template?

Thanks


(Boaz Leskes) #4

We don't recommend having these settings in the yml file. Better to use a template, which makes sure the setting is the same on all nodes. Note, though, that a template is only applied when a new index is created. Existing indices are not affected by it...
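Since templates only apply at index creation, one way to confirm the change took effect is to compare the stored template with the settings of a newly created daily index. The template and index names below are assumptions for illustration:

```shell
# Illustrative checks: show the registered template, then the live settings
# of one day's index created after the template was added.
curl -XGET 'http://localhost:9200/_template/daily-data?pretty'
curl -XGET 'http://localhost:9200/data-1-2016.04.01/_settings?pretty'
```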


(Tim Desrochers) #5

Perfect, thanks. I will comment out the section in the yml file and update my template.

Thanks for the assist. I appreciate it a lot

