I'd like to revisit this for a moment.
This is the current output section for log data going from my 4 Logstash (LS) indexers into my 10-node Elasticsearch (ES) cluster. Each node has 32 GB of RAM and plenty of HDD space.
output {
  if [type] == "BRO" {
    if [sensor1] == "sensor" {
      elasticsearch {
        hosts => [ "es-op-01-cl-55" ]
        manage_template => false
        index => "sensor1-bro-%{+YYYY.MM.dd}"
      }
    }
  }
}
Question: the host name referenced above has all 10 ES nodes assigned to it. My thought was that LS would treat them as an array and send data to each as needed, but I'm wondering whether it actually works that way. Is it better to list the IPs of each node rather than put them behind one domain name?
I ask because it appears that my LS nodes are getting overwhelmed. I get the following warning on the indexer nodes constantly:
{:timestamp=>"2015-11-19T06:33:32.651000-0500", :message=>"retrying failed action with response code: 429", :level=>:warn}
My ES nodes log the following (at DEBUG level):
[2015-11-19 06:34:02,734][DEBUG][action.bulk ] [WORKER_NODE_1] [sensor1-bro-2015.11.15][29] failed to execute bulk item (index) index {[sensor1-bro-2015.11.15][BRO][AVEfhVzWtyozxrGteR4H], source[{"ts":"2015-11-15T21:02:02.785421Z","uid":"CPztEE1K5ieUuKMpg","trans_depth":2,"method":"GET","host":"au.v4.download.windowsupdate.com","uri":"/msdownload/update/driver/drvs/2012/12/20288081_f78ee4b887c9c6047312311a6ffcb861dec5154e.cab","user_agent":"Microsoft BITS/7.8","request_body_len":0,"response_body_len":5630,"status_code":206,"status_msg":"Partial Content","tags":[],"resp_fuids":["FvVBtp1BvGH71O9wR4"],"resp_mime_types":["application/vnd.ms-cab-compressed"],"@version":"1","@timestamp":"2015-11-15T21:02:02.785Z","path":"/nsm/bro/logs/current/http_eth2.log","type":"BRO","csp_sensor":"cobalt","log_path":"http_eth2","src_ip":"192.168.10.143","src_port":49508,"resp_ip":"23.62.7.146","resp_port":80,"geoip_resp":{"ip":"23.62.7.146","country_code2":"US","country_code3":"USA","country_name":"United States","continent_code":"NA","region_name":"MA","city_name":"Cambridge","postal_code":"02142","latitude":42.362599999999986,"longitude":-71.0843,"dma_code":506,"area_code":617,"timezone":"America/New_York","real_region_name":"Massachusetts","location":[-71.0843,42.362599999999986]},"resp_Senderbase_lookup":"http://www.senderbase.org/lookup/?search_string=23.62.7.146","resp_CBL_lookup":"http://cbl.abuseat.org/lookup.cgi?ip=23.62.7.146","resp_Spamhaus_lookup":"http://www.spamhaus.org/query/bl?ip=23.62.7.146","resp_DomainTools_lookup":"http://whois.domaintools.com/23.62.7.146","url_full":"au.v4.download.windowsupdate.com/msdownload/update/driver/drvs/2012/12/20288081_f78ee4b887c9c6047312311a6ffcb861dec5154e.cab"}]}
The reason I wanted to use a domain name with multiple IPs assigned to it was to make it easier to add nodes for scaling. With the domain name, I can add a node to the DNS entry without having to change any configs on the nodes themselves.
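For reference, the alternative I'm asking about would look something like the sketch below. The elasticsearch output's hosts option accepts an array, and as far as I understand the plugin documentation, Logstash balances requests across whatever is listed there. The es-node-NN names are made-up placeholders, not my real nodes, and only three of the ten are shown.

```
output {
  if [type] == "BRO" {
    if [sensor1] == "sensor" {
      elasticsearch {
        # Placeholder node names (hypothetical); one entry per ES node.
        hosts => [ "es-node-01", "es-node-02", "es-node-03" ]
        manage_template => false
        index => "sensor1-bro-%{+YYYY.MM.dd}"
      }
    }
  }
}
```

The cost is that every new ES node would mean editing this list on all 4 indexers.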
Also, my input looks like this:
input {
  redis {
    host => [ "HOST1" ]
    data_type => "list"
    key => "bro"
  }
That redis block is repeated 14 times, once per node of a 14-node redis cluster, and the same config is on each of my 4 LS indexers. My thought was that all 4 LS indexers would hold connections to all of my redis nodes at all times and take messages as they become available, then do their filtering and pass the data to the ES cluster. I am not pushing much data at this time, so overwhelming the LS or ES nodes seemed unlikely, but I am getting these errors.
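Spelled out, the input section has roughly this shape. Only two of the 14 redis blocks are shown, and HOST2 is a placeholder in the same style as HOST1.

```
input {
  # Block 1 of 14: one redis input per redis node.
  redis {
    host => [ "HOST1" ]
    data_type => "list"
    key => "bro"
  }
  # Block 2 of 14 (HOST2 is a placeholder name).
  redis {
    host => [ "HOST2" ]
    data_type => "list"
    key => "bro"
  }
  # ...and so on for the remaining redis nodes.
}
```

Splitting the inputs would mean each indexer keeps only some of these blocks instead of all 14.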
Should I:
- Add more indexers?
- Split my redis inputs among my indexers so each indexer accepts connections from a subset of redis nodes?
- List out my hosts in the elasticsearch output section?
- All of the above?
- get a new job where I don't have to deal with this daily?
Thanks