ElasticSearch Delayed Indexing

I currently have the following setup:

syslog-ng servers --> Logstash --> ElasticSearch

The syslog-ng servers are load balanced and write to a SAN location where Logstash just tails the files and sends them to ES. I'm currently receiving around 1,300 events/sec to the syslog cluster for the networking logs. The issue I'm running into is a gradual delay in when the logs actually become searchable in ES. When I started the cluster (4 nodes), it was dead on. Then it gradually started falling behind. Yesterday it was ~2 hrs. behind and is now ~1 hr. behind after catching up a little bit last night.

I can confirm the logs are being written in real time on the syslog-ng servers, and I can also confirm that my 4 other indexes that use the same concept but a different Logstash instance are staying up to date (i.e. I have one Logstash instance each for UNIX logs, Windows logs, networking logs, etc.). However, their volume is significantly lower (~500 events/second).

I'm not sure if it's an issue on the ES or the Logstash side. I would assume Logstash, since ES is staying up to date on everything else. I've already split the networking file into 2 separate ones and spawned a new Logstash instance for each. I've also bumped the workers up to 7. Both changes have helped, but it's still not able to keep up.
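For reference, a rough sketch of how one of these instances gets launched with 7 workers, assuming the workers in question are Logstash filter workers and using a placeholder config path:

# Logstash 2.1.x: -w / --filterworkers sets the number of filter worker threads
bin/logstash -f /etc/logstash/network.conf -w 7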

I'm running Logstash 2.1.1 and ES 2.1.1.

Any help would be greatly appreciated.

I'd start by upgrading to latest ES and LS.
But otherwise, what do your LS configs look like? Are you monitoring the load on the LS + ES hosts?

I actually upgraded to the latest Logstash version yesterday because I wanted to use the "pipeline-batch-size" option. Once I bumped that up to 500, the load on the server doubled, but I'm now staying up to date on the logs, so I believe that has fixed the issue for now.
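Roughly, that flag looks like this on the command line (placeholder config path again; in the newer Logstash the worker flag is --pipeline-workers rather than --filterworkers):

# -b / --pipeline-batch-size: events each pipeline worker pulls from the queue per batch
bin/logstash -f /etc/logstash/network.conf -w 7 -b 500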

To answer your question, the load on the LS servers was around 1.1 with 30% free RAM. The ES servers are fairly busy, but nothing major, since they're keeping up with all of the other log sources.
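For anyone else debugging the same thing: one way to check whether ES itself is the bottleneck is to watch the bulk thread pool on the data nodes for queued or rejected requests (assuming the default HTTP port):

curl -s 'http://server1:9200/_cat/thread_pool?v'

If the bulk rejected count keeps climbing, ES can't keep up with the bulk requests; if it stays at 0 while the lag grows, the backlog is on the Logstash side.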

Below is my LS config for the networking logs. It's the most complex of all of them.

input {
  file {
    type => "network-syslog"
    exclude => ["*.gz"]
    start_position => "end"
    path => [ "/mnt/logs/Networking/*.log" ]
    sincedb_path => "/etc/logstash/.sincedb-network"
  }
}

filter {
  grok {
    overwrite => [ "message", "host" ]
    patterns_dir => "/etc/logstash/logstash-2.1.1/vendor/bundle/jruby/1.9/gems/logstash-patterns-core-2.0.2/patterns"
    # patterns are tried in order; the first one that matches wins
    match => [
      "message", "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:host} %%{CISCOTAG:ciscotag}: %{GREEDYDATA:message}",
      "message", "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:host} %{GREEDYDATA:message}",
      "message", "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:host} %{IP:clientip} \[%{HTTPDATE:timestamp}\] %{IP:virtual_ip} %{DATA:virtual_name} %{DATA:virtual_pool_name} %{IPORHOST:server} %{NUMBER:server_port} %{SPACE} \"%{DATA:path}\" %{NUMBER:response:int} %{NUMBER:bytes:int} %{SPACE} %{NUMBER:response_ms:int} %{SPACE} %{QS:referrer} %{QS:agent}",
      "message", "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:host}.+net.sas.com %{GREEDYDATA:message}"
    ]
  }
  grok {
    match => [
      "message", "%{CISCOFW106001}",
      "message", "%{CISCOFW106006_106007_106010}",
      "message", "%{CISCOFW106014}",
      "message", "%{CISCOFW106015}",
      "message", "%{CISCOFW106021}",
      "message", "%{CISCOFW106023}",
      "message", "%{CISCOFW106100}",
      "message", "%{CISCOFW110002}",
      "message", "%{CISCOFW302010}",
      "message", "%{CISCOFW302013_302014_302015_302016}",
      "message", "%{CISCOFW302020_302021}",
      "message", "%{CISCOFW305011}",
      "message", "%{CISCOFW313001_313004_313008}",
      "message", "%{CISCOFW313005}",
      "message", "%{CISCOFW402117}",
      "message", "%{CISCOFW402119}",
      "message", "%{CISCOFW419001}",
      "message", "%{CISCOFW419002}",
      "message", "%{CISCOFW500004}",
      "message", "%{CISCOFW602303_602304}",
      "message", "%{CISCOFW710001_710002_710003_710005_710006}",
      "message", "%{CISCOFW713172}",
      "message", "%{CISCOFW733100}",
      "message", "%{GREEDYDATA}"
    ]
  }
  syslog_pri { }
  date {
    match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    target => "@timestamp"
  }
  mutate {
    remove_field => [ "syslog_facility", "syslog_facility_code", "syslog_severity", "syslog_severity_code" ]
  }
}

output {
  elasticsearch {
    hosts => ["server1","server2","server3","server4"]
    index => "network-%{+YYYY.MM.dd}"
    template => "/etc/logstash/logstash-2.1.1/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-2.2.0-java/lib/logstash/outputs/elasticsearch/elasticsearch-network.json"
    template_name => "network"
  }
}
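Since the index option above creates one index per day (network-YYYY.MM.dd), a quick sanity check is to see whether documents are actually landing in today's index (again assuming the default HTTP port on one of the nodes):

curl -s 'http://server1:9200/_cat/indices/network-*?v'

Watching docs.count grow on the current day's index is an easy way to tell whether the lag is on the indexing side or further upstream in the tail/filter stages.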