Any ideas what could be causing these errors?
Elasticsearch has 16 GB of RAM available out of 32 GB on the system, I/O is fairly light, and CPU load on the VM is also quite low (<0.5)...
I'm also not seeing all the logs I expect to see in Kibana, and I don't know whether this could be the cause of that, or whether I have issues on the agent side.
I'm running these versions:
Elasticsearch: 1.5.2
Logstash: 2.0.0 (was running 1.5.x with the same problems)
I would remove these 4 lines and see what happens with the defaults.
16 workers is generally far more than necessary; I wouldn't set this higher than 2 unless you're pushing over 10,000 events per second. The flush size is also too big: because of the retry logic (which is why you're getting 429 response codes), you should work in smaller batches (I believe the default is 512 now). The plain codec simply does nothing here, as Elasticsearch requires JSON. A trimmed output block might look something like the sketch below.
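This is only a sketch assuming the standard elasticsearch output plugin in Logstash 2.0; the host value is a placeholder for your own setup:

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]   # your Elasticsearch node
    # workers => 2       # only raise above the default if you exceed ~10k events/sec
    # flush_size => 512  # the plugin default; smaller batches retry more gracefully
  }
}
```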
I also note that you're separating your indices by app. How many "apps" do you have per day? How many indices do you have in total on your cluster? What's your data retention policy? Are you using the default 5+1 shard count? I ask because having too many shards on a single node can overload the node's index management. A node only gets to use a percentage of the heap for this, and exhausting that memory creates pressure which can dramatically affect index caching (which may be what's producing the extra 429s). You can check the numbers quickly with the cat APIs, as shown below.
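Assuming Elasticsearch is listening on localhost:9200, these two commands give you the totals:

```
# total shard count (one row per shard)
curl 'localhost:9200/_cat/shards?v' | wc -l

# one row per index, with shard counts and sizes
curl 'localhost:9200/_cat/indices?v'
```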
720 shards is quite a few for a single node, as is 5 shards per index. For a single node, I would suggest you only need 1 shard per index, maybe 2.
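One way to apply that to future daily indices is an index template. This is a sketch for the 1.x template API; the `logstash-*` pattern and template name are assumptions based on the usual Logstash index naming:

```
curl -XPUT 'localhost:9200/_template/single_node' -d '{
  "template": "logstash-*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}'
```

With replicas set to 0 the indices will go green on a single node (a replica can never be allocated alongside its primary); raise it again if you add nodes.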
How many "apps" do you have per day (i.e., how many indices are created each day)? This number matters a great deal, since it dictates the number of "active" shards, which all compete for the index cache.