These are the things I've changed, but there hasn't been any significant improvement. I'm now running a single filebeat instance.
- Changed JVM heap size to 30GB
- Added another node to the ES cluster (2 physical servers), making it a 2-node cluster (total JVM heap is now 60GB)
- 2 shards per index
- Turned off metricbeat
- Changed index rollover size in ILM from 50GB to 40GB (see the sketch after this list)
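For reference, the rollover change is just the max_size in the hot phase of the ILM policy, roughly like this (the policy name is a placeholder and I've left out the other phases):

PUT _ilm/policy/my-filebeat-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "40gb"
          }
        }
      }
    }
  }
}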
Changing output.elasticsearch.worker in filebeat.yml from 8 to 12 or 16 did not improve the performance, so I left it at 8.
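For context, the relevant part of my filebeat.yml looks roughly like this (the host names are placeholders for the two ES nodes):

output.elasticsearch:
  hosts: ["es-node-1:9200", "es-node-2:9200"]
  worker: 8   # tried 12 and 16 here; no noticeable difference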
Some small improvements I've observed since the changes:
- Previously, the number of packets received in a 30-second window was about 8K (with about 22K dropped). Now it's usually about 10K received and 20K dropped.
- When the index rollover size was 50GB, the Indexing Rate chart in Kibana for the current ES index fluctuated between 20K and 30K/s. After changing the size to 40GB, it fluctuates less, staying between 26K and 29K/s.
Not sure if this matters, but I also noticed that when the index rollover size was 50GB, the index's merges.total_throttled_time_in_millis could be as high as 50% of merges.total_time_in_millis. I got this by looking at the index's Stats under Index Management. For example, for past indices I would see:
"merges": {
"current": 0,
"current_docs": 0,
"current_size_in_bytes": 0,
"total": 178,
"total_time_in_millis": 5675026,
"total_docs": 62632665,
"total_size_in_bytes": 65673150397,
"total_stopped_time_in_millis": 13460,
"total_throttled_time_in_millis": 2712266,
"total_auto_throttle_in_bytes": 67806790
}
After changing the rollover size to 40GB, this has decreased to about 25-30%, e.g.
"merges": {
"current": 0,
"current_docs": 0,
"current_size_in_bytes": 0,
"total": 1639,
"total_time_in_millis": 10781764,
"total_docs": 158044730,
"total_size_in_bytes": 171873293917,
"total_stopped_time_in_millis": 10079,
"total_throttled_time_in_millis": 3052129,
"total_auto_throttle_in_bytes": 132788866
}
The number of merges seems to have increased, though.
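For anyone who wants to check the same numbers without going through the UI, the merges section above can also be pulled straight from the index stats API; the throttling percentage is just total_throttled_time_in_millis divided by total_time_in_millis (the index name here is a placeholder):

GET my-index-000001/_stats/merge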
Something else I observed:
As I mentioned previously, roughly every 3 minutes the packet drop goes to zero and all packets are received for about 2 minutes, then packets start dropping again. This usually happens when memstats.gc_next and memstats.memory_alloc are at their lowest compared to a few minutes before and after.
The indexing rate drops to 0 at around the same time the packet drop decreases to 0 (about 30K packets received at that point). The indexing rate stays at 0 for about 20-40 seconds before increasing, while the packet drop remains at 0 for about 2 more minutes.
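In case it's relevant, the memstats values above come from Filebeat's internal metrics. If I'm not mistaken, they can also be exposed on a local HTTP endpoint with something like this in filebeat.yml and then read from localhost:5066/stats:

http:
  enabled: true
  host: localhost
  port: 5066   # default port for the Beats HTTP metrics endpoint, as far as I know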
What else can I try?