How to determine the bottleneck between Filebeat and ES?

hjazz6 · May 12, 2023, 9:03am

Hi,

I'm trying to determine the bottleneck for my Netflow setup, to see if I can further optimize the performance.

I am ingesting Netflow traffic into a Linux server running both filebeat and elasticsearch 7.1.4. I'm using the Netflow module that came with filebeat to write to ES.

I have done some performance tuning to increase the indexing rate, from ~1.5K/s with default values to ~13.5K/s now. But my Netflow ingest rate is still much higher, and I hope to increase the indexing rate further. The problem is that I don't know whether the bottleneck is in filebeat or in ES.

I have configured the following settings to increase the indexing rate:

In filebeat.yml:

- output.elasticsearch.bulk_max_size: 4000
- output.elasticsearch.worker: 8
- queue.mem.events: 64000
- queue.mem.flush.min_events: 4000

In netflow.yml:

- queue_size: 64000

My server has the following specs:

125GB RAM (64GB ringfenced for ES's JVM heap)
48 CPUs
9.6TB HDD total

Current utilization rate (obtained from the Stack Monitoring dashboard for ES node):

CPU: 15-25%
Memory usage (JVM Heap): Fluctuates between 6GB and 45GB (10-70%)
I/O operations rate: typically around 120/s, can occasionally spike to 200/s
System load: 10-18
Disk available: 1.1TB/7.3TB (I set an ILM policy so that the disk available doesn't fall below 15%)

I currently have 196 indices/primary shards, and 0 replica shards, containing 8.4B documents. Each index rolls over at ~50GB.

I think filebeat is dropping packets, because I always see the following in its output (I'm running it as filebeat -e):
"input":{"netflow":{"flows":454193,"packets":{"dropped":1622814,"received":5499}}}
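
To put a number on those counters, here is a minimal Python sketch that parses the fragment above and compares dropped to received packets. The helper name `netflow_packet_stats` is made up for illustration, and the thread does not say whether the counters are totals or per-reporting-period values, so treat the ratio as a rough indicator only.

```python
import json

# Metrics fragment copied from the filebeat -e output quoted above.
fragment = '"input":{"netflow":{"flows":454193,"packets":{"dropped":1622814,"received":5499}}}'


def netflow_packet_stats(fragment):
    """Return the netflow packet counters from a Filebeat metrics fragment.

    Hypothetical helper: the fragment is a bare key/value pair, so it is
    wrapped in braces to turn it into valid JSON before parsing.
    """
    doc = json.loads("{" + fragment + "}")
    return doc["input"]["netflow"]["packets"]


stats = netflow_packet_stats(fragment)

# Guard against division by zero in case no packets were received.
if stats["received"]:
    dropped_per_received = stats["dropped"] / stats["received"]
else:
    dropped_per_received = float("inf")

print(
    f"dropped={stats['dropped']} received={stats['received']} "
    f"dropped per received packet={dropped_per_received:.1f}"
)
```

If the ratio stays this high across several log lines, that would suggest packets are being lost on the filebeat side before anything reaches ES.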

I also don't know if this helps (from running GET /_nodes/stats):

gc.collectors.young.collection_count: 132997
gc.collectors.young.collection_time_in_millis: 3530938
gc.collectors.old.collection_count: 0
gc.collectors.old.collection_time_in_millis: 0
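
For reference, a small Python sketch that turns the young-generation numbers above into an average pause per collection. The function name is illustrative. These counters are cumulative since the node started, and the node uptime is not given here, so the share of time spent in GC cannot be derived from them alone.

```python
# Values copied from the GET /_nodes/stats output above.
young_count = 132997
young_time_ms = 3530938


def avg_pause_ms(count, total_ms):
    """Average time per collection in milliseconds (0.0 if there were none)."""
    return total_ms / count if count else 0.0


avg_young_ms = avg_pause_ms(young_count, young_time_ms)
total_young_min = young_time_ms / 60000  # milliseconds to minutes

print(f"average young GC pause: {avg_young_ms:.1f} ms")
print(f"total young GC time: {total_young_min:.1f} min")
```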

What can I do to determine where the bottleneck is?

Thank you.

Christian_Dahlqvist · May 12, 2023, 9:13am

For high ingest throughput, SSDs are recommended. Storage performance could therefore be the bottleneck. Run iostat -x and check await and disk utilisation to see if this might be the case.
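
If you want to script that check, below is a rough Python sketch that parses `iostat -x` output and lists devices whose `%util` is above a threshold. The function names and the 80% threshold are assumptions for illustration, not recommended values, and the column names (for example `await` versus `r_await`/`w_await`) differ between sysstat versions, so the parser simply keeps whatever columns the header line reports.

```python
import subprocess


def parse_iostat(text):
    """Parse `iostat -x` text into {device: {column_name: value}}.

    If the text contains several reports, the last one wins for each device.
    """
    devices = {}
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            columns = None  # a blank line ends the current device table
            continue
        if fields[0].startswith("Device"):
            columns = fields[1:]  # header row: remember the column names
            continue
        if columns and len(fields) == len(columns) + 1:
            try:
                values = [float(v) for v in fields[1:]]
            except ValueError:
                continue  # skip rows that are not purely numeric
            devices[fields[0]] = dict(zip(columns, values))
    return devices


def busy_devices(devices, util_threshold=80.0):
    """Names of devices whose %util is at or above the (assumed) threshold."""
    return sorted(
        name for name, cols in devices.items()
        if cols.get("%util", 0.0) >= util_threshold
    )


def run_iostat(interval_s=5, reports=2):
    """Run iostat with an interval so the last report covers a recent window.

    The first report of iostat shows averages since boot.
    """
    cmd = ["iostat", "-x", str(interval_s), str(reports)]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `busy_devices(parse_iostat(run_iostat()))` while indexing is under load would return the devices that look saturated; a data disk that shows up there consistently, together with high await values, points towards storage.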

How many primary shards are you actively indexing into?

hjazz6 · May 12, 2023, 2:48pm

Usually only one at a time, or two when it is about to roll over to another index.

system · June 9, 2023, 2:48pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
