Very slow Filebeat/Elastic Cloud throughput

Kieren_Johnstone · August 3, 2018, 7:18am

Here's my cluster, hosted on elastic.co:

My index activity:

My filebeat monitoring graphs:

And my filebeat console output stats:

2018-08-03T07:13:28.751Z INFO [monitoring] log/log.go:124 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":300,"time":302},"total":{"ticks":1520,"time":1525,"value":1520},"user":{"ticks":1220,"time":1223}},"info":{"ephemeral_id":"4dc13585-4910-4d25-822d-94645167e2d5","uptime":{"ms":600010}},"memstats":{"gc_next":35961888,"memory_alloc":25477304,"memory_total":141147488}},"filebeat":{"events":{"active":2,"added":202,"done":200},"harvester":{"open_files":10,"running":10,"started":1}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":200,"batches":4,"total":200},"read":{"bytes":3459},"write":{"bytes":128119}},"pipeline":{"clients":1,"events":{"active":4117,"filtered":1,"published":200,"total":201},"queue":{"acked":200}}},"registrar":{"states":{"current":2,"update":200},"writes":4},"system":{"load":{"1":3.22,"15":1.6,"5":1.99,"norm":{"1":0.805,"15":0.4,"5":0.4975}}},"xpack":{"monitoring":{"pipeline":{"events":{"published":3,"total":3},"queue":{"acked":3}}}}}}}

As you can see, it seems to be running/processing 10 files (a new file is created every minute). It falls a long way behind on events.
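
To put numbers on "a long way behind", here is a minimal Python sketch that reads a trimmed copy of the monitoring payload above and derives rates from it. It assumes the counters are deltas over the 30 s window named in the log message ("Non-zero metrics in the last 30s"), and that `libbeat.pipeline.events.active` counts events still held in the publishing pipeline awaiting acknowledgement. `METRICS_JSON` and `summarize` are hypothetical names used only for illustration.

```python
import json

# Trimmed copy of the "monitoring" payload from the log line above;
# only the fields used below are kept.
METRICS_JSON = """
{"libbeat": {"output": {"events": {"acked": 200, "batches": 4, "total": 200}},
             "pipeline": {"events": {"active": 4117, "published": 200}}}}
"""

# Filebeat reports these counters as deltas over a 30 second window.
WINDOW_SECONDS = 30


def summarize(metrics: dict, window_s: float) -> dict:
    """Derive throughput figures from one Filebeat metrics snapshot."""
    output = metrics["libbeat"]["output"]["events"]
    pipeline = metrics["libbeat"]["pipeline"]["events"]
    return {
        # Events Elasticsearch acknowledged, per second of the window.
        "acked_per_second": output["acked"] / window_s,
        # Average number of events sent in each bulk request.
        "events_per_batch": output["acked"] / output["batches"],
        # Events queued in the pipeline and not yet acknowledged.
        "events_in_pipeline": pipeline["active"],
    }


if __name__ == "__main__":
    stats = summarize(json.loads(METRICS_JSON), WINDOW_SECONDS)
    for name, value in stats.items():
        print(f"{name}: {value:.1f}")
```

With these values it works out to roughly 6.7 acknowledged events per second, 50 events per bulk request, and 4117 events waiting in the pipeline. A batch size of exactly 50 is consistent with the Filebeat 6.x default `bulk_max_size` of 50.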

There doesn't seem to be any substantial load on the system at all.

Can anyone advise? Any more info I can give to help diagnose?

Bonus question: what is the "system load" stat? It's presumably not CPU core time, since the Kubernetes pod that Filebeat shares with the actual workload only peaks at ~1.13 cores.

Kieren_Johnstone · August 3, 2018, 8:33am

So it looks like the default Filebeat config, with "cloud.id" and "cloud.auth" set (and no explicit output.elasticsearch section), yields terrible performance: about 10 events/sec. I set it up as per the sample config I found.

What's an actually-good config for this?

kvch · August 3, 2018, 8:51am

Could you please share your configuration formatted as code (using the </> button)?

System load is the load average of your host: system.load.n is the average load over the last n minutes. As it's a quad-core host, to get the load per core you need to look at the system.load.norm.n metrics. So over the last minute the load was 0.805 per core, over the last 5 minutes 0.4975, and over the last 15 minutes 0.4.

{"system":
    {"load":
        {"1": 3.22,
         "15": 1.6,
         "5": 1.99,
         "norm":
            {"1": 0.805,
             "15": 0.4,
             "5": 0.4975}}}}

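The relationship between the two sets of numbers can be checked in a few lines of Python: each "norm" value is the raw load average divided by the CPU count. This is a sketch that assumes 4 CPUs, inferred from 3.22 / 0.805 = 4. The load average counts runnable (and, on Linux, uninterruptibly waiting) tasks across the whole host rather than CPU time consumed, so it need not line up with the CPU usage of a single Kubernetes pod. `LOAD`, `CPUS` and `normalize` are illustrative names.

```python
# Load averages copied from the Filebeat log line, keyed by window in minutes.
LOAD = {"1": 3.22, "5": 1.99, "15": 1.6}

# Assumed CPU count for this host, inferred from 3.22 / 0.805 = 4.
CPUS = 4


def normalize(load: dict, cpus: int) -> dict:
    """Divide each load average by the CPU count, mirroring the "norm" fields."""
    return {window: value / cpus for window, value in load.items()}


for window, per_cpu in normalize(LOAD, CPUS).items():
    print(f"{window}-minute load per CPU: {per_cpu:.4f}")
```

This reproduces the 0.805, 0.4975 and 0.4 figures from the log. On a Unix host, `os.getloadavg()` and `os.cpu_count()` from the standard library return the live values to plug in instead.
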
Christian_Dahlqvist · August 3, 2018, 10:09am

You have a small Elastic Cloud cluster, so it is important to use the available resources as efficiently as possible. I would recommend the following:

You have far too many shards for a cluster of that size. Change to a single primary shard per index, and also consider having each index cover a longer time period, e.g. a month, to get the average shard size up and the shard count down. Read this blog post for more details. Reducing the number of shards you are actively indexing into can also reduce the number of bulk rejections you might be seeing, which can improve performance as Beats will need to retry less.
Having lots of low-volume Beats write directly into Elasticsearch can result in very small bulk requests, which are inefficient. Instead, try increasing the batch size the Beats write in order to improve performance. This blog post provides a good discussion and example. The trade-off is that documents may take longer to reach Elasticsearch. If this is not desirable, you can send Beats data through Logstash, which will allow you to tune the batch size across all Beats.
It also looks like you have multiple versions of Filebeat writing to separate version-specific indices. Update all Filebeat instances to the same version so there are fewer shards to index into.
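
The shard-count argument above is simple arithmetic. The Python sketch below compares a daily, version-specific index layout with a single monthly index. The 30-day retention window, the two Filebeat versions and the starting point of 5 primary shards per index (the Elasticsearch index default before 7.0; the template actually in use may set a different value) are assumptions for illustration, not figures from this cluster, and replicas are left out. `IndexScheme` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class IndexScheme:
    """A hypothetical time-based index layout (illustrative numbers only)."""

    indices_per_version: int  # indices kept for the retention window
    primaries_per_index: int  # primary shards in each index
    filebeat_versions: int    # each Filebeat version writes its own indices

    def total_primaries(self) -> int:
        """Primary shards held across the whole retention window."""
        return self.indices_per_version * self.primaries_per_index * self.filebeat_versions

    def active_primaries(self) -> int:
        """Primary shards receiving writes: only the current index per version."""
        return self.primaries_per_index * self.filebeat_versions


# Assumed current layout: daily indices kept for 30 days, 5 primaries each,
# written by two different Filebeat versions.
daily = IndexScheme(indices_per_version=30, primaries_per_index=5, filebeat_versions=2)

# Suggested layout: one monthly index covering the window, 1 primary shard,
# all Filebeat instances on the same version.
monthly = IndexScheme(indices_per_version=1, primaries_per_index=1, filebeat_versions=1)

for name, scheme in (("daily", daily), ("monthly", monthly)):
    print(f"{name}: {scheme.total_primaries()} primaries in total, "
          f"{scheme.active_primaries()} being written to")
```

Under those assumptions the daily layout holds 300 primary shards, 10 of them receiving writes at any time, while the monthly single-primary layout has 1 of each.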

Kieren_Johnstone · August 3, 2018, 12:17pm

Thanks Christian, very much appreciated and I'll tweak our setup.

I just went with the defaults and configs that I found on the official site. Is there some config/setup guide I should have stumbled across? It would be a shame if my situation (running into terrible performance problems and posting here) was the official route!

Kieren_Johnstone · August 3, 2018, 12:30pm

Sorry to vent, but cloud.elastic.co has just been such a pain. The cluster isn't showing any logged data. CPU and memory use are both low. I forced a restart minutes ago. Support don't respond for hours, by which time it's fixed itself. Getting set up was a pain, and the default configurations are a pain. I don't know who it's supposed to be aimed at; I guess not me?

Kieren_Johnstone · August 3, 2018, 12:48pm

If anyone can help... I've been waiting for a Force Restart on my single-node deployment for around an hour.

Kieren_Johnstone · August 3, 2018, 3:26pm

3 hours now...

system · August 31, 2018, 3:26pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
