How to increase ingestion rate

Hi, I'm trying to get past 10k docs/s on a 9-node cluster (8 cores / 32GB RAM per node).

The dataset can reach 400GB in total.

Started small with 6 shards and no replicas, and can't get above 10k docs/s.

What am I missing? Should I allocate a shard per core? A longer index refresh interval also doesn't seem to change anything.

Cheers
Marcin

Are you using bulk?
What client are you using for ingestion?

What is the average size of your documents? Which version of Elasticsearch are you using? Are you using time-based indices?

Hi Mark

Bulk, via Logstash.

Cheers
Marcin

Hi Chris

It's an Apache-style log, like your usual HTTP server log with URL,
response code and a few tags. We use @timestamp as usual.

Marcin

For such small documents that sounds very low given the size of the cluster. How many Logstash instances do you have indexing into the cluster? How are they configured?

Hi Christian

I've taken a step back and graphed the Logstash instances, and it seems it's simply Redis that is the limit at the moment.

We seem to be getting around 1.5k docs/s on average per Logstash instance, but it looks like we've just hit a Redis limit. Looking into that now.

What does your logstash configuration look like?

Hi Christian

Usually something like this: https://github.com/mutl3y/logstash/blob/master/roles/logstash/templates/lds_logs.conf.j2

Why are you using a Ruby filter to calculate the MD5 hash of the message field when the fingerprint filter plugin exists for this purpose? I would expect that to be more efficient than invoking a Ruby script for each event, but I have not benchmarked it.
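
Something along these lines should give you the same MD5 of the message field without dropping into Ruby (the target field name here is just the plugin's default, so adjust it to whatever your Ruby filter was writing to):

filter {
  # computes an MD5 hash of the message field and stores it
  # in the "fingerprint" field (the plugin's default target)
  fingerprint {
    source => "message"
    method => "MD5"
    target => "fingerprint"
  }
}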

I'll try that fingerprint filter, but I've found another clue.

If I run 4 ingesters I reach about 10k docs/s; adding more has no effect.

Also, when those four are running and I spawn 2 completely separate ingesters with stdin input, everything together still tops out at 10k. Each of those two extra ingesters can do 2k on its own.

So it's clearly not a problem with any single Logstash node.

Currently using 16 Elasticsearch nodes (9 with the spec mentioned above and 7 others with slightly lower specs).

Any idea what the cause is? It's almost as if Elasticsearch had a hard ingest-rate limit of 10k per cluster. Is that the case?

Many thanks
Marcin

Which version of Logstash are you using? How many workers do you have configured?

In order to find out where the bottleneck is, I would recommend starting from Elasticsearch and working up the pipeline. Set up a simple Logstash config that reads from a file and indexes into a test index. See if you are able to push Elasticsearch harder and achieve a better indexing rate this way. If that is the case, feed your full Logstash pipeline from a file (in order to rule out Redis dequeueing) and see what it is able to do. Continue up the pipeline, optimising settings, until you find the bottleneck.
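
As a rough sketch of that first step (the path, host and index name are placeholders, not from your config):

input {
  # read an existing log file from the beginning so Redis is out of the picture
  file {
    path => "/var/log/sample/access.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

# no filters at all to start with, so you measure raw indexing throughput;
# add your grok/fingerprint filters back in later to measure their cost separately

output {
  elasticsearch {
    hosts => ["es-node-01:9200"]
    index => "ingest-test"
  }
}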

Thanks. Would you advise a single Logstash instance per file to ingest, with a single shard? And maybe routing each index to one node, and then just multiplying that? Or some other way?

Many thanks again
Marcin

I would set up a test index with as many shards as there are data nodes and add as many Logstash nodes as needed until you have been able to verify whether Elasticsearch is the bottleneck or not.
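
For example (the node addresses below are placeholders): create the test index up front with as many primary shards as you have data nodes and no replicas, then list several data nodes in the output so Logstash spreads its bulk requests across them instead of sending everything through a single node:

output {
  elasticsearch {
    # bulk requests are load-balanced across the listed hosts
    hosts => ["es-data-01:9200", "es-data-02:9200", "es-data-03:9200"]
    index => "ingest-test"
  }
}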

Thanks!