Hi All,
Since upgrading to 1.4 (and ES 1.1) I've had no luck at all consuming logs,
with events averaging about 10k per second and peaking around 15k. What
happens is that Logstash processing cruises right along for a short while,
then (without any errors or other obvious reason) the redis input queue
starts to back up. Eventually redis hits its memory limit and the shippers
stop sending logs. Once the queue is full Logstash can't ever catch up, and
I have to wipe out the queue and start over. Even with zero shippers
sending logs to redis, I can watch the queue length and see that Logstash,
once it gets into this state, is only draining a few thousand events every
few seconds, which is extremely slow.
The underlying infrastructure is pretty robust; here are the specs:
Indexer1: 2x6core cpus, 24gb memory
Indexer2: 2x8core cpus, 64gb memory
ES cluster:
3 machines with 2x8core cpus, 64gb memory (31gb for ES heap), all SSD
drives.
The ES machines should easily be able to handle what I'm throwing at them,
and load/iops/etc are well within limits at all times. So I don't think the
problem is ES.
I've got the 2 Logstash indexer machines pulling from two different redis
queues, one on each host. Indexer2 handles just a single high-volume log.
Indexer1 handles lower volume, but several logs and more Logstash
processing. I was thinking of switching them around to throw more CPU at
the heavier-processing Logstash instance, but honestly I doubt it would
help with the backup/throughput issues, as graphs from both machines show
they are not stressed for CPU at all. In fact load is pretty low; Logstash
just isn't using the CPUs much once it gets backed up.
Attached are my two indexer configs. I've tried bumping the redis input
threads up, but this has little effect; same with the batch size. Both
indexers are started with -w set to (number of cores - 1) and have an 8gb
heap configured for Logstash.
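To give a rough sense of what's in the attached configs, the redis input
side of each indexer looks more or less like this (hostnames, the key name,
and the numbers below are placeholders rather than my actual values, and
the option names are as I understand them from the 1.4 redis input docs):

  input {
    redis {
      host        => "127.0.0.1"   # redis runs locally on each indexer host
      data_type   => "list"        # shippers push events onto a redis list
      key         => "logstash"    # placeholder queue key
      threads     => 4             # bumped up from the default
      batch_count => 100           # events pulled per request; tried various sizes
    }
  }
  output {
    elasticsearch {
      host => "es-node-1"          # placeholder; points at the ES cluster
    }
  }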
Environment:
CentOS 6.2 with latest kernel
Sun JRE 1.7.0_45
Logstash 1.4.0-1_c82dc09
ES 1.1.0
Here are some graphs showing the redis queue:
https://lh4.googleusercontent.com/-t6wTedvMHLY/U0bq7nejaqI/AAAAAAAAACo/NbAjNnHuBAI/s1600/indexer1.png
https://lh5.googleusercontent.com/-L-du00g5mAs/U0bq-GtSFrI/AAAAAAAAACw/LJEbtGXnQ34/s1600/indexer2.png
And CPU usage during the same timeframe:
https://lh5.googleusercontent.com/-EHhbNorLiMY/U0bsqBGSE6I/AAAAAAAAADM/f21iu_46e_M/s1600/indexer2_cpu.png
https://lh3.googleusercontent.com/-e4TAQXmAdK4/U0bsZuA2Z3I/AAAAAAAAAC8/u3G7BzNkGWE/s1600/indexer1_cpu.png
I'm really frustrated by this issue; I've tried everything I can think of,
but the end result is always the same. Any help or thoughts would be
greatly appreciated.
-Ryan