We have been integrating an Elasticsearch log writer into the Bro network
monitor (http://www.bro-ids.org), and we have a few users who are
monitoring extremely high-volume networks and want to insert their logs
into Elasticsearch, but their logging rate will hover around 40k-50k
documents per second for relatively long periods of time. We are already
doing index rotation, which has been nice for expiring old data and for
searching constrained time periods, but I suspect there is more we
could or should be doing.
Are there any tuning guides available, or techniques we could be using,
to insert documents at high rates?
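Not an official guide, but the usual first steps for sustained high-rate loads are batching writes through the bulk API, raising the index refresh interval, and dropping replicas while loading. The bulk endpoint expects newline-delimited JSON: one action line per document followed by its source, with a trailing newline. A minimal Python sketch of building such a body (the index name and Bro-style field names here are invented for illustration):

```python
import json

def build_bulk_body(index, doc_type, docs):
    """Build a newline-delimited JSON body for the Elasticsearch _bulk API:
    one {"index": ...} action line followed by one source line per document.
    The bulk endpoint requires the body to end with a newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

# Hypothetical Bro conn-log documents; POST the resulting string to /_bulk.
body = build_bulk_body("bro-2012.08.30", "conn",
                       [{"orig_h": "10.0.0.1", "resp_p": 80},
                        {"orig_h": "10.0.0.2", "resp_p": 443}])
print(body)
```

POSTing a body like that with a few thousand documents per request is generally far cheaper than indexing documents one HTTP call at a time.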
Also, what I have found is that often two or three machines in the cluster
are really beefy, while the machines holding old logs are less beefy. In
that case, you can use index shard allocation and dynamic relocation:
make sure the "current" index is on the beefy machines, and move the old
indices to the less beefy machines.
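That split can be expressed with shard allocation filtering: tag nodes with a custom attribute and steer indices with index-level routing settings. A sketch, assuming the attribute name `box_type` and the index name (both are arbitrary choices for illustration):

```shell
# elasticsearch.yml on the beefy nodes:   node.box_type: strong
# elasticsearch.yml on the older nodes:   node.box_type: weak

# Pin the current index to the beefy machines:
curl -XPUT 'http://localhost:9200/bro-2012.08.30/_settings' -d '
{ "index.routing.allocation.include.box_type": "strong" }'

# Later, retire it to the weaker machines; Elasticsearch relocates
# the shards automatically:
curl -XPUT 'http://localhost:9200/bro-2012.08.30/_settings' -d '
{ "index.routing.allocation.include.box_type": "weak" }'
```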
I would also be interested in this information about dynamic shard
relocation. I have been working on tuning for massive insert volumes
lately as well, with 40-50k/s sustained required. I have finally gotten
things tuned well; the primary change was an allocation approach that
spreads each index as widely as possible around the cluster. I have
written an alternate shard allocator based on this approach and will be
submitting a pull request in the next day or two, after I finish writing
my test cases.
The primary problem I had up until this point was that if I had to
restart a node or two, some indices ended up bunched on a small number
of nodes, causing performance issues.
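A toy sketch of the "spread each index wide" idea (this is not the poster's actual allocator, just an illustration of the placement policy): assign each index's shards round-robin over the nodes, rotating the starting node per index so no node accumulates a disproportionate share of any one index.

```python
def spread_allocation(indices, nodes):
    """Toy shard allocator: place each index's shards round-robin across
    all nodes, rotating the starting node per index so every index is
    spread as evenly as possible instead of bunching on a few nodes.

    indices: dict of index name -> shard count
    nodes:   list of node names
    Returns a dict of (index, shard_number) -> node."""
    assignment = {}
    for offset, (index, shard_count) in enumerate(sorted(indices.items())):
        for shard in range(shard_count):
            assignment[(index, shard)] = nodes[(offset + shard) % len(nodes)]
    return assignment

# Three daily indices with 5 shards each over 4 nodes: every index touches
# all 4 nodes, and no node holds more than 2 shards of a single index.
alloc = spread_allocation(
    {"bro-2012.08.28": 5, "bro-2012.08.29": 5, "bro-2012.08.30": 5},
    ["node1", "node2", "node3", "node4"])
```

This only models the placement policy; a real allocator also has to handle replicas, node restarts, and rebalancing.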
On Thursday, August 30, 2012 4:10:14 AM UTC-4, Filirom1 wrote:
Also, what I have found is that often two or three machines in the cluster
are really beefy, while the machines holding old logs are less beefy. In
that case, you can use index shard allocation and dynamic relocation:
make sure the "current" index is on the beefy machines, and move the old
indices to the less beefy machines.
On Aug 29, 2012, at 6:19 PM, Seth Hall <seth...@gmail.com> wrote:
Hi all,
We have been integrating an Elasticsearch log writer into the Bro network
monitor (http://www.bro-ids.org), and we have a few users who are
monitoring extremely high-volume networks and want to insert their logs
into Elasticsearch, but their logging rate will hover around 40k-50k
documents per second for relatively long periods of time. We are already
doing index rotation, which has been nice for expiring old data and for
searching constrained time periods, but I suspect there is more we
could or should be doing.
Are there any tuning guides available, or techniques we could be using,
to insert documents at high rates?