Tuning for high-volume logging?

Hi all,

We have been integrating an Elasticsearch log writer into the Bro network
monitor (http://www.bro-ids.org), and we have a few users who are
monitoring extremely high-volume networks and want to insert their logs
into Elasticsearch, but their logging rate will hover around 40k-50k
documents per second for relatively long periods of time. We are already
doing index rotation, which has been nice for expiring old data and for
searching constrained time periods, but I suspect there is more we
could/should be doing.
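
(A minimal sketch of the rotation approach described above, assuming
daily indices named by date; the index, type, and field names here are
hypothetical:)

  # Write each day's logs to a dated index; Elasticsearch creates the
  # index on first use.
  curl -XPOST 'http://localhost:9200/bro-2012.08.29/conn' -d '{
    "ts": "2012-08-29T18:19:00", "orig_h": "10.0.0.1", "resp_h": "10.0.0.2"
  }'

  # Expire old data by dropping whole indices rather than deleting
  # individual documents.
  curl -XDELETE 'http://localhost:9200/bro-2012.08.01'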

Are there any tuning guides available for techniques we could use to
insert documents at high rates?

Thanks!
.Seth

--

The logstash wiki has some good points (https://github.com/elastic/logstash); disabling the _all field, for example, can reduce the overhead of indexing.
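
(For concreteness, a minimal sketch of disabling _all, assuming the index is created with an explicit mapping; the index and type names are hypothetical:)

  curl -XPUT 'http://localhost:9200/bro-2012.08.29' -d '{
    "mappings": {
      "conn": {"_all": {"enabled": false}}
    }
  }'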

Also, what I have found is that often 2-3 machines in the cluster can be really beefy machines, while the others, holding old logs, can be less beefy. In that case, you can use index shard allocation and dynamic relocation, making sure the "current" index is on the beefy machines and the old indices are moved to the less beefy machines.
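
(A sketch of that layout using shard allocation filtering, assuming each node is tagged in its config, e.g. node.tag: beefy or node.tag: small; the tag values and index names are hypothetical:)

  # Pin the current index to the beefy machines.
  curl -XPUT 'http://localhost:9200/bro-2012.08.29/_settings' -d '{
    "index.routing.allocation.include.tag": "beefy"
  }'

  # When an index is rotated out, retag it; Elasticsearch then relocates
  # its shards to the less beefy machines on its own.
  curl -XPUT 'http://localhost:9200/bro-2012.08.01/_settings' -d '{
    "index.routing.allocation.include.tag": "small"
  }'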


--

But how do you force dynamic relocation?
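
(Updating an index's allocation settings, as in the sketch above, is enough to make Elasticsearch relocate its shards on its own; to force a specific move, there is also the cluster reroute API, assuming a version that has it. The index and node names here are hypothetical:)

  curl -XPOST 'http://localhost:9200/_cluster/reroute' -d '{
    "commands": [
      {"move": {"index": "bro-2012.08.01", "shard": 0,
                "from_node": "beefy-1", "to_node": "small-1"}}
    ]
  }'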


--

I would also be interested in this information about dynamic shard
relocation. I have also been working on tuning for massive amounts of
inserts lately, with 40-50k/s sustained required. I have finally gotten
things tuned well, the primary change being an allocation approach that
spreads each index as widely as possible across the cluster. I have
written an alternate shard allocator based on this approach and will be
submitting a pull request in the next day or two, after I finish writing
my test cases.

The primary problem I had up until this point was that if I had to
restart a node or two, some indices ended up bunched on a small number
of nodes, causing performance issues.
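
(A related built-in knob that limits this kind of bunching, assuming a
version that supports it: capping how many shards of a single index one
node will accept. The index name and value here are illustrative:)

  curl -XPUT 'http://localhost:9200/bro-2012.08.29/_settings' -d '{
    "index.routing.allocation.total_shards_per_node": 2
  }'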
