Parsing / Indexing strategies for BRO?

This is part Logstash, part Elasticsearch, but since the decision is being made at the Logstash level, I thought I would ask here:

What are people's strategies for indexing incoming BRO data? It creates over a dozen log files which need to be parsed and indexed into ES.

At first I was storing everything in a single ES index, but reflecting on how RDBMSs work, I thought that might be inefficient, since some logs have one set of fields and other logs have completely different sets of fields. So, in an effort to "normalize", I've begun storing each log type in its own separate index.
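The per-type routing itself is simple: a single elasticsearch output can interpolate the log type into the index name. Roughly what I mean (a sketch, assuming each event carries its log type in [type]; hosts and index naming are just examples):

```
output {
  # One index per BRO log type, e.g. bro-conn-2017.06.01, bro-dns-2017.06.01.
  # Assumes [type] was set to the log type (conn, dns, http, ...) at input time.
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "bro-%{type}-%{+YYYY.MM.dd}"
  }
}
```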

However, sometimes we need to be alerted about events that happen ACROSS log types.

So I've started working on additional Logstash steps to normalize a subset of the BRO data (regardless of log type) and store it in a new index (bro-combined), so we have a single index to refer to, and can go to the dedicated indexes for more information if needed.
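The double-insert is done with the clone filter: every event gets a stripped-down copy routed to bro-combined, while the original goes to its per-type index. A sketch of the idea (the clone filter sets [type] on the copy to the clone name, hence copying it to log_type first; the whitelist entries are regexes and just illustrative):

```
filter {
  # Preserve the original log type; the clone filter overwrites [type] on copies.
  mutate { add_field => { "log_type" => "%{type}" } }

  # Emit a second copy of each event, tagged with type "combined".
  clone { clones => ["combined"] }

  if [type] == "combined" {
    # Keep only the normalized subset of fields on the combined copy.
    # whitelist_names takes regexes; these field names are examples.
    prune {
      whitelist_names => ["^@timestamp$", "^ts$", "^uid$", "orig_h", "resp_h", "^log_type$"]
    }
  }
}

output {
  if [type] == "combined" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "bro-combined-%{+YYYY.MM.dd}"
    }
  } else {
    # Originals keep going to their per-type index as in the output above.
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "bro-%{type}-%{+YYYY.MM.dd}"
    }
  }
}
```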

I'm wondering if you all think I'm on the right track, or if I'm going about this completely the wrong way. It makes sense from a workflow perspective, but I'm worried that this strategy will double the number of insert operations ES has to handle, while the bandwidth increase will be somewhat less extreme (probably 1.4-1.5x, since the combined copies only carry a subset of fields).

Interested to hear people's thoughts.

Thanks!

How are you doing the alerting?

I'm evaluating SIEMonster, so I'm using the FourOneOne interface supplied as part of that package.

https://demo.fouroneone.io/

You can run multiple if and else if statements, matching them to the different logs, all in one config file, so there's really no need for separate config files.
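Something like this, all in one file (a sketch; the path patterns and type names are just examples, assuming the file input populates [path]):

```
filter {
  # Dispatch each BRO log to its own parsing branch inside a single config file.
  if [path] =~ /conn\.log/ {
    mutate { replace => { "type" => "bro_conn" } }
    # conn.log-specific filters here
  } else if [path] =~ /dns\.log/ {
    mutate { replace => { "type" => "bro_dns" } }
    # dns.log-specific filters here
  } else if [path] =~ /http\.log/ {
    mutate { replace => { "type" => "bro_http" } }
    # http.log-specific filters here
  }
}
```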

I like the method Kustodian chose of using separate .conf files for each log type: it's more modular, easier to keep in a VCS, and lets you compare different filters side by side. Either way, it works the same.

The question was more about WHAT to do with the data after Logstash: store it in a single ES index, in a different index for each log type (protocol), or in a combination of the two (individual indexes, with a second "insert" of a subset of each record into a combined index).

I know that in the regular database world it's more efficient to normalize data; I'm not sure how that translates to Lucene/ES yet. Is it better to keep everything together even if each set of fields is only used by 10% of the records, or to break them out and store/index them separately?

It depends.
Sparse data can be stored inefficiently, but there are improvements for that coming in Lucene with 6.0. However, you can read data from several indices at once simply by querying them together, so it really comes down to how you interact with Elasticsearch via these other tools.
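To illustrate the multi-index point: a search can target several indices via a comma-separated or wildcarded list in the URL, so cross-log queries don't strictly require a combined index (index names and the field here are placeholders):

```
curl -XGET 'localhost:9200/bro-conn-*,bro-dns-*/_search' -H 'Content-Type: application/json' -d '
{
  "query": { "match": { "id.orig_h": "10.0.0.5" } }
}'
```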
