This is part Logstash, part Elasticsearch, but since the decision is being made at the Logstash level, I thought I would ask here:
What are people's strategies for indexing incoming Bro data? Bro creates over a dozen log files, all of which need to be parsed and indexed into ES.
At first, I was storing everything in a single ES index, but, thinking back to RDBMS design, that seemed inefficient: some logs have one set of fields, and other logs have completely different sets of fields. So, in an effort to "normalize", I've begun storing each log type in its own separate index.
However, sometimes we need to be alerted about events that happen ACROSS log types.
So I've started adding Logstash steps that normalize a subset of the Bro data (regardless of log type) into a new index (bro-combined). That gives us a single index to query for alerting, and we can go to the dedicated per-type indices when we need more detail.
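For what it's worth, the dual-write setup I'm describing can be done from a single pipeline with two elasticsearch outputs. Here's a rough sketch; it assumes an earlier filter stage has already parsed the log type into a log_type field and tagged the events that belong in the combined index with a normalized field (those field names and the index patterns are just placeholders, not anything standard):

```
output {
  # every event goes to its per-type index, e.g. bro-conn-2016.01.15
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "bro-%{log_type}-%{+YYYY.MM.dd}"
  }
  # the normalized subset is ALSO written to the combined index,
  # so cross-log-type alerts only have to query one place
  if [normalized] {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "bro-combined-%{+YYYY.MM.dd}"
    }
  }
}
```

The conditional is what keeps the second write from being a full duplicate of every event, which is where my 1.4-1.5x bandwidth estimate comes from.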
I'm wondering if you all think I'm on the right track, or if I'm going about this in completely the wrong way. It seems to make sense from a workflow perspective, but I'm worried that this strategy will double the number of insert transactions ES has to handle, while the bandwidth increase will be somewhat less extreme (probably 1.4-1.5x, since only a subset of each event is duplicated).
Interested to hear people's thoughts.