Using Elasticsearch as a Data Lake and forwarding logs to SIEM based on the use cases implemented


Hello,

I really need your help and insight on a situation I'm going through.
Here is the description:

We have a SIEM (QRadar) infrastructure, with Event Collectors that receive logs from many customers' data sources. Those logs are then correlated by the configured use cases/SIEM rules. However, a lot of the logs being sent to the SIEM are not needed for any use case/SIEM rule; they are just being stored for searching, reporting, and so on.
Our goal is to reduce the number of events the SIEM receives (the events-per-second rate).
We want to keep receiving all logs (both the ones we need for use cases and the ones we don't), but filter them before they reach the SIEM, so that only the ones needed for use cases are forwarded to the SIEM and the others stay in Elastic.
We were thinking of two ways:

  1. Having the following structure - DATA SOURCES --> LOG AGGREGATOR --> ELASTIC DATA LAKE (the data lake receives all the logs and forwards the ones needed for use cases to the SIEM) --> SIEM
  2. Having the following structure - DATA SOURCES --> LOG AGGREGATOR (the aggregator receives all the logs, forwards the ones we need for use cases to the SIEM, and sends the ones we don't need to Elastic) --> SIEM (a sketch of this routing follows below)
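
For option 2, assuming Logstash as the log aggregator, this is roughly the routing we have in mind; the hostnames and the matching condition are just placeholders:

```
input {
  syslog { port => 514 }
}

filter {
  # Placeholder rule: tag events that a SIEM use case needs.
  if [message] =~ /LEEF:2\.0\|Check Point/ {
    mutate { add_tag => ["use_case"] }
  }
}

output {
  if "use_case" in [tags] {
    # Relay use-case events to QRadar (logstash-output-syslog plugin).
    syslog {
      host     => "qradar.example.com"   # placeholder SIEM address
      port     => 514
      protocol => "tcp"
    }
  } else {
    # Everything else stays in the Elastic data lake.
    elasticsearch {
      hosts => ["https://elastic.example.com:9200"]   # placeholder cluster address
      index => "datalake-%{+YYYY.MM.dd}"
    }
  }
}
```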

Is the approach I'm describing possible to implement?
Of these two approaches, which one would you consider the best?

Other questions I have are:

  1. Does Elastic permit segregation of customers? And segregation of data sources?
  2. We have logs that come through syslog in LEEF format; here is an example: "LEEF:2.0|Check Point|VPN-1 & FireWall-1|1.0|Accept|...". The LEEF payload doesn't carry the source host IP or hostname the way a normal syslog header does, so the log will arrive at the data lake with the original source IP of the data source, but will leave the data lake with the data lake's source IP. The SIEM looks for the host in the payload first; if it doesn't find it, it uses the source IP the log came from. Does Elastic permit forwarding logs with the source IP still being that of the original data source, without modifying the original log (which we cannot do, for legal purposes)? (A sketch of what we imagine follows after this list.)
  3. Does Elastic guarantee integrity and authenticity of the logs for legal purposes?
  4. Does Elastic permit applying retention policies to the logs per customer?
  5. Is Elastic capable of segregating data sources like LEEF?
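
For question 2, assuming Logstash in front of the data lake, we imagine we could at least keep the original sender address in its own field while leaving the raw message untouched; something like this sketch (the field names are placeholders and may differ depending on the input plugin and version):

```
filter {
  # The syslog input records the peer address the event arrived from
  # (classically in [host]; newer ECS-aware versions may use [host][ip]).
  # Copy it to a dedicated field so [message] itself stays unmodified.
  mutate {
    copy => { "host" => "original_source_ip" }
  }
}
```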

Could you help me with this situation?

Thank you so much.

Welcome to our community! :smiley:

  1. Yes, you can put different data into different indices (see the sketch after this list)
  2. Elasticsearch will store the original event as received in the _source field, and the message is also parsed into JSON fields
  3. No, you should use hashes of the events or something similar to ensure this
  4. Yes, look at ILM (index lifecycle management)
  5. Yes, see point 1
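
For point 1, a minimal sketch of per-customer segregation on the Logstash side; the [customer] and [datasource] fields are placeholders you would set earlier in your pipeline. Each resulting index pattern can then be given its own ILM policy, which also covers point 4:

```
output {
  elasticsearch {
    hosts => ["https://elastic.example.com:9200"]   # placeholder cluster address
    # One index per customer and data source keeps the data segregated;
    # [customer] and [datasource] are placeholder fields set by earlier filters.
    index => "logs-%{[customer]}-%{[datasource]}-%{+YYYY.MM.dd}"
  }
}
```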

Also, why not use our SIEM?


Hello Mark,

Thank you very much for your reply.

  1. "No, you should use hashes of the events or something similar to ensure this" --> Does Elastic have any functionality for that?

Regards.

Logstash has the fingerprint filter that might work for you.
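
A minimal sketch; the key is a placeholder, and setting one makes the digest an HMAC, which gives you authenticity on top of integrity:

```
filter {
  # Hash each raw event as received so its integrity can be verified later.
  fingerprint {
    source => "message"       # the original log line
    target => "event_hash"    # field that will hold the digest
    method => "SHA256"
    key    => "replace-me"    # placeholder secret; with a key set, this is an HMAC-SHA256
  }
}
```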

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.