Hi,
Hopefully this is the right place to post; if not, please move/direct me accordingly.
I've enabled Endpoint Security on some of our systems, running under a Fleet-managed Elastic Agent.
Initially, the VM I was testing with was occasionally getting hammered by IO after setting up Endpoint. I partially attributed this to it still being a testing setup (hence two Dockerized Elasticsearch instances on a single node) and to the VM disk being rotational.
I switched the VM over to SSD and haven't had an issue since.
However, the underlying issue is of course the massive volume of data that Endpoint Security ingests, and SSD storage is more costly. (I'm aware that in my current semi-test configuration I'm somewhat needlessly duplicating data; the second instance and its data will eventually be moved to a different VM.)
The insight and metrics it provides are great, and I'm hoping to make good use of them, ideally including some anomaly detection and alerting.
Our business is in an industry (financial) that places a high value on security, and being able to audit past actions is important as well.
We need to balance insight and auditability with reasonable storage requirements, and I'm not 100% sure of all the tools available for doing this, specifically with regard to Endpoint Security. (System metrics and Docker metrics, also collected by the Fleet-managed Elastic Agent, are a bit heavy as well, but Endpoint seems to be much heavier.)
Speaking of the system/Docker metrics, I'm also not entirely sure whether some of that data duplicates what Endpoint collects. Process data, for example, seems to be pushed by both the metrics integrations and Endpoint; I'm not sure if that's actually the case and, if so, whether it can reasonably be de-duplicated without affecting the dashboards.
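To give a concrete example of the overlap I mean, I'd expect both of these to return hits for the same process (the pid is just a placeholder, and the data stream patterns are my guesses based on what I see in my setup):

```
# Spot-check: does the same process appear in both the system metrics
# and the Endpoint process data streams? (pid is a placeholder)
GET metrics-system.process-*/_search
{
  "size": 1,
  "query": { "term": { "process.pid": 1234 } }
}

GET logs-endpoint.events.process-*/_search
{
  "size": 1,
  "query": { "term": { "process.pid": 1234 } }
}
```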
In my searching, rollups came up as a potential solution. Filtering also came up, but then we're simply losing data. Is there another way to lower the sample rate of the data at times, or equivalently to filter older data so as to reduce its fidelity/sample count? I'm wondering whether some metrics (whether pushed by Endpoint or by the Docker/system metrics integrations) have a higher sample rate than is actually needed.
Rollups seem to do what I want, at least with respect to limiting the size/fidelity of older, less significant data.
I went to create a rollup job in Kibana for Endpoint Security and wasn't completely sure how to proceed.
First, other than 'metrics-endpoint.metadata_current_default', which is only 14 docs/148 KB, the only indices that seem to relate to Endpoint are hidden.
If I show hidden indices, I see '.ds-logs-endpoint.events.file-default-2022.05.27-000001', '.ds-logs-endpoint.events.process-default-2022.05.27-000001', and '.ds-logs-endpoint.events.network-default-2022.05.27-000001' as the biggest indices with the highest doc counts (also '.ds-metrics-system.process-default-2022.05.27-000001', but I believe that one relates to system metrics rather than Endpoint?).
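(In Dev Tools, something like the following shows the same picture, in case that's useful; the patterns are just my guesses at what's relevant.)

```
# List the Endpoint and metrics backing indices by size, hidden indices included
GET _cat/indices/.ds-logs-endpoint.*,.ds-metrics-*?v&h=index,docs.count,store.size&s=store.size:desc&expand_wildcards=all
```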
Initially I went to enter an endpoint index pattern, but then I hesitated; I'm not entirely sure how this will affect my ability to view the data in Kibana under Security. I'm also not 100% sure how (or whether) I can specify that only older data gets rolled up.
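Concretely, this is roughly the job I was trying to put together. The names, cron, intervals, and fields are all placeholders, the 'delay' is only my guess at how to keep recent data out of the rollup, and I'm not even sure a pattern over the hidden backing indices is the right approach:

```
# Rough sketch of the rollup job I had in mind (all values are placeholders)
PUT _rollup/job/endpoint-process-hourly
{
  "index_pattern": ".ds-logs-endpoint.events.process-*",
  "rollup_index": "rollup-endpoint-process",
  "cron": "0 0 * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": {
      "field": "@timestamp",
      "fixed_interval": "1h",
      "delay": "7d"
    },
    "terms": {
      "fields": [ "host.name", "event.action" ]
    }
  },
  "metrics": [
    { "field": "process.pid", "metrics": [ "value_count" ] }
  ]
}
```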
The rollup docs say 'we’d like to rollup these documents into hourly summaries, which will allow us to generate reports and dashboards with any time interval one hour or greater', which sounds pretty close to what I'd like to do, at least for older data. But I'm not entirely clear whether this will affect existing dashboards, whether new ones must be created/adjusted to read the rollup index, and whether the same applies to the views under Security in Kibana.
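If I'm reading the docs right, the rolled-up data has to be queried through its own search endpoint (or a dedicated rollup index pattern in Kibana) rather than a regular search, something like the following (reusing the placeholder rollup index name from the sketch above), which is what makes me wonder about the existing dashboards and the Security views:

```
# Querying the rolled-up data at a coarser (daily) interval
GET rollup-endpoint-process/_rollup_search
{
  "size": 0,
  "aggregations": {
    "events_per_day": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "1d"
      }
    }
  }
}
```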
Thanks for your help,
- one running-out-of-space admin!