I have a requirement to ingest metricbeat/filebeat data twice: once with all the default fields, and a second time with only the fields the dashboards will use.
These two versions will also have different retention policies. The goal is to save space by keeping the detailed logs for less time than the "dashboard logs", and also to make the dashboards faster.
Migrating to Fleet is not an option at the moment.
The options I can think of:
A) Adding Logstash in the middle and duplicating events
B) Running metricbeat/filebeat twice with different configuration files (for Filebeat I will have to use different config and data paths to avoid blocking the registry).
This is a big infrastructure, so A would be difficult, and I'm not sure whether B would affect host performance in some way.
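For the retention side, the two copies could each get their own ILM policy. A minimal sketch in Dev Tools console syntax; the policy names and the 7d/30d ages are made-up placeholders, not a recommendation:

```json
PUT _ilm/policy/filebeat-detailed
{
  "policy": {
    "phases": {
      "delete": { "min_age": "7d", "actions": { "delete": {} } }
    }
  }
}

PUT _ilm/policy/filebeat-dashboard
{
  "policy": {
    "phases": {
      "delete": { "min_age": "30d", "actions": { "delete": {} } }
    }
  }
}
```

Each policy would then be attached to its own index template so the detailed indices are deleted sooner than the dashboard ones.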
I am not aware of other options. Keep in mind that when you run two Filebeat/Metricbeat instances, you have to configure a different data folder for each; otherwise, IDs and file states can interfere with each other.
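For example, the second instance's config could look roughly like this. A sketch only: the paths, input, index name, and field list are placeholders you would adapt:

```yaml
# filebeat-short.yml -- second instance with trimmed fields
path.data: /var/lib/filebeat-short   # separate data dir so the registries don't clash
path.logs: /var/log/filebeat-short

filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.log

processors:
  - include_fields:                  # keep only the fields the dashboards use
      fields: ["message", "host.name", "log.level"]

output.elasticsearch:
  hosts: ["https://es:9200"]
  index: "filebeat-short-%{[agent.version]}"
```

Note that a custom `index` also requires overriding the template settings (`setup.template.name` / `setup.template.pattern`), omitted here for brevity.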
Also, you could open an enhancement request on GitHub asking for a processor similar to Logstash's existing clone plugin.
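For reference, the Logstash side of option A would look roughly like this: the clone filter duplicates each event and sets `type` on the copy, which you can then trim and route to a different index. The field list and index names below are placeholders:

```
filter {
  clone {
    clones => ["short"]
  }
  if [type] == "short" {
    prune {
      whitelist_names => ["^@timestamp$", "^message$", "^host$", "^log$"]
    }
  }
}

output {
  if [type] == "short" {
    elasticsearch { hosts => ["https://es:9200"] index => "filebeat-short-%{+YYYY.MM.dd}" }
  } else {
    elasticsearch { hosts => ["https://es:9200"] index => "filebeat-detailed-%{+YYYY.MM.dd}" }
  }
}
```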
That issue is about Elastic Agent, not Beats. You can have multiple outputs in Elastic Agent, but it basically starts multiple Beat instances, so you are better off running multiple instances of Beats without Agent (unless you need the features provided by Agent).
Also considered that.
But is there a way to do this within Elastic? Otherwise this means configuring an external tool, and we are kind of in the same place as with Logstash. The short version of the index needs to be available at the same time as the extended version (live).
@kvch I was expecting to find the data folder as a flag under the run section but didn't find it:
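If I'm not mistaken, the data path is exposed as a global flag (`--path.data`) shared by all subcommands rather than one specific to `run`. A sketch of launching the two instances; the paths and config file names are examples:

```shell
# first instance (default paths)
filebeat -c /etc/filebeat/filebeat.yml

# second instance with its own config and data directory
filebeat -c /etc/filebeat/filebeat-short.yml \
  --path.data /var/lib/filebeat-short \
  --path.logs /var/log/filebeat-short
```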
I wonder whether, on the query performance side, ingesting only part of each document will really make a difference. Have you tested this? If not, we are left with the storage savings. What is the driver here, cost?
I thought about running a cron script that reindexes with fewer fields, but introducing custom tools is not something I can do.
I understand rollups are for reducing the data after some time, while the whole idea here is to have both versions of the data at the same time, and then keep only the shorter version afterwards.
About reducing fields for performance, I agree with you; I'm not sure that would actually help.
The requirement started when, after finishing the Kibana dashboards, the loading time was 15-20s, with <200ms ES queries and the Kibana nodes not stressed at all. Elastic support found it was a bug related to the number of mapping fields and said upgrading would fix it.
ES was upgraded and the loading time decreased, but it is still high. Then we asked ourselves:
what if we remove the unused fields from the mappings anyway?
what if we also remove the unused fields from the documents, so the dashboards query lighter data and we save space?
Every query runs against millions of documents, so maybe this impacts Kibana performance.