Let me elaborate on what I want and why; that may make it easier to find the best solution.
I am new to ML jobs. I've created single metric jobs and have also explored multi-metric ML jobs a little. I know we can add influencers to multi-metric jobs.
Now I need to create a job for a real scenario, where there can be multiple detectors and multiple influencers.
We have microservices, and all of their logs and Metricbeat data go into a single Elasticsearch index.
I have extracted the fields of interest.
This is an outlier detection ML job.
So we are getting data points from different sources. Each data field is specific to one source, but the fields may have an impact on each other.
The following is a sample data set. All the fields are available in one index, but some documents have some of the data fields and other documents have others.
index = 2021IDX
proecessing_time, slow_query, cpu_usage_service1, cpu_usage_service2, data_field2_log1, data_field1_log2, data_field1_metricbeat1, data_field1_metricbeat2, database_fragmentation_count_metricbeat2
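For concreteness, the kind of job described here (several detectors plus an influencer) might be configured roughly as in the sketch below, written as a Python dict that mirrors the JSON body of an anomaly detection job. The choice of functions, the 15 minute bucket span, the `@timestamp` time field, and the use of `log-type` as an influencer are all assumptions for illustration, not something confirmed for this data set.

```python
import json

# Sketch of an anomaly detection job body with multiple detectors and one influencer.
# Field names come from the list above; everything else is an assumption.
job_config = {
    "description": "Sketch: several detectors over one shared index",
    "analysis_config": {
        "bucket_span": "15m",  # assumed bucket span
        "detectors": [
            {"function": "mean", "field_name": "proecessing_time"},
            {"function": "mean", "field_name": "cpu_usage_service1"},
            {"function": "mean", "field_name": "cpu_usage_service2"},
            {"function": "max", "field_name": "database_fragmentation_count_metricbeat2"},
        ],
        "influencers": ["log-type"],  # assumed influencer field
    },
    "data_description": {"time_field": "@timestamp"},  # assumed time field name
}

# Print the body as JSON, the form in which it would be sent to the cluster.
print(json.dumps(job_config, indent=2))
```

Each detector names exactly one field, so in this sketch no single detector needs all the fields to be present on the same document.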
My question is:
Do I really need to combine those events somehow so that all the data fields of interest are available on each event/doc?
If not, does that have any impact on the ML job's behavior?
Event 1 looks like this:
2021/1/1T12:0:0.162, data_field1_metricbeat2, database_fragmentation_count_metricbeat2, cpu_usage_service1
Event 2 looks like this:
2021/1/1T12:0:0.180, proecessing_time, data_field1_metricbeat1, data_field2_log1, slow_query
Event 3 looks like this:
2021/1/1T12:0:1.178, cpu_usage_service2, data_field1_log2
Now, if I create an ML job with these fields, some as metrics/detectors and some as influencers,
how would ELK combine all these fields to treat them as one row?
I know jobs aggregate events over the given time bucket, but to keep things simple I am just talking about one instance of each type of event.
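To make the bucketing idea concrete, here is a minimal sketch in plain Python of per-bucket aggregation over sparse documents. It is only an illustration of the concept, not how the ML job is actually implemented; the numeric values, the `@timestamp` key, and the 60 second bucket are made-up assumptions, and only the field names come from the events above.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean

# Hypothetical sample values; only the field names come from the post.
events = [
    {"@timestamp": "2021-01-01T12:00:00.162", "cpu_usage_service1": 41.0,
     "database_fragmentation_count_metricbeat2": 7.0},
    {"@timestamp": "2021-01-01T12:00:00.180", "proecessing_time": 350.0,
     "slow_query": 1.0},
    {"@timestamp": "2021-01-01T12:00:01.178", "cpu_usage_service2": 63.0},
]

def bucket_means(events, bucket_seconds):
    """Return {bucket_start_epoch: {field: mean of the values seen in that bucket}}."""
    values = defaultdict(lambda: defaultdict(list))
    for event in events:
        ts = datetime.fromisoformat(event["@timestamp"]).replace(tzinfo=timezone.utc)
        bucket_start = int(ts.timestamp()) // bucket_seconds * bucket_seconds
        for field, value in event.items():
            if field != "@timestamp":
                # A field only contributes to the buckets where it actually occurs.
                values[bucket_start][field].append(value)
    return {
        start: {field: mean(vals) for field, vals in fields.items()}
        for start, fields in values.items()
    }

result = bucket_means(events, bucket_seconds=60)
for start, row in result.items():
    print(start, row)
```

In this sketch the three events land in the same bucket, and each field gets its own aggregate even though no single document carries all of the fields.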
To address this, I was thinking of combining them all into one document.
Is that really required in my scenario, or can I stop worrying about it because the ML job will take care of it?
I know for sure that data frame analytics does not work for this scenario: when I create a data frame analytics job, it only shows some of the fields to include in the job.
To include the others I need to filter the data by 'log-type'; only then can I see those fields, but in that case the fields from the other logs are not available.
Considering this, I still feel that we need to have them available in all indexed documents. What do you say?
I was thinking of combining them, but I am not sure whether that's the correct approach.
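For reference, "combining" here would mean something like the sketch below: merging every event that falls in the same time window into one wide document. This is a plain Python illustration under assumptions (made-up values, an assumed `@timestamp` key, a 60 second window, later values overwriting earlier ones), not a recommendation or an existing Elastic feature.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical sample values; only the field names come from the post.
events = [
    {"@timestamp": "2021-01-01T12:00:00.162", "cpu_usage_service1": 41.0},
    {"@timestamp": "2021-01-01T12:00:00.180", "proecessing_time": 350.0, "slow_query": 1.0},
    {"@timestamp": "2021-01-01T12:00:01.178", "cpu_usage_service2": 63.0},
]

def combine_by_bucket(events, bucket_seconds):
    """Merge all events of one time bucket into a single document.

    If a field occurs more than once in a bucket, the latest value wins.
    """
    combined = defaultdict(dict)
    # ISO timestamps in the same format sort chronologically as strings.
    for event in sorted(events, key=lambda e: e["@timestamp"]):
        ts = datetime.fromisoformat(event["@timestamp"]).replace(tzinfo=timezone.utc)
        bucket_start = int(ts.timestamp()) // bucket_seconds * bucket_seconds
        fields = {k: v for k, v in event.items() if k != "@timestamp"}
        combined[bucket_start].update(fields)
    return dict(combined)

docs = combine_by_bucket(events, bucket_seconds=60)
for start, doc in docs.items():
    print(start, doc)
```

The resulting document carries every field of interest, at the cost of losing the original per-event timestamps inside the window.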
I would really appreciate any guidance on this. I can't find these scenarios explained anywhere in the documentation or videos.