How to create outlier jobs with data fields coming from multiple sources (log1, log2, metricbeat1, etc.)

Hi everyone,

Let me elaborate on what I want and why; maybe then it will be easier to find the best solution.

I am new to ML jobs. I've created single-metric jobs and explored multi-metric jobs a little, so I know we can add influencers to multi-metric jobs.

Now I need to create a job for a real scenario, where there can be multiple detectors and multiple influencers.

We have microservices; all their logs and Metricbeat data flow into a single index.

I have extracted the fields of interest.

This is an outlier detection ML job.

So we are getting data points from different sources. Each data field is specific to one source, but the fields may have an impact on each other.

Following is a sample data set. All the fields live in one index, but any given document contains only a subset of them.

index = 2021IDX

proecessing_time, slow_query, cpu_usage_service1, cpu_usage_service2, data_field2_log1, data_field1_log2, data_field1_metricbeat1, data_field1_metricbeat2, database_fragmentation_count_metricbeat2

My question is:

Do I really need to combine those events somehow so that all the data fields of interest are available on each event/doc?

If not, does it have any impact on the ML job's behavior?

e.g.

event 1 looks like this:
2021/1/1T12:0:0.162, data_field1_metricbeat2 , database_fragmentation_count_metricbeat2, cpu_usage_service1

event 2 looks like this:
2021/1/1T12:0:0.180, proecessing_time , data_field1_metricbeat1, data_field2_log1, slow_query

event 3 looks like this:
2021/1/1T12:0:1.178, cpu_usage_service2 , data_field1_log2
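
In JSON form, I imagine the indexed documents look roughly like this as Dev Tools requests (the values and the `@timestamp` field name are made up for illustration):

```
// event 1 -- only metricbeat2/service1 fields present
POST 2021IDX/_doc
{
  "@timestamp": "2021-01-01T12:00:00.162Z",
  "data_field1_metricbeat2": 7.4,
  "database_fragmentation_count_metricbeat2": 42,
  "cpu_usage_service1": 0.63
}

// event 2 -- only log1/metricbeat1 fields present
POST 2021IDX/_doc
{
  "@timestamp": "2021-01-01T12:00:00.180Z",
  "proecessing_time": 812,
  "data_field1_metricbeat1": 3.1,
  "data_field2_log1": "some-value",
  "slow_query": true
}

// event 3 -- only service2/log2 fields present
POST 2021IDX/_doc
{
  "@timestamp": "2021-01-01T12:00:01.178Z",
  "cpu_usage_service2": 0.47,
  "data_field1_log2": "another-value"
}
```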

Now, suppose I create an ML job with these fields, some as metrics/detectors and some as influencers.

How would the Elastic Stack combine all these fields so they are considered as one "row"?

I know jobs aggregate events over the given bucket span, but to keep things simple I am just talking about one instance of each type of event.
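
For concreteness, here is roughly the kind of job I have in mind, as a Dev Tools sketch rather than a tested setup (the job ID, bucket span, detector functions, influencer choices, and the `@timestamp` field name are all placeholders):

```
PUT _ml/anomaly_detectors/multi-source-outliers
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      { "function": "high_mean", "field_name": "proecessing_time" },
      { "function": "mean", "field_name": "cpu_usage_service1" },
      { "function": "mean", "field_name": "cpu_usage_service2" }
    ],
    "influencers": [ "data_field1_log2", "data_field1_metricbeat1" ]
  },
  "data_description": { "time_field": "@timestamp" }
}

// the datafeed would pull all event types from the same index
PUT _ml/datafeeds/datafeed-multi-source-outliers
{
  "job_id": "multi-source-outliers",
  "indices": [ "2021IDX" ]
}
```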

To address this, I was thinking of combining them all into one document.

Is that really required in my scenario, or do I not have to worry about it because the ML job takes care of it?
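
In case combining is the way to go, what I had in mind was something like a pivot transform that groups events into fixed time buckets and aggregates each field, so every output document carries all the fields (the transform ID, interval, aggregation choices, and time field are my guesses):

```
PUT _transform/combine-2021idx
{
  "source": { "index": "2021IDX" },
  "dest":   { "index": "2021idx-combined" },
  "pivot": {
    "group_by": {
      "time_bucket": {
        "date_histogram": { "field": "@timestamp", "fixed_interval": "1m" }
      }
    },
    "aggregations": {
      "processing_time":  { "avg": { "field": "proecessing_time" } },
      "cpu_service1":     { "avg": { "field": "cpu_usage_service1" } },
      "cpu_service2":     { "avg": { "field": "cpu_usage_service2" } },
      "db_fragmentation": { "max": { "field": "database_fragmentation_count_metricbeat2" } }
    }
  }
}
```

Each document in `2021idx-combined` would then have every field populated, as long as that time bucket saw data from every source.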

I know for sure that data frame analytics does not work for this scenario as-is: when I create a data frame analytics job, it only offers some of the fields for inclusion.
To include the others I have to filter the data by 'log-type', and only then can I see those fields, but in that case the fields from the other logs are not available.
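
For reference, the outlier detection analysis I am ultimately after would look roughly like this, run against a combined index such as the transform output above (the index names and the field list are placeholders):

```
PUT _ml/data_frame/analytics/outliers-2021idx
{
  "source": { "index": "2021idx-combined" },
  "dest":   { "index": "2021idx-outliers" },
  "analysis": { "outlier_detection": {} },
  "analyzed_fields": {
    "includes": [ "processing_time", "cpu_service1", "cpu_service2", "db_fragmentation" ]
  }
}
```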

Considering this, I still feel we need to have all the fields available in every indexed document. What do you say?

I was thinking of combining them, but I am not sure that's the correct approach.

I would really appreciate any guidance on this. I can't find these scenarios explained anywhere in the documentation or videos.

Thanks

This is being discussed in a separate thread here: How to aggregate multiple events coming from different logs with slight different timestamp, when the only field is timestamp to combine those? - #4 by richcollier
