Beginner questions on Supervised ML

Hi all,

Sorry if this question has been asked before, but I couldn't find any relevant information.
For supervised ML, do we have to provide a label for every event that is already in Elasticsearch during the training phase? If each event already has its label, why do we still need machine learning to predict it?

Thank you.

The idea is that you have some labelled data, the training data, stored in an Elasticsearch index. As you observed, you can use the data frame analytics API to train a model from it. Note that not every document provided to a classification job needs to be labelled. Documents without a label will have a prediction added by the trained model.

Once the model has been trained, we also provide tools to run inference with it on unlabelled data in the Elastic Stack. For example, you can run it in an inference ingest processor or as part of a pipeline aggregation. Nothing stops you from training models elsewhere and importing them, provided we support inference for that model type. This GitHub repo provides Python converters for the supported types, and we are continuing to work on supporting inference for additional model types. Hope this helps!
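To make the two steps above concrete, here is a minimal sketch that only builds the JSON request bodies, so it runs without a cluster. The index names, the `label` field and the model ID are hypothetical placeholders. The bodies follow the classification data frame analytics job (`PUT _ml/data_frame/analytics/<job_id>`) and the `inference` ingest processor (`PUT _ingest/pipeline/<pipeline_id>`), but treat them as a starting point and check them against the docs for your version.

```python
import json

# Hypothetical names: replace with your own index, field and model IDs.
SOURCE_INDEX = "animals"            # some docs have "label" (cat/dog), some don't
DEST_INDEX = "animals-predictions"  # index the job writes its results to
LABEL_FIELD = "label"


def classification_job_body(source_index, dest_index, label_field, training_percent=80):
    """Body for PUT _ml/data_frame/analytics/<job_id> (classification analysis)."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "analysis": {
            "classification": {
                # Field the model learns to predict; unlabelled docs get a prediction.
                "dependent_variable": label_field,
                # Percentage of eligible (labelled) docs used for training.
                "training_percent": training_percent,
            }
        },
    }


def inference_pipeline_body(model_id, target_field="ml.inference.animal"):
    """Body for PUT _ingest/pipeline/<pipeline_id> that runs a trained model on new docs."""
    return {
        "processors": [
            {"inference": {"model_id": model_id, "target_field": target_field}}
        ]
    }


if __name__ == "__main__":
    print(json.dumps(classification_job_body(SOURCE_INDEX, DEST_INDEX, LABEL_FIELD), indent=2))
    # Hypothetical model ID: look up the real one with GET _ml/trained_models.
    print(json.dumps(inference_pipeline_body("animals-classifier-model"), indent=2))
```

Keeping the bodies as plain dicts lets you send them with any HTTP client or paste them into Kibana Dev Tools.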


Thanks a lot for the explanation.
Is there a feature that lets users label their data manually in Kibana? For example, the data in Elasticsearch contains only some features, and the user labels each event as dog or cat in Kibana.
Right now I'm wondering: if my data in Elasticsearch does not have the label, how can I add this extra field to it?

I'm afraid you cannot manually add feature values to individual documents in Kibana. You may be able to use the update by query API or runtime fields to add the label field to your documents, provided you can define a rule or query that distinguishes cats from dogs in your training data.
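As a sketch of the rule-based labelling described above, the following builds two alternative request bodies. The first is for `POST <index>/_update_by_query` and writes a label into `_source` for every document that matches a query. The second is for `PUT <index>/_mapping` and defines the label as a runtime field computed at search time. The `sound` field and the bark/meow rule are purely hypothetical stand-ins for whatever rule fits your data. Also check whether your version's data frame analytics job can use a runtime field as its dependent variable; the update by query route stores a real field.

```python
import json


def label_by_query_body(query, label, label_field="label"):
    """Body for POST <index>/_update_by_query: sets label_field on every doc matching query."""
    return {
        "query": query,
        "script": {
            "lang": "painless",
            # Writes the label into each matching document's _source.
            "source": "ctx._source[params.field] = params.label",
            "params": {"field": label_field, "label": label},
        },
    }


def label_runtime_field_mapping(label_field="label"):
    """Body for PUT <index>/_mapping defining the label as a keyword runtime field."""
    # Hypothetical rule on a hypothetical keyword field "sound": bark -> dog, meow -> cat.
    # Docs matching neither rule emit nothing, so they stay unlabelled.
    script = (
        "if (doc['sound'].size() > 0) { "
        "String s = doc['sound'].value; "
        "if (s == 'bark') { emit('dog'); } else if (s == 'meow') { emit('cat'); } }"
    )
    return {"runtime": {label_field: {"type": "keyword", "script": {"source": script}}}}


if __name__ == "__main__":
    # One update by query request per rule.
    print(json.dumps(label_by_query_body({"term": {"sound": "bark"}}, "dog"), indent=2))
    print(json.dumps(label_by_query_body({"term": {"sound": "meow"}}, "cat"), indent=2))
    print(json.dumps(label_runtime_field_mapping(), indent=2))
```

Documents that neither rule labels are exactly the ones a classification job would later fill in with a prediction.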


Understood, thanks a lot for your explanation :wink: