Retrieving model parameters of Data Frame Analytics job

cristianplaytreks · April 11, 2020, 2:03pm

Hi,

New to ELK here, love it so far!

I successfully created a regression Data Frame Analytics job. Is it possible to retrieve the parameters of the XGBoost model (i.e. the same parameters I would need in order to create an inference model using PUT _ml/inference/<model_id>)? I have looked into retrieving the inference model with GET _ml/inference/, but the response contains information about the job rather than the XGBoost model.

What about the feature importance of the independent variables (for example, like in this article), would that be possible to retrieve?

Finally, a broader question: any recommendations on where to learn more about how to properly use the ML libraries in ELK, other than anomaly detection (which I see there's a lot of emphasis on)?

Cheers,
Cristian

dmitri · April 13, 2020, 12:12pm

Hi Cristian,

In the upcoming 7.7 release we are enhancing the _stats API for data frame analytics jobs with much more information about the model, including hyperparameter values.
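As a rough sketch of what reading those values could look like: the snippet below pulls the hyperparameters out of an already-parsed response from GET _ml/data_frame/analytics/<job_id>/_stats. The response layout (analysis_stats, regression_stats, hyperparameters), the job ID and the sample values are assumptions made for illustration, not output from a real cluster.

```python
from typing import Any, Dict


def extract_hyperparameters(stats_response: Dict[str, Any], job_id: str) -> Dict[str, Any]:
    """Return the hyperparameters reported for job_id, or {} if none are found."""
    for job in stats_response.get("data_frame_analytics", []):
        if job.get("id") != job_id:
            continue
        analysis_stats = job.get("analysis_stats", {})
        # The inner key depends on the analysis type, e.g. "regression_stats"
        # (assumed layout), so scan every entry instead of hard-coding one.
        for stats in analysis_stats.values():
            if isinstance(stats, dict) and "hyperparameters" in stats:
                return stats["hyperparameters"]
    return {}


# Hypothetical, hand-written response used only to exercise the helper.
sample_response = {
    "count": 1,
    "data_frame_analytics": [
        {
            "id": "my-regression-job",
            "state": "stopped",
            "analysis_stats": {
                "regression_stats": {
                    "iteration": 12,
                    "hyperparameters": {"eta": 0.05, "max_trees": 200},
                }
            },
        }
    ],
}

hyperparameters = extract_hyperparameters(sample_response, "my-regression-job")
print(hyperparameters)
```

The helper returns an empty dict when the job is missing or the stats carry no hyperparameters, which is what a pre-7.7 response would look like under these assumptions.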

With regard to feature importance, you can already calculate it as of version 7.6. For regression/classification, you need to explicitly enable the calculation by setting the num_top_feature_importance_values parameter. You can read more at https://www.elastic.co/guide/en/elasticsearch/reference/current/put-dfanalytics.html.
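To make the parameter concrete, here is a minimal sketch that builds the request body for PUT _ml/data_frame/analytics/<job_id> with feature importance enabled. The index names, the dependent variable and the helper name build_regression_job are hypothetical; the snippet only builds and prints the JSON and does not contact a cluster.

```python
import json
from typing import Any, Dict


def build_regression_job(
    source_index: str,
    dest_index: str,
    dependent_variable: str,
    num_top_features: int,
) -> Dict[str, Any]:
    """Build a regression job body that requests feature importance values."""
    if num_top_features < 0:
        raise ValueError("num_top_features must be zero or positive")
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "analysis": {
            "regression": {
                "dependent_variable": dependent_variable,
                # Number of feature importance values to write per document.
                "num_top_feature_importance_values": num_top_features,
            }
        },
    }


# Hypothetical index and field names, for illustration only.
job_body = build_regression_job("houses", "houses-predictions", "price", 5)
print(json.dumps(job_body, indent=2))
```

The printed JSON can be pasted into the Kibana Dev Tools console as the body of the PUT request.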

In addition, you might find this blog about feature importance useful: https://www.elastic.co/blog/feature-importance-for-data-frame-analytics-with-elastic-machine-learning.

Finally, with regard to learning more about how to use data frame analytics and model inference, material is currently limited mostly to the existing documentation. There is also a webinar that introduces the feature: https://www.elastic.co/webinars/introduction-to-supervised-machine-learning-in-elastic. We realize there is much more ground to cover, but note that this is an experimental feature that the ML team is actively developing. As we get closer to a Beta release, we'll also be working on adding material that can guide users in using these features more effectively.

Bear with us! In the meantime, please keep asking questions in the forum. We'll be happy to answer!

cristianplaytreks · April 17, 2020, 10:56am

Thanks Dimitris for your answer, a lot of useful info there! Excited to see the new features in 7.7. I find it really cool overall to have general ML jobs directly in the Elasticsearch ecosystem.

Will version 7.7 provide info about preprocessing as well? For example whether the data gets scaled, whitened, encoded etc.

dmitri · April 27, 2020, 3:24pm

Hi Cristian,

> Will version 7.7 provide info about preprocessing as well? For example whether the data gets scaled, whitened, encoded etc.

7.7 will not contain such info as part of the _stats API, but it is indeed a great suggestion. We're looking into exposing such information in a nicer format in the future. At the moment, information about feature encoding is available in the model itself (see the get trained model API), though not in the nicest format.
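Until then, one way to make that encoding information easier to read is to summarize it yourself. The sketch below assumes the model definition returned by the get trained model API has already been decompressed and parsed into a dict, and that it carries a "preprocessors" list whose entries are keyed by encoding type (for example one_hot_encoding or target_mean_encoding) with a "field" entry inside. The sample definition and field names are made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def summarize_preprocessors(definition: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (encoding_type, field) pairs found in a parsed model definition."""
    summary = []
    for preprocessor in definition.get("preprocessors", []):
        # Each entry is assumed to be a single-key dict: {encoding_type: settings}.
        for encoding_type, settings in preprocessor.items():
            summary.append((encoding_type, settings.get("field", "<unknown>")))
    return summary


# Hypothetical, hand-written definition used only to exercise the helper.
sample_definition = {
    "preprocessors": [
        {"one_hot_encoding": {"field": "neighbourhood", "hot_map": {"north": "neighbourhood_north"}}},
        {"target_mean_encoding": {"field": "house_type", "feature_name": "house_type_targetmean"}},
        {"frequency_encoding": {"field": "postcode", "feature_name": "postcode_frequency"}},
    ],
    "trained_model": {},
}

encodings = summarize_preprocessors(sample_definition)
for encoding_type, field in encodings:
    print(f"{field}: {encoding_type}")
```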
