Retrieving model parameters of Data Frame Analytics job

Hi,

New to ELK here, love it so far!

I successfully created a regression Data Frame Analytics job. Is it possible to retrieve the parameters of the XGBoost model (i.e. the same parameters I would need in order to create an inference model using PUT _ml/inference/<model_id>)? I have looked into getting the inference model with GET _ml/inference/, but the response contains information about the job rather than the XGBoost model.

What about the feature importance of the independent variables (for example, as in this article)? Would that be possible to retrieve?

Finally, a broader question: any recommendations on where to learn more about how to properly use the ML libraries in ELK other than anomaly detection (which I see there's a lot of emphasis on)?

Cheers,
Cristian

Hi Cristian,

In the upcoming 7.7 release we are enhancing the _stats API for data frame analytics jobs with much more information about the model, including hyperparameter values.
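As a rough sketch of how you might read those values once they are available: the snippet below builds the _stats path for a job and pulls any hyperparameters out of a response body. The response shape (data_frame_analytics, analysis_stats, hyperparameters), the job id and the sample values are assumptions made up for illustration, so please check them against the documentation for your version.

```python
import json


def stats_path(job_id):
    """Return the request path for a job's _stats endpoint."""
    return f"_ml/data_frame/analytics/{job_id}/_stats"


def extract_hyperparameters(stats_response):
    """Collect hyperparameters per job id from a _stats response body.

    Assumes each job entry may carry an "analysis_stats" object whose
    values (for example "regression_stats") hold a "hyperparameters" dict.
    Jobs without that information are skipped.
    """
    result = {}
    for job in stats_response.get("data_frame_analytics", []):
        analysis_stats = job.get("analysis_stats", {})
        for stats in analysis_stats.values():
            if isinstance(stats, dict) and "hyperparameters" in stats:
                result[job.get("id")] = stats["hyperparameters"]
    return result


# Made-up response body, only to exercise the helper (not real output).
sample = {
    "count": 1,
    "data_frame_analytics": [
        {
            "id": "house-prices",
            "state": "stopped",
            "analysis_stats": {
                "regression_stats": {
                    "hyperparameters": {"eta": 0.1, "max_trees": 50}
                }
            },
        }
    ],
}

print("GET", stats_path("house-prices"))
print(json.dumps(extract_hyperparameters(sample), indent=2))
```

The helper only parses a dictionary, so you can feed it the decoded JSON from whichever HTTP client you already use.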

With regard to feature importance, you can already calculate it since version 7.6. For regression/classification, you need to explicitly enable the calculation by setting the num_top_feature_importance_values parameter. You can read more in https://www.elastic.co/guide/en/elasticsearch/reference/current/put-dfanalytics.html.
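To make that concrete, here is a minimal sketch of a regression job body with num_top_feature_importance_values set. The index names, job id and dependent variable are hypothetical placeholders, and the helper functions are just illustration, not part of any Elastic client.

```python
import json


def job_path(job_id):
    """Return the request path used to create a data frame analytics job."""
    return f"_ml/data_frame/analytics/{job_id}"


def build_regression_job(source_index, dest_index, dependent_variable,
                         num_top_features):
    """Build the JSON body for a regression job with feature importance.

    num_top_features is sent as num_top_feature_importance_values; it is
    the maximum number of feature importance values kept per document.
    """
    if num_top_features < 0:
        raise ValueError("num_top_features must not be negative")
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "analysis": {
            "regression": {
                "dependent_variable": dependent_variable,
                "num_top_feature_importance_values": num_top_features,
            }
        },
    }


# Hypothetical names, chosen only for this example.
body = build_regression_job("houses", "houses-predictions", "price", 3)
print("PUT", job_path("house-prices"))
print(json.dumps(body, indent=2))
```

You would send the printed body with a PUT request to the printed path, for example from the Kibana Dev Tools console, and then start the job as usual.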

In addition, you might find this blog about feature importance useful: https://www.elastic.co/blog/feature-importance-for-data-frame-analytics-with-elastic-machine-learning.

Finally, with regard to learning more about how to use data frame analytics and model inference, at the moment material is mostly limited to the existing documentation. There is also a webinar that introduces the feature: https://www.elastic.co/webinars/introduction-to-supervised-machine-learning-in-elastic. We realize there is much more ground to cover, but note that it is an experimental feature that the ML team is actively developing at the moment. As we get closer to a Beta release, we'll also be working on adding material that can guide users in using these features more effectively.

Bear with us! In the meantime, please keep asking questions in the forum. We'll be happy to answer!

Thanks Dimitris for your answer, a lot of useful info there! Excited to see the new features in 7.7. I find it a really cool feature overall to have general ML jobs directly in the Elasticsearch ecosystem.

Will version 7.7 provide info about preprocessing as well? For example whether the data gets scaled, whitened, encoded etc.

Hi Cristian,

Will version 7.7 provide info about preprocessing as well? For example whether the data gets scaled, whitened, encoded etc.

7.7 will not contain such info as part of the _stats API, but it is indeed a great suggestion. We're looking into exposing such information in a nice format in the future. At the moment, there is information about feature encoding in the model itself (see the get trained model API), but not in the nicest format.
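If you want to dig the encoding information out of the model yourself, something along these lines might help. It is only a sketch: the include_model_definition query parameter, the trained_model_configs / definition / preprocessors layout, and the sample model below are assumptions for illustration, so verify them against the get trained model API docs for your version.

```python
def trained_model_path(model_id):
    """Return a request path that asks for the full model definition."""
    return f"_ml/inference/{model_id}?include_model_definition=true"


def list_preprocessors(model_response):
    """Return (encoding_type, field) pairs found in a model response.

    Assumes each model config has a "definition" with a "preprocessors"
    list, where every entry is a single-key object such as
    {"one_hot_encoding": {"field": ...}}.
    """
    found = []
    for config in model_response.get("trained_model_configs", []):
        definition = config.get("definition", {})
        for preprocessor in definition.get("preprocessors", []):
            for kind, settings in preprocessor.items():
                found.append((kind, settings.get("field")))
    return found


# Made-up response body, only to exercise the helper (not real output).
sample_model = {
    "count": 1,
    "trained_model_configs": [
        {
            "model_id": "house-prices-model",
            "definition": {
                "preprocessors": [
                    {"one_hot_encoding": {
                        "field": "neighbourhood",
                        "hot_map": {"north": "neighbourhood_north"},
                    }},
                    {"frequency_encoding": {
                        "field": "zip",
                        "feature_name": "zip_frequency",
                        "frequency_map": {"1000": 0.2},
                    }},
                ],
                "trained_model": {},
            },
        }
    ],
}

print("GET", trained_model_path("house-prices-model"))
for kind, field in list_preprocessors(sample_model):
    print(f"{field}: {kind}")
```

This only summarizes which fields are encoded and how; the full maps stay in the definition if you need the actual values.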
