Ahhh okay that is quite a different question then.
If you are just running a classification task with the ML component, you do not have enough features to build a very accurate model, so it makes sense that your scores are quite low here. I just tried it in the UI too and my accuracy was 50%, so not much better than random.
What the blog describes is that building a classifier for text normally takes a lot of steps to process the data you have into enough features/insights that the ML model can detect trends and make predictions.
To quote the blog:
Most NLP tasks start with a standard preprocessing pipeline:
- Gathering the data
- Extracting raw text
- Sentence splitting
- Tokenization
- Normalizing (stemming, lemmatization)
- Stopword removal
- Part of Speech tagging
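As a rough illustration, the steps above can be sketched in plain Python. This is a toy version using only the standard library (real projects would use NLTK or spaCy); the tiny stopword set and the crude suffix-stripping "stemmer" are my own placeholders, and Part of Speech tagging is omitted because it needs a real NLP library.

```python
import re

# Tiny illustrative stopword set; real stopword lists are much larger.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "or", "to", "of"}

def preprocess(raw_text: str) -> list[list[str]]:
    # Sentence splitting (naive: split after ., ! or ?)
    sentences = re.split(r"(?<=[.!?])\s+", raw_text.strip())
    processed = []
    for sentence in sentences:
        # Tokenization (naive: runs of letters/apostrophes)
        tokens = re.findall(r"[A-Za-z']+", sentence)
        # Normalizing: lowercase, then a crude stand-in for stemming
        tokens = [t.lower() for t in tokens]
        tokens = [re.sub(r"(ing|ed|s)$", "", t) if len(t) > 4 else t for t in tokens]
        # Stopword removal
        tokens = [t for t in tokens if t not in STOPWORDS]
        processed.append(tokens)
    return processed

print(preprocess("The cats are running. Dogs barked loudly!"))
```

Each of these naive steps is exactly what a proper NLP library (or an Elasticsearch analyzer) does for you, just far better.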
Now, the cool thing the blog offered as another solution was the "more_like_this" query, because a lot of those steps are implemented natively in the query's analyzers and logic. That is why the classification then works "out of the box" on an unprocessed text field.
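To make the more_like_this approach concrete, here is a hedged sketch of k-nearest-neighbours style classification: query an index of labelled documents with the new text, then take a majority vote over the labels of the top hits. The index and field names (`labelled_docs`, `text`, `label`) are placeholders I made up, not names from the blog.

```python
def build_mlt_query(text: str, k: int = 10) -> dict:
    # The query's analyzer handles tokenization/normalization natively,
    # so the raw text goes in as-is.
    return {
        "size": k,
        "query": {
            "more_like_this": {
                "fields": ["text"],
                "like": text,
                "min_term_freq": 1,
                "min_doc_freq": 1,
            }
        },
    }

def majority_label(hits: list[dict]) -> str:
    # hits is the list found under response["hits"]["hits"]
    votes: dict[str, int] = {}
    for hit in hits:
        label = hit["_source"]["label"]
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Against a live cluster this would look roughly like:
#   resp = es.search(index="labelled_docs", body=build_mlt_query("some new text"))
#   predicted = majority_label(resp["hits"]["hits"])
```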
If you want to use the ML job instead (and get higher accuracy), you need to create those features yourself: as the blog mentions, either with NLP libraries or with other kinds of Elasticsearch transforms and pipelines.
The Machine Learning section of the docs also mentions data processing. Unlike with the more_like_this query, this is not done automatically within the job.
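As one example of "creating features yourself", you can derive simple numeric columns from the raw text before feeding documents to the ML job, since classification jobs work much better on structured features than on a bare text field. The specific features below are purely illustrative choices of mine, not something from the docs:

```python
def text_features(text: str) -> dict:
    # Turn a raw text field into a few numeric features suitable
    # as input columns for a classification job.
    tokens = text.split()
    return {
        "char_count": len(text),
        "token_count": len(tokens),
        "avg_token_len": (sum(len(t) for t in tokens) / len(tokens)) if tokens else 0.0,
        "exclamations": text.count("!"),
    }

print(text_features("Free prize!! Click now!"))
```

In practice you would compute features like these in an ingest pipeline or an external script at index time, so the job sees ready-made numeric fields.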
So to summarize:
- you can follow the blog example, in which case your mapping and the other details you provided are fine, and you'd get a pretty accurate model;
- or you can find another way to pre-process your data before you use the ML classifier in Kibana.