Ahhh okay that is quite a different question then.
If you are just running a classification task with the ML component, you do not have enough features to build a very accurate model, so it makes sense that your scores are quite low here. I just tried it in the UI too and my accuracy was 50%, so not much better than random.
What the blog describes is that building a classifier for text normally takes a lot of steps to process the data you have into enough features/insights that the ML model can detect trends and make predictions.
To quote the blog:
Most NLP tasks start with a standard preprocessing pipeline:
- Gathering the data
- Extracting raw text
- Sentence splitting
- Tokenization
- Normalizing (stemming, lemmatization)
- Stopword removal
- Part of Speech tagging
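As a rough illustration, the steps above can be sketched in plain Python. This is a toy version using only the standard library (real projects would use NLTK or spaCy); the tiny stopword set and the crude suffix-stripping "stemmer" are my own placeholders, and Part of Speech tagging is omitted because it needs a real NLP library.

```python
import re

# Tiny illustrative stopword set; real stopword lists are much larger.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "or", "to", "of"}

def preprocess(raw_text: str) -> list[list[str]]:
    # Sentence splitting (naive: split after ., ! or ?)
    sentences = re.split(r"(?<=[.!?])\s+", raw_text.strip())
    processed = []
    for sentence in sentences:
        # Tokenization (naive: runs of letters/apostrophes)
        tokens = re.findall(r"[A-Za-z']+", sentence)
        # Normalizing: lowercase, then a crude stand-in for stemming
        tokens = [t.lower() for t in tokens]
        tokens = [re.sub(r"(ing|ed|s)$", "", t) if len(t) > 4 else t for t in tokens]
        # Stopword removal
        tokens = [t for t in tokens if t not in STOPWORDS]
        processed.append(tokens)
    return processed

print(preprocess("The cats are running. Dogs barked loudly!"))
```

Each of these naive steps is exactly what a proper NLP library (or an Elasticsearch analyzer) does for you, just far better.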
Now, the cool thing the blog offered as another solution was the "more_like_this" query, because a lot of those steps are implemented natively in the query's analyzers and logic. That is why the classification then works "out of the box" on an unprocessed text field.
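To make the more_like_this approach concrete, here is a hedged sketch of k-nearest-neighbours style classification: query an index of labelled documents with the new text, then take a majority vote over the labels of the top hits. The index and field names (`labelled_docs`, `text`, `label`) are placeholders I made up, not names from the blog.

```python
def build_mlt_query(text: str, k: int = 10) -> dict:
    # The query's analyzer handles tokenization/normalization natively,
    # so the raw text goes in as-is.
    return {
        "size": k,
        "query": {
            "more_like_this": {
                "fields": ["text"],
                "like": text,
                "min_term_freq": 1,
                "min_doc_freq": 1,
            }
        },
    }

def majority_label(hits: list[dict]) -> str:
    # hits is the list found under response["hits"]["hits"]
    votes: dict[str, int] = {}
    for hit in hits:
        label = hit["_source"]["label"]
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Against a live cluster this would look roughly like:
#   resp = es.search(index="labelled_docs", body=build_mlt_query("some new text"))
#   predicted = majority_label(resp["hits"]["hits"])
```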
If you want to use the ML job instead (and get higher accuracy), you need to create those features yourself: as the blog mentions, either with NLP libraries or with other kinds of Elasticsearch transforms and pipelines.
The Machine Learning section of the docs also mentions data processing. Unlike with the more_like_this query, this is not done automatically within the job.
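As one example of "creating features yourself", you can derive simple numeric columns from the raw text before feeding documents to the ML job, since classification jobs work much better on structured features than on a bare text field. The specific features below are purely illustrative choices of mine, not something from the docs:

```python
def text_features(text: str) -> dict:
    # Turn a raw text field into a few numeric features suitable
    # as input columns for a classification job.
    tokens = text.split()
    return {
        "char_count": len(text),
        "token_count": len(tokens),
        "avg_token_len": (sum(len(t) for t in tokens) / len(tokens)) if tokens else 0.0,
        "exclamations": text.count("!"),
    }

print(text_features("Free prize!! Click now!"))
```

In practice you would compute features like these in an ingest pipeline or an external script at index time, so the job sees ready-made numeric fields.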
So to summarize:
- you can follow the blog example, in which case your mapping and the other details you provided are fine, and you'd get a pretty accurate model;
- or you can find another way to pre-process your data before you use the ML classifier in Kibana.