I'm trying to find out whether something is possible within Painless. I'd like to use the English analyzer within a Painless script, and then use the same script to iterate over the resulting array of terms. Looking at the Painless documentation, I wasn't able to find any reference to using a text analyzer within Painless. Is this possible at all?
Context:
Elasticsearch version: 7.14.x
The Painless script will be part of a script processor within an ingest pipeline, and I want to loop through the analyzed text to tag specific messages depending on whether they contain specific words.
Ingest processors run before the document gets indexed, which implies they run before the document is analyzed using the English analyzer.
I suppose you need stemming to work. Have you taken a look at the percolate query? You could use it before the document is indexed to do exactly the tagging you are talking about. See "Percolate query" in the Elasticsearch Guide [7.14].
If not, feel free to explain your use-case a little further!
@spinscale thanks for the information. I understand that since the document hasn't been ingested, it wouldn't have been processed by the analyzer of the mapped field, which is why I want to analyze it via painless before being ingested into the mapped field.
You are correct that I'm looking for essentially the stemmer part of the English analyzer. Reading over the percolate docs, I don't fully understand the concept, so I'm not 100% sure if it will work for my use-case.
Here is a more detailed example of my use-case.
I have an app which allows users to input text (paragraph, free-form style text). I then want to ingest this data into Elasticsearch to fulfill two purposes (both via the Kibana UI):
1. Provide a way to search the data as text. This part is simple: just map the field as a text field that uses the English analyzer.
2. Provide a way to visualize/aggregate the text by categories. This is where I'm running into issues. Since the text is free-form, a user can theoretically put anything in. I want to use the English analyzer to at least standardize the text. I then want to loop through the terms that the analyzer produced, and if the text contains one or more words of a specific category, add a tag to the text for that category.
Example:
If the text contains the words pay, money, bonus, or overtime, tag the message with business.
If the text contains the words computer, tablet, or phone, tag the message with IT.
If the text contains pay and computer, tag the message with both business and IT.
Then create a bar chart in Kibana which shows the counts of each category of messages.
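To make the tagging rules above concrete, here is a minimal sketch in plain Python rather than Painless. It assumes the terms have already been produced by some analysis step; the lowercase-and-split call in the usage line is only a stand-in and does no stemming, so it is not the English analyzer. The names CATEGORY_TERMS and tag_terms are made up for illustration.

```python
# Hypothetical category word lists, taken from the example above.
CATEGORY_TERMS = {
    "business": {"pay", "money", "bonus", "overtime"},
    "IT": {"computer", "tablet", "phone"},
}


def tag_terms(terms):
    """Return every category that shares at least one word with `terms`."""
    present = set(terms)
    # A category matches when the intersection with its word list is non-empty.
    return [tag for tag, words in CATEGORY_TERMS.items() if present & words]


# Stand-in for analysis: lowercase and split on whitespace (no stemming).
terms = "I want my overtime pay for fixing the computer".lower().split()
print(tag_terms(terms))  # ['business', 'IT']
```

With a real stemmer in front of this, the word lists would need to be stored in the same stemmed form as the terms, otherwise the set intersection would miss them.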
Your second use case is exactly where percolate can shine. I'd suggest playing around with it a little: take your time to read the docs and build a reproducible example that you can share here. Once that works, you need to integrate the percolate step into your ingestion pipeline, before sending the document to Elasticsearch.
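As a starting point for such a reproducible example, the sketch below builds the JSON bodies involved, using Python dicts only (no HTTP calls are made). It assumes a dedicated index with a percolator field named query, a message field using the english analyzer, and a keyword field named tag; those field names, the helper names, and the word lists are illustrative assumptions, not something confirmed in this thread.

```python
# Hypothetical word lists per category, from the example earlier in the thread.
CATEGORY_WORDS = {
    "business": ["pay", "money", "bonus", "overtime"],
    "IT": ["computer", "tablet", "phone"],
}


def build_mapping():
    """Index mapping: a percolator field plus the field the stored queries target."""
    return {
        "mappings": {
            "properties": {
                "query": {"type": "percolator"},
                "message": {"type": "text", "analyzer": "english"},
                "tag": {"type": "keyword"},
            }
        }
    }


def build_category_query(tag, words):
    """Document to index per category. A match query uses OR by default,
    so a single matching word is enough for the stored query to fire."""
    return {"tag": tag, "query": {"match": {"message": " ".join(words)}}}


def build_percolate_search(text):
    """Search body asking which stored queries match the given text."""
    return {
        "query": {
            "percolate": {"field": "query", "document": {"message": text}}
        }
    }


def tags_from_response(response):
    """Collect the tag of every stored query returned as a hit."""
    return sorted(hit["_source"]["tag"] for hit in response["hits"]["hits"])
```

The idea would be to index one build_category_query document per category, run build_percolate_search for each incoming message before ingesting it, and write the result of tags_from_response into the document so Kibana can aggregate on it. Because the message field uses the english analyzer, both the stored words and the incoming text should be stemmed the same way.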