Determining the count of words in a given field

Honestabe · May 16, 2024, 12:02am

I wrote a Painless script to determine the word count in a particular field:

emit (doc['skus.keyword'].length)

The problem is that it doesn't count duplicates. If there is more than one of the same keyword, it only counts it once. So if the field contains 3 keywords in total but 2 are the same, it says I have 2. Is there a way to get the total count?

demjened · May 17, 2024, 9:53pm

Hi Honestabe,

Assuming skus.keyword is mapped as a keyword field and not text, you can split the text with the splitOnToken() method and count the length of the result array:

emit(doc['skus.keyword'].value.splitOnToken(' ').length)
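For a single value that holds several space-separated words, the idea is just "split on the separator, count the pieces". Here is a minimal sketch of that in plain Java (Painless syntax is Java-like), using String.split with a quoted literal as a stand-in for Painless's splitOnToken(). The class name, method name, and sample values are made up for illustration.

```java
import java.util.regex.Pattern;

class WordCountSketch {
    // Count the pieces of one string value separated by a literal token.
    // Pattern.quote makes the separator a literal rather than a regex.
    static int countTokens(String value, String token) {
        if (value.isEmpty()) {
            return 0; // "".split(...) would otherwise return one empty piece
        }
        return value.split(Pattern.quote(token)).length;
    }

    public static void main(String[] args) {
        System.out.println(countTokens("ABC-1 ABC-1 XYZ-9", " ")); // 3
        System.out.println(countTokens("ABC-1", " "));             // 1
    }
}
```

Note that a value containing no separator at all counts as 1.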

Honestabe · May 17, 2024, 10:48pm

I tried it, but it only shows 1 now.

demjened · May 21, 2024, 2:16pm

Could you share your index mapping and a sample document in which you expect the word count to be > 1?

Honestabe · May 21, 2024, 8:15pm

This is the index mapping for the field

DOC TEMP

and a sample doc with expected word count of skus. I appreciate the help!

demjened · May 21, 2024, 8:31pm

Ah, so there is an array of SKUs in the skus.keyword field, and you want to count the number of items in the array, right? That's a different problem from counting the words in a single text string.

Let me try to find an answer. Are the two SKUs in your example identical?

Honestabe · May 21, 2024, 8:35pm

Yes, they are identical.
I tried to use the text field, but it said:
"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead."

Honestabe · May 21, 2024, 8:37pm

Would there have been a way to use its text field to do it?

demjened · May 21, 2024, 9:15pm

Would there have been a way to use its text field to do it?

No, keyword is the right choice. text is for text analysis.

Your identical SKUs get deduplicated because of the index structures that Elasticsearch builds on top of the documents for optimal searching: doc['skus.keyword'] reads the field's doc values, which keep only the distinct values per document. Instead, you can look into the stored document's _source and do the array length calculation there:

emit(params['_source']['skus'].length)

Depending on the context in which you're running the script you may need to modify it. The above should work if you're using it e.g. in a runtime field context.
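To illustrate the difference between the two counts, here is a rough model in plain Java. It is not Painless and not how Elasticsearch is implemented internally. It only assumes that the per-document doc values behave like a sorted set of distinct values, while _source keeps the array exactly as it was indexed. The class name, method names, and sample SKUs are hypothetical.

```java
import java.util.List;
import java.util.TreeSet;

class SkuCountSketch {
    // Models doc['skus.keyword'].length: duplicates collapse, so only
    // distinct values are counted.
    static int distinctCount(List<String> skus) {
        return new TreeSet<>(skus).size();
    }

    // Models params['_source']['skus'].length: every array element counts.
    static int totalCount(List<String> skus) {
        return skus.size();
    }

    public static void main(String[] args) {
        List<String> skus = List.of("ABC-1", "ABC-1", "XYZ-9");
        System.out.println(distinctCount(skus)); // 2
        System.out.println(totalCount(skus));    // 3
    }
}
```

With three SKUs of which two are identical, the first count gives 2 and the second gives 3, which matches the behaviour described at the top of the thread.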

Honestabe · May 21, 2024, 9:24pm

That worked. Thanks! Is there a place where you guys have some of this Painless syntax documented? I know some of the very basic stuff is there, but I was having trouble finding the more complex parts.

demjened · May 21, 2024, 9:44pm

I'm glad it worked!

The syntax is a bit scattered in our documentation, unfortunately - some parts are in the language specification and in the context sections, and it requires some trial and error to find the right solution.
