Determining the count of words in a given field

I wrote a Painless script to determine the word count in a particular field:

emit(doc['skus.keyword'].length)

The problem is that it doesn't count duplicates: if the same keyword appears more than once, it's only counted once. So if the field contains 3 keywords in total but 2 of them are the same, the script says I have 2. Is there a way to get the total count?

Hi Honestabe,

Assuming skus.keyword is mapped as a keyword field and not text, you can split the text with the splitOnToken() method and count the length of the resulting array:

emit(doc['skus.keyword'].value.splitOnToken(' ').length)
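One caveat with the one-liner above: doc['skus.keyword'].value throws an error for any document that has no value in the field, and for a multi-valued field it only looks at the first value. A minimal guarded sketch, assuming the field holds one space-separated string per document and the script runs as a long runtime field, could look like this:

```painless
// Sketch: count space-separated words in a single keyword value.
// Assumes skus.keyword holds one string per document and that the
// script backs a long runtime field. Documents without a value
// emit 0 instead of failing on .value.
if (doc['skus.keyword'].size() == 0) {
  emit(0);
} else {
  // splitOnToken returns a String[]; its length is the word count
  emit(doc['skus.keyword'].value.splitOnToken(' ').length);
}
```

The size() check mirrors the hint in the error message Elasticsearch raises when .value is read on an empty field.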

I tried it, but now it only shows 1.

Could you share your index mapping and a sample document in which you expect the word count to be > 1?


This is the index mapping for the field:


And here is a sample doc, along with the expected word count for skus. I appreciate the help!

Ah, so there is an array of SKUs in the skus.keyword field, and you want to count the number of items in the array, right? That's a different problem from counting the words in a single text string.

Let me try to find an answer. Are the two SKUs in your example identical?

Yes, they are identical.
I tried to use the text field, but it said:
"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead."

Would there have been a way to use its text field to do it?


No, keyword is the right choice. text fields are meant for full-text analysis and search.

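The reason your identical SKUs get deduplicated is that doc['skus.keyword'] reads from the field's doc values, the index structure Elasticsearch builds alongside the documents for fast sorting and aggregations. For keyword fields, doc values store each document's values as a sorted set, so identical entries collapse into one. Instead, you can look into the stored document's _source, which keeps the original array, and do the array length calculation there: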

emit(params['_source']['skus'].length)

Depending on the context in which you're running the script, you may need to modify it. The above should work if you're using it in, e.g., a runtime field.
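The one-liner above assumes every document has a skus array in _source. A slightly more defensive sketch, assuming a long runtime field and that skus may be missing or stored as a single string in some documents, could be:

```painless
// Sketch: count SKUs from _source, keeping duplicates.
// Assumes a long runtime field; skus may be absent, a single
// string, or an array in the stored document.
def skus = params['_source']['skus'];
if (skus == null) {
  emit(0);           // field missing from this document
} else if (skus instanceof List) {
  emit(skus.size()); // array: every entry counts, duplicates included
} else {
  emit(1);           // single value stored without an array
}
```

Loading and parsing _source for each document is generally slower than reading doc values, which is worth keeping in mind when running this over large indices.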

That worked. Thanks! Is there a place where you have more of this Painless syntax documented? I know some of the very basic stuff is there, but I was having trouble finding the more complex parts.

I'm glad it worked!

The syntax is a bit scattered in our documentation, unfortunately: some parts are in the Painless language specification and others in the sections on the individual script contexts, so it can take some trial and error to find the right solution.
