I wrote a Painless script to determine the word count in a particular field:
emit(doc['skus.keyword'].length)
The problem is that it doesn't count duplicates. If the same keyword appears more than once, it is only counted once. So if the field contains 3 keywords in total but 2 of them are the same, it says I have 2. Is there a way to get the total count?
Assuming skus.keyword is mapped as a keyword field and not text, you can split the text with the splitOnToken() method and count the length of the resulting array.
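As a rough sketch of that idea (the original snippet is not shown here, so this is an illustration only): it assumes a runtime field of type long and that the words in the field are separated by single spaces, which is a guess about the data.

```painless
// Assumption: words in the keyword value are separated by a single space.
// Guard against documents that have no value for the field.
if (doc['skus.keyword'].size() > 0) {
  // splitOnToken() returns a String[]; its length is the word count.
  emit(doc['skus.keyword'].value.splitOnToken(' ').length);
}
```

Note that this counts words inside one string value; it does not count the items of an array of values.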
Ah, so there is an array of SKUs in the skus.keyword field, and you want to count the number of items in the array, right? That's a different problem from counting the words in a single text string.
Let me try to find an answer. Are the two SKUs in your example identical?
Yes, they are identical.
I tried to use the text field, but it said:
"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead."
Would there have been a way to use its text field to do it?
No, keyword is the right choice. text is for text analysis.
Your identical SKUs get deduplicated because doc['skus.keyword'] reads from the index structures that Elasticsearch builds on top of the documents for fast searching, and those keep only the distinct values of a keyword field per document. Instead, you can look into the stored document's _source and calculate the array length there:
emit(params['_source']['skus'].length)
Depending on the context in which you're running the script, you may need to modify it. The above should work if you're using it in, for example, a runtime field context.
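If some documents have no skus value, or hold a single SKU rather than an array, the one-liner above may need guarding. A minimal sketch, assuming a runtime field of type long and that the values sit under skus in _source (not tested against your mapping):

```painless
// Assumption: runtime field of type long; "skus" is the field name in _source.
def skus = params['_source']['skus'];
if (skus instanceof List) {
  // An array in _source keeps duplicates, so size() is the total count.
  emit(skus.size());
} else if (skus != null) {
  // A single value is stored as a scalar in _source, not as a list.
  emit(1);
}
// If the field is missing, nothing is emitted for that document.
```

Reading _source is slower than reading doc values, so this is best suited to cases where the exact count, duplicates included, matters.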
That worked. Thanks! Is there a place where you have some of this Painless syntax documented? I know some of the very basic stuff is there, but I was having trouble finding the more complex parts.
The syntax is a bit scattered in our documentation, unfortunately - some parts are in the language specification and in the context sections, and it requires some trial and error to find the right solution.