Count words/tokens in a field in a document

Jimmy_Oentung · December 13, 2016, 4:01am

Hi there,
Is there a convenient way to get the count of words/tokens in each field of a document?

I know I can use term vectors and sum the term frequencies in each field to get this number, but I was wondering if there is a faster way to do it.

For example:
curl -XPUT 'http://localhost:9200/twitter/tweet/1?pretty=true' -d '{
"fullname" : "John Doe",
"text" : "twitter test test test "
}'

The word counts I need from that document are:
fullname: 2
text: 4
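
For reference, the term vector approach mentioned above could be sketched roughly as follows. This is only an illustration in Python: it assumes the term vectors response has already been fetched and parsed into a dict, and `count_tokens_per_field` and `sample_response` are hypothetical names, with the sample written by hand rather than captured from a real cluster.

```python
from typing import Any, Dict


def count_tokens_per_field(termvectors_response: Dict[str, Any]) -> Dict[str, int]:
    """Sum term_freq over all terms of each field in a term vectors response."""
    counts: Dict[str, int] = {}
    for field, data in termvectors_response.get("term_vectors", {}).items():
        terms = data.get("terms", {})
        # Each term entry carries its frequency within this document's field.
        counts[field] = sum(info.get("term_freq", 0) for info in terms.values())
    return counts


# Hand-written sample shaped like a term vectors response for the document
# above (illustrative only, not real cluster output).
sample_response = {
    "term_vectors": {
        "fullname": {"terms": {"john": {"term_freq": 1}, "doe": {"term_freq": 1}}},
        "text": {"terms": {"twitter": {"term_freq": 1}, "test": {"term_freq": 3}}},
    }
}

print(count_tokens_per_field(sample_response))  # {'fullname': 2, 'text': 4}
```

Since term vectors are built from the analyzed tokens, a sum like this should reflect whatever the field's analyzer kept.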

One more thing: I need the total number of words that are actually stored (i.e. after filtering out the stopwords).

Thank you

cbuescher · December 13, 2016, 10:42am

Hi,

Have you looked at the token_count datatype? It looks like it might do what you are after. To retrieve the values calculated for the count field, you might need to set "store" : true on it. If you follow the example in the reference, you can then retrieve the values with

GET my_index/_search?stored_fields=name.length

It also seems to support analyzers.
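
A rough sketch of what such a field definition might look like, written as a Python dict that can be serialized into the mapping request body. The `name` / `name.length` names follow the reference example; `name_field_properties` is just an illustrative variable name, and where the "properties" object nests inside the full mapping depends on the Elasticsearch version.

```python
import json

# A text field "name" with a sub-field "name.length" that holds the number
# of tokens produced by the configured analyzer.
name_field_properties = {
    "name": {
        "type": "text",
        "fields": {
            "length": {
                "type": "token_count",
                "analyzer": "standard",
                # Stored so that stored_fields=name.length can return it.
                "store": True,
            }
        },
    }
}

# Print the JSON fragment to paste into the mapping body.
print(json.dumps({"properties": name_field_properties}, indent=2))
```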

Jimmy_Oentung · December 14, 2016, 4:09am

Hi Christoph,
Thank you for your response. Yes, I tried token_count and it works, but it counts the original words, not the ones left after filtering out the stopwords.

Any idea how to get the count of words after removing stopwords?

Thank you
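
One direction that may be worth trying, although nothing in this thread confirms it: point the token_count sub-field at an analyzer that removes stopwords and set `enable_position_increments` to false, so that positions left behind by removed tokens are not counted. The sketch below assumes an Elasticsearch version whose token_count field supports that parameter; the analyzer name `standard_with_stopwords` and the variable names are hypothetical.

```python
import json

# Index settings: a standard analyzer configured with the English stopword list.
index_settings = {
    "analysis": {
        "analyzer": {
            "standard_with_stopwords": {  # hypothetical analyzer name
                "type": "standard",
                "stopwords": "_english_",
            }
        }
    }
}

# Field definition: "text.length" counts only the tokens the analyzer emits.
text_field_properties = {
    "text": {
        "type": "text",
        "analyzer": "standard_with_stopwords",
        "fields": {
            "length": {
                "type": "token_count",
                "analyzer": "standard_with_stopwords",
                # Assumption: supported by the Elasticsearch version in use.
                "enable_position_increments": False,
                "store": True,
            }
        },
    }
}

print(json.dumps({"settings": index_settings, "properties": text_field_properties}, indent=2))
```

Where "properties" sits inside the mapping again depends on the version, so the printed JSON is a fragment to adapt rather than a complete request body.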
