Word count/frequency per field

(M. Alsioufi) #1

Hi there,
Is there a convenient way to get the count of words/tokens in some fields of a document?

For example:
curl -XPUT 'http://localhost:9200/twitter/tweet/1?pretty=true' -d '{
  "text1" : "twitter, test, test, test ",
  "text2" : "test, test, man, two "
}'

The word counts I need from that document would then look like:
text2: {
or something similar

I know I can use termvector, but I could not really understand how it can help me.

Thank you



The _termvector API is the best way to access information about term statistics in Elasticsearch after your data has been indexed.
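To connect this to the original question: a term vectors response lists every term of a field together with its `term_freq`, which is the per-document word count being asked for. Below is a minimal Python sketch, assuming the response has already been fetched and parsed into a dict. The helper name `term_frequencies` and the trimmed sample response are illustrative and not part of Elasticsearch.

```python
def term_frequencies(response, field):
    """Return {term: term_freq} for one field of a parsed term vectors response."""
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
    return {term: info["term_freq"] for term, info in terms.items()}


# Hypothetical, trimmed response for a field in which "test" occurs three times.
sample_response = {
    "found": True,
    "term_vectors": {
        "text1": {"terms": {"test": {"term_freq": 3}}}
    },
}

counts = term_frequencies(sample_response, "text1")
print(counts)  # {'test': 3}
```

A field that is missing from the response simply yields an empty dict, so the helper can be called for every field of interest without extra checks.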

If you want to get the length of a field in tokens, you can use the token_count field type in your mapping.
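As a rough illustration of the token_count suggestion, the mapping fragment could look like the sketch below, built here as a Python dict and printed as JSON. The sub-field name `length` and the choice of the `standard` analyzer are assumptions, and the `text` type assumes Elasticsearch 5.x or later (older versions use `string`). Where the fragment sits in the full mapping also depends on the version in use.

```python
import json

# Assumed field names: "text1" is the analyzed text field from the example,
# "length" is a hypothetical sub-field that stores its token count.
mapping_properties = {
    "properties": {
        "text1": {
            "type": "text",
            "fields": {
                "length": {"type": "token_count", "analyzer": "standard"}
            },
        }
    }
}

print(json.dumps(mapping_properties, indent=2))
```

With such a mapping the count would be addressable as `text1.length`. Note that it holds the total number of tokens in the field, not a per-term frequency.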

What problem are you trying to solve?

(M. Alsioufi) #3

Thanks for your answer.

I have tried the termvector API and it does almost what I expect. However, I have to run it on one specific document; I could not run it on my entire index. Following the same example I showed, my request looks like:
GET my_index/doc/someid123/_termvectors
{
  "fields": ["text1"]
}

and then the reply I get:
{
  "_index": "my_index",
  "_type": "doc",
  "_id": "someid123",
  "_version": 2,
  "found": true,
  "took": 1,
  "term_vectors": {
    "text1": {
      "field_statistics": {
        "sum_doc_freq": 56,
        "doc_count": 54,
        "sum_ttf": 60
      },
      "terms": {
        "test": {
          "term_freq": 3,
          "tokens": [
            { "position": 0, "start_offset": 9, "end_offset": 12 },
            { "position": 1, "start_offset": 15, "end_offset": 19 },
            { "position": 2, "start_offset": 21, "end_offset": 25 }
          ]
        },
        "...": {
          "term_freq": 1,
          "tokens": [
            { "position": 0, "start_offset": 0, "end_offset": 6 }
          ]
        }
      }
    }
  }
}

What I want is to get this functionality across my entire index, not just for a single document, and to be able to show the data in a Kibana visualization.
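One possible approach, sketched here under assumptions rather than taken from the thread, is to request term vectors for many documents at once through the multi term vectors endpoint (`_mtermvectors`) and sum the `term_freq` values on the client. The helper names, the document IDs, and the sample response below are hypothetical. The HTTP call itself is left out, so the sketch only builds the request body and merges an already parsed response.

```python
from collections import Counter


def build_mtermvectors_body(doc_ids, fields):
    """Build a request body for POST <index>/_mtermvectors."""
    return {"ids": list(doc_ids), "parameters": {"fields": list(fields)}}


def merge_term_counts(mtermvectors_response, field):
    """Sum term_freq per term over all documents in a parsed response."""
    totals = Counter()
    for doc in mtermvectors_response.get("docs", []):
        terms = doc.get("term_vectors", {}).get(field, {}).get("terms", {})
        for term, info in terms.items():
            totals[term] += info["term_freq"]
    return totals


body = build_mtermvectors_body(["someid123", "someid456"], ["text1"])

# Hypothetical, trimmed response covering two documents.
sample_response = {
    "docs": [
        {"_id": "someid123", "term_vectors": {"text1": {"terms": {
            "test": {"term_freq": 3}, "twitter": {"term_freq": 1}}}}},
        {"_id": "someid456", "term_vectors": {"text1": {"terms": {
            "test": {"term_freq": 2}}}}},
    ]
}

totals = merge_term_counts(sample_response, "text1")
print(totals.most_common())  # [('test', 5), ('twitter', 1)]
```

Kibana visualizations are built on aggregations over indexed fields, so counts merged this way would probably have to be written back into an index of their own before they could be charted. That step is an assumption and is not covered in this thread.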
