Word count/frequency per field


(M. Alsioufi) #1

Hi there,
Is there a convenient way to get the count of words/tokens in certain fields of a document?

For example:
curl -XPUT 'http://localhost:9200/twitter/tweet/1?pretty=true' -d '{
  "text1" : "twitter, test, test, test ",
  "text2" : "test, test, man, two "
}'

The word counts I need from that document would be something like:
"text1": {
  "twitter": 1,
  "test": 3
},
"text2": {
  "test": 2,
  "man": 1,
  "two": 1
}
or something similar

I know I can use the _termvector API, but I could not really understand how it can help me here.

Thank you


#2

Hey,

The _termvector API is the best way to access term statistics in Elasticsearch after your data has been indexed.

If you want to get the length of a field in tokens, you can use the token_count type in your mapping.
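As a sketch of that second option, reusing the twitter/tweet names from your example (the "length" sub-field name is just an illustration), a token_count multi-field could be mapped like this:

```shell
# Hypothetical mapping: "text1.length" stores the number of tokens the
# standard analyzer produces for "text1" at index time.
curl -XPUT 'http://localhost:9200/twitter' -d '{
  "mappings": {
    "tweet": {
      "properties": {
        "text1": {
          "type": "string",
          "fields": {
            "length": {
              "type": "token_count",
              "analyzer": "standard"
            }
          }
        }
      }
    }
  }
}'
```

After indexing, "text1.length" can be queried and aggregated like any numeric field, but note it gives you the total token count per field, not per-term frequencies.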

What problem are you trying to solve?


(M. Alsioufi) #3

Thanks for your answer.

I have tried the _termvectors API and it does almost what I expect. However, I have to run it on one specific document; I could not run it on my entire index. Following the same example I showed above, my request looks like:
GET my_index/doc/someid123/_termvectors
{
  "fields": ["text1"]
}
and the reply I get is:
{
  "_index": "my_index",
  "_type": "doc",
  "_id": "someid123",
  "_version": 2,
  "found": true,
  "took": 1,
  "term_vectors": {
    "text1": {
      "field_statistics": {
        "sum_doc_freq": 56,
        "doc_count": 54,
        "sum_ttf": 60
      },
      "terms": {
        "test": {
          "term_freq": 3,
          "tokens": [
            { "position": 0, "start_offset": 9,  "end_offset": 12 },
            { "position": 1, "start_offset": 15, "end_offset": 19 },
            { "position": 2, "start_offset": 21, "end_offset": 25 }
          ]
        },
        "twitter": {
          "term_freq": 1,
          "tokens": [
            { "position": 0, "start_offset": 0, "end_offset": 6 }
          ]
        }
      }
    }
  }
}

What I want is to get this functionality across my entire index, not just a single document, and to be able to show this data in a Kibana visualization.
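For anyone landing on this thread with the same need: index-wide per-term counts are normally computed with a terms aggregation rather than _termvectors, and a terms aggregation is also what Kibana's data table and bar chart visualizations are built on. A sketch, reusing the my_index and text1 names from above and assuming the analyzed field is aggregatable in your Elasticsearch version:

```shell
# Hypothetical query: top terms in "text1" across the whole index.
# Caveat: a terms aggregation reports doc_count (how many documents
# contain each term), not the summed term_freq that _termvectors shows.
curl -XGET 'http://localhost:9200/my_index/_search?pretty=true' -d '{
  "size": 0,
  "aggs": {
    "word_counts": {
      "terms": {
        "field": "text1",
        "size": 10
      }
    }
  }
}'
```

In Kibana this corresponds to a visualization with a "Terms" bucket on the text1 field.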