Hello, I'm writing this question in the hope of getting some clarity on term vector analysis. I am looking to get a count of the unique tokens in a text/keyword field. For example, take the term vectors query below:
GET /twitter/tweet/1/_termvectors
{
  "fields" : ["text"],
  "offsets" : true,
  "payloads" : true,
  "positions" : true,
  "term_statistics" : true,
  "field_statistics" : true
}
The response I receive is a count of the tokens within a single document. What I would like instead is a count of the distinct tokens across an entire field in the index, not within one document. Instead of this response:
{
  "_id": "1",
  "_index": "twitter",
  "_type": "tweet",
  "_version": 1,
  "found": true,
  "took": 6,
  "term_vectors": {
    "text": {
      "field_statistics": {
        "doc_count": 2,
        "sum_doc_freq": 6,
        "sum_ttf": 8
      },
      "terms": {
        "test": {
          "doc_freq": 2,
          "term_freq": 3,
          "tokens": [
            {
              "end_offset": 12,
              "payload": "d29yZA==",
              "position": 1,
              "start_offset": 8
            },
            {
              "end_offset": 17,
              "payload": "d29yZA==",
              "position": 2,
              "start_offset": 13
            },
            {
              "end_offset": 22,
              "payload": "d29yZA==",
              "position": 3,
              "start_offset": 18
            }
          ],
          "ttf": 4
        },
        "twitter": {
          "doc_freq": 2,
          "term_freq": 1,
          "tokens": [
            {
              "end_offset": 7,
              "payload": "d29yZA==",
              "position": 0,
              "start_offset": 0
            }
          ],
          "ttf": 2
        }
      }
    }
  }
}
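To make the limitation concrete, here is a minimal Python sketch of what can be read out of that response. The `response` dict is a hand-trimmed copy of the output above (the per-token details are dropped), and `distinct_terms` is a hypothetical helper name, not part of any Elasticsearch client. Counting the keys under `terms` gives the distinct tokens of document 1 only. The `field_statistics` block does span more than one document, but it reports sums (`sum_doc_freq`, `sum_ttf`) rather than a list or count of distinct terms.

```python
# Hand-trimmed copy of the term vectors response above (token details omitted).
response = {
    "_id": "1",
    "term_vectors": {
        "text": {
            "field_statistics": {"doc_count": 2, "sum_doc_freq": 6, "sum_ttf": 8},
            "terms": {
                "test": {"doc_freq": 2, "term_freq": 3, "ttf": 4},
                "twitter": {"doc_freq": 2, "term_freq": 1, "ttf": 2},
            },
        }
    },
}


def distinct_terms(resp, field):
    """Return the sorted distinct terms reported for one field of one document."""
    return sorted(resp["term_vectors"][field]["terms"])


terms = distinct_terms(response, "text")
print(len(terms), terms)  # 2 ['test', 'twitter']
```

Repeating this per document and merging the results would work in principle, but it means one request per document, which is what I am hoping to avoid.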
I would like just the term counts over an entire index. While trying to figure this out, I loaded a field as both text and keyword (it's a list of addresses). What I'm looking for is a count of all unique terms within this Address field. My hope is that the response from ES would be something like this:
{
  "index" : "addresses",
  "type" : "by_your_house",
  "terms" : {
    "ROAD" : {},
    "STREET" : {},
    "LANE" : {}
  }
}
I have tried using Kibana for this task, but it does not split the addresses into terms correctly. Instead, it shows aggregations of the entire street names that are common. So instead of the above, what I see in Kibana is:
{
  "1007 Mountain Drive" : 99,
  "20 Ingram Street" : 55,
  "1938 Sullivan Lane" : 11
}
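The difference between the two outputs can be sketched in plain Python, without Elasticsearch. This is only an illustration of the counting I am after, under some loud assumptions: whitespace splitting and upper-casing stand in for whatever analyzer the text field really uses, `token_counts` is a hypothetical helper name, and the input is just the three Kibana buckets shown above.

```python
from collections import Counter

# Whole-value counts, copied from the Kibana output above.
address_counts = {
    "1007 Mountain Drive": 99,
    "20 Ingram Street": 55,
    "1938 Sullivan Lane": 11,
}


def token_counts(value_counts):
    """Split each value on whitespace and credit its count to every token.

    Assumption: whitespace splitting plus upper-casing is a stand-in for
    the real analyzer, which may tokenize differently.
    """
    counts = Counter()
    for value, n in value_counts.items():
        for token in value.upper().split():
            counts[token] += n
    return counts


tokens = token_counts(address_counts)
print(tokens["DRIVE"], tokens["STREET"], tokens["LANE"])  # 99 55 11
print(len(tokens))  # 9 distinct tokens across the three addresses
```

The first dict is what the keyword version of the field gives me (one bucket per full address). The `Counter` is the per-token view I would like to get back from the text version of the field.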
Thanks for any/all help with this.
- Matt