I am trying to get the total term frequency and document count for a given set of documents, but _termvectors in Elasticsearch returns ttf and doc_count computed over all documents in the index. Is there any way to specify a list of documents (document ids) so that the result is based on those documents only?
Below are the document details and the query I use to get the total term frequency:
Index details:
PUT /twitter
{
  "mappings": {
    "tweets": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "english"
        }
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  }
}
Document Details:
PUT /twitter/tweets/1
{
  "name": "Hello bar"
}
PUT /twitter/tweets/2
{
  "name": "Hello foo"
}
PUT /twitter/tweets/3
{
  "name": "Hello foo bar"
}
This creates three documents with ids 1, 2 and 3. Now suppose the tweets with ids 1 and 2 belong to user1 and the tweet with id 3 belongs to another user, and I want to get the term vectors for user1 only.
Query to get this result:
GET /twitter/tweets/_mtermvectors
{
  "ids": ["1", "2"],
  "parameters": {
    "fields": ["name"],
    "term_statistics": true,
    "offsets": false,
    "payloads": false,
    "positions": false
  }
}
Response:
{
  "docs": [
    {
      "_index": "twitter",
      "_type": "tweets",
      "_id": "1",
      "_version": 1,
      "found": true,
      "took": 1,
      "term_vectors": {
        "name": {
          "field_statistics": {
            "sum_doc_freq": 7,
            "doc_count": 3,
            "sum_ttf": 7
          },
          "terms": {
            "bar": {
              "doc_freq": 2,
              "ttf": 2,
              "term_freq": 1
            },
            "hello": {
              "doc_freq": 3,
              "ttf": 3,
              "term_freq": 1
            }
          }
        }
      }
    },
    {
      "_index": "twitter",
      "_type": "tweets",
      "_id": "2",
      "_version": 1,
      "found": true,
      "took": 1,
      "term_vectors": {
        "name": {
          "field_statistics": {
            "sum_doc_freq": 7,
            "doc_count": 3,
            "sum_ttf": 7
          },
          "terms": {
            "foo": {
              "doc_freq": 2,
              "ttf": 2,
              "term_freq": 1
            },
            "hello": {
              "doc_freq": 3,
              "ttf": 3,
              "term_freq": 1
            }
          }
        }
      }
    }
  ]
}
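Here we can see that hello has doc_freq 3 and ttf 3 (and the field-level doc_count is 3), so the statistics are computed over all three documents in the index; for user1 alone I would expect 2 for each. How can I make it consider only the documents with the given ids?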
One approach I am considering is to create a separate index for each user, but I am not sure whether this approach is correct, since the number of indices would grow with the number of users. Is there another solution?
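To make the desired result concrete, here is a minimal sketch of the numbers I am after. It is written in Python, which is not part of my setup, and the function name subset_term_stats is made up for illustration. It assumes the _mtermvectors response has already been parsed into a dict, and it recomputes doc_freq and ttf from the per-document term_freq values, counting only the documents that were returned. This is a client-side workaround, not something Elasticsearch does for me.

```python
from collections import defaultdict


def subset_term_stats(response, field):
    """Aggregate per-document term_freq values over only the returned docs.

    Hypothetical helper: `response` is a parsed _mtermvectors response,
    `field` is the name of the field whose terms should be aggregated.
    """
    stats = defaultdict(lambda: {"doc_freq": 0, "ttf": 0})
    for doc in response.get("docs", []):
        if not doc.get("found"):
            continue  # skip ids that were not found
        terms = doc.get("term_vectors", {}).get(field, {}).get("terms", {})
        for term, info in terms.items():
            stats[term]["doc_freq"] += 1             # this doc contains the term
            stats[term]["ttf"] += info["term_freq"]  # occurrences in this doc
    return dict(stats)


# Trimmed copy of the response above (only the keys the function reads).
response = {
    "docs": [
        {"_id": "1", "found": True, "term_vectors": {"name": {"terms": {
            "bar": {"term_freq": 1}, "hello": {"term_freq": 1}}}}},
        {"_id": "2", "found": True, "term_vectors": {"name": {"terms": {
            "foo": {"term_freq": 1}, "hello": {"term_freq": 1}}}}},
    ]
}

stats = subset_term_stats(response, "name")
print(stats["hello"])  # {'doc_freq': 2, 'ttf': 2}
```

With the two documents of user1 this gives doc_freq 2 and ttf 2 for hello, which is the result I would like to get from Elasticsearch directly. The drawback is that every document of the user has to be fetched, so I am not sure it would scale to users with many tweets.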