Term vectors filter by search word


(bjo) #1

I have implemented the examples in Term Vectors and Multi termvectors API, but all of these examples seem to be designed to retrieve one or more documents by ID. Is there some way to filter the term results based on a search text (a word or part of a word)?
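The term vectors API has no option to filter terms by a search string, so one workaround is to filter its response on the client. Below is a minimal Python sketch. It assumes the response has already been parsed from JSON and uses the `term_vectors` → field → `terms` → `term_freq` layout. The function name `terms_with_prefix` and the sample data are hypothetical.

```python
def terms_with_prefix(tv_response, field, prefix):
    """Return {term: term_freq} for terms in `field` that start with `prefix`.

    Assumes `tv_response` is a parsed term vectors response for one document.
    """
    terms = tv_response.get("term_vectors", {}).get(field, {}).get("terms", {})
    return {
        term: info["term_freq"]
        for term, info in terms.items()
        if term.startswith(prefix)
    }


# Hypothetical, trimmed-down response for a single document.
sample = {
    "_id": "1",
    "term_vectors": {
        "main_text": {
            "terms": {
                "car": {"term_freq": 3},
                "cars": {"term_freq": 1},
                "bus": {"term_freq": 2},
            }
        }
    },
}

print(terms_with_prefix(sample, "main_text", "car"))
# {'car': 3, 'cars': 1}
```

The terms in a term vector are the analyzed tokens (for example, lowercased by the standard analyzer), so the prefix should be normalized the same way before comparing.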

By the way, when I run this prefix search:

```
GET /my_index/my-index/_search
{
    "query": {
        "prefix": {
            "main_text": "word_part"
        }
    }
}
```

I get the results I want, but without the frequency count I need.
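One possible two-step workaround is to run the prefix query, collect the hit IDs, and then request term vectors for those IDs with the Multi termvectors API. That API accepts a body of the form `{"ids": [...], "parameters": {...}}`. The minimal Python sketch below only builds that request body from a parsed search response. The helper name and sample data are hypothetical, and sending the body to the index's `_mtermvectors` endpoint is left to whatever client you use.

```python
import json


def mtermvectors_body(search_response, field):
    """Build a Multi termvectors request body for every hit in a search response."""
    ids = [hit["_id"] for hit in search_response["hits"]["hits"]]
    return {
        "ids": ids,
        "parameters": {
            "fields": [field],
            # Only term frequencies are needed, so skip positions and offsets.
            "positions": False,
            "offsets": False,
        },
    }


# Hypothetical, trimmed-down response from the prefix query above.
search_response = {"hits": {"hits": [{"_id": "1"}, {"_id": "7"}]}}

print(json.dumps(mtermvectors_body(search_response, "main_text")))
```

Each entry in the response's `docs` array then carries per-document `term_freq` values, which still need to be filtered by the prefix on the client. A search only returns one page of hits (10 by default), so collecting every matching ID requires a larger `size` or paging through the results.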


(Mayya Sharipova) #2

> Is there some way of filtering the term results based on a search text (word or part of a word)?

I am not completely clear on what you want here. Do you want to filter which terms are returned for a certain document and field based on some query? Or do you want to get document frequencies for certain terms without knowing which documents contain them?

Anyway, the term vectors API works exactly as described in the links you provided and doesn't have any other options.


(bjo) #3

Glad you asked; maybe my question was not very clear. I am trying to get all documents where a term or part of a term occurs, along with the frequency of that term in each document. For instance, search for "car" and get all documents where "car" occurs, together with its frequency in each document.

The prefix query I provided almost accomplishes this. The only problem is that the frequency is missing, and counting the frequency for each document after I get the results is too expensive.

I already have a working solution using a database where I store every term and its frequency for each document, so I can quickly check whether a term exists in a document and retrieve its frequency.
However, my solution does not support searches for parts of words (prefixes). Since Elasticsearch can retrieve all documents containing a term, I thought there must be some way to get the frequency as well.
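A database-style approach like this can support prefix lookups if the term dictionary is kept sorted. In a sorted list, all terms that share a prefix form one contiguous range, and binary search can find its start. The minimal in-memory Python sketch below assumes an index shaped as term → {document ID: frequency}. The class name `TermIndex` and the sample data are hypothetical and stand in for the database.

```python
from bisect import bisect_left


class TermIndex:
    """Hypothetical in-memory stand-in for a term -> {doc_id: freq} table."""

    def __init__(self, postings):
        self.postings = postings
        # Sorted term list: terms with a common prefix are contiguous.
        self.sorted_terms = sorted(postings)

    def prefix_frequencies(self, prefix):
        """Return {doc_id: {term: freq}} for every term starting with `prefix`."""
        result = {}
        # Binary search for the first term >= prefix, then scan the run.
        i = bisect_left(self.sorted_terms, prefix)
        while i < len(self.sorted_terms) and self.sorted_terms[i].startswith(prefix):
            term = self.sorted_terms[i]
            for doc_id, freq in self.postings[term].items():
                result.setdefault(doc_id, {})[term] = freq
            i += 1
        return result


index = TermIndex({
    "bus": {"doc2": 1},
    "car": {"doc1": 3, "doc2": 1},
    "cars": {"doc1": 1},
})
print(index.prefix_frequencies("car"))
# {'doc1': {'car': 3, 'cars': 1}, 'doc2': {'car': 1}}
```

In a relational database, a range condition on an indexed term column (for example `term >= 'car' AND term < 'cas'`) follows the same idea.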


(Mayya Sharipova) #4

I understand your problem now; thanks for the further explanation. Unfortunately, we don't expose term frequency through queries. You can either use a highlighter, which highlights every matching term so you can count how many terms are highlighted in each document, or develop your own plugin to do this.
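The highlighter workaround could look roughly like this. First, add a `highlight` section to the prefix query with `number_of_fragments` set to 0, so the whole field is returned instead of snippets. Then count the highlight tags in each hit. The minimal Python sketch below assumes the default `<em>`/`</em>` tags and assumes that each matching occurrence is wrapped separately. The helper name and the sample response are hypothetical.

```python
import re

# Prefix query from the thread plus a highlight section; number_of_fragments 0
# asks for the whole field content instead of snippets.
query_body = {
    "query": {"prefix": {"main_text": "car"}},
    "highlight": {"fields": {"main_text": {"number_of_fragments": 0}}},
}

# Captures the text inside each default <em>...</em> highlight tag.
EM_TAG = re.compile(r"<em>(.*?)</em>")


def highlighted_counts(search_response, field):
    """Return {doc_id: {term: count}} from the highlighted field of each hit."""
    counts = {}
    for hit in search_response["hits"]["hits"]:
        per_term = {}
        for fragment in hit.get("highlight", {}).get(field, []):
            for term in EM_TAG.findall(fragment):
                per_term[term] = per_term.get(term, 0) + 1
        counts[hit["_id"]] = per_term
    return counts


# Hypothetical, trimmed-down response.
response = {
    "hits": {
        "hits": [
            {"_id": "1", "highlight": {"main_text": [
                "My <em>car</em> and your <em>cars</em>, one <em>car</em>."]}},
            {"_id": "2", "highlight": {"main_text": ["A <em>Car</em> park."]}},
        ]
    }
}

print(highlighted_counts(response, "main_text"))
# {'1': {'car': 2, 'cars': 1}, '2': {'Car': 1}}
```

The highlighted text is the original source text rather than the analyzed term, so variants such as `Car` and `car` are counted separately unless you normalize them.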