Term vectors filter by search word

I have implemented the examples in the Term Vectors and Multi termvectors API documentation, but all of these examples seem designed to fetch one or more documents by ID. Is there some way of filtering the term results based on a search text (a word or part of a word)?

Btw when doing a prefix search

GET /my_index/my-index/_search
{
    "query": {
        "prefix": {
            "main_text": "word_part"
        }
    }
}

I get the results I want, but without the frequency count I need.

Is there some way of filtering the term results based on a search text (word or part of a word)?

I am not completely clear on what you want here. Do you want to filter which terms are returned for a certain document and field based on some query? Or do you want to get document frequencies for certain terms without knowing which documents contain them?

Anyway, the term vectors API works exactly as described in the links you provided, and doesn't have any other options.

Glad you asked, maybe my question was not very clear. I am trying to get all documents where a term or part of a term occurs and get the frequency of that term for each document. For instance search for "car" and get all documents where "car" occurs and the frequency for each document.

This is almost accomplished by the prefix query I provided, the only problem is the frequency is missing and it's too heavy to count the frequency for each document after I get the results.

I already have a working solution using a database where I store all the terms with frequencies, for each document, so I can check if a term exists in a document and retrieve the frequency fast.
My solution does not work with searches for parts of words (prefix) and since Elasticsearch can retrieve all documents including a term I thought there must be a way to get the frequency as well somehow.

I understand your problem, thanks for the further explanation. Unfortunately, we don't expose term frequency through queries. You can either leverage a highlighter as it highlights all terms and you know how many terms are highlighted in a document, or you have to develop your own plugin that can help with it.

@mayya How can I achieve counting using a highlighter? I have a similar requirement.

In a given document I want to get counts for a list of phrases like ["hi", "howdy", "how are you", "hello sir", "hello"]. I am able to aggregate and highlight them, but I want to give the counts as well (the aggregation gives a doc count, not an occurrence count).

The explain API can give you term frequencies for a given doc.
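As a rough sketch of that idea, the Python below walks the nested "explanation" tree from an explain response and collects the values of nodes that look like term frequencies. The sample response is hand-written for illustration, not real cluster output, and the description prefixes it matches ("freq," and "termFreq=") are assumptions about the wording, which differs between Elasticsearch versions. A constant-score query such as a prefix query may not report any frequency node, so this is more likely to help with term or match queries.

```python
def collect_term_freqs(explanation):
    """Return the values of explanation nodes that look like term frequencies."""
    freqs = []
    description = explanation.get("description", "")
    # Assumed wording; check the descriptions your own cluster returns.
    if description.startswith("freq,") or description.startswith("termFreq="):
        freqs.append(explanation.get("value"))
    # Recurse into the nested details, if any.
    for child in explanation.get("details", []):
        freqs.extend(collect_term_freqs(child))
    return freqs


# Hand-written sample shaped like an explain response (illustrative only).
sample_response = {
    "matched": True,
    "explanation": {
        "value": 1.2,
        "description": "weight(main_text:car in 0) [PerFieldSimilarity], result of:",
        "details": [
            {
                "value": 1.2,
                "description": "score(freq=2.0), computed as boost * idf * tf from:",
                "details": [
                    {
                        "value": 0.45,
                        "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                        "details": [
                            {
                                "value": 2.0,
                                "description": "freq, occurrences of term within document",
                                "details": [],
                            }
                        ],
                    }
                ],
            }
        ],
    },
}

term_freqs = collect_term_freqs(sample_response["explanation"])
print(term_freqs)
```

Note that the explain API works on one document at a time, so this means one request per matching document.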

@maheshgawali Sorry, by counting frequencies using a highlighter, I meant you do it at the level of your application, just counting how many terms were highlighted.
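A minimal sketch of that application-side counting in Python, assuming the search request asks for highlighting with the default `<em>` tag and with `number_of_fragments` set to 0 so that the whole field is returned as a single highlighted string. The request body and the sample hit are hypothetical and hand-written; only the counting function would be reused as-is.

```python
def count_highlights(hit, field, pre_tag="<em>"):
    """Count highlighted terms for one field in a single search hit."""
    fragments = hit.get("highlight", {}).get(field, [])
    # Every highlighted term starts with the pre-tag, so count its occurrences.
    return sum(fragment.count(pre_tag) for fragment in fragments)


# Hypothetical request body: the prefix query from the question plus highlighting.
# number_of_fragments = 0 asks for the whole field rather than short snippets.
search_body = {
    "query": {"prefix": {"main_text": "car"}},
    "highlight": {"fields": {"main_text": {"number_of_fragments": 0}}},
}

# Hand-written sample hit (illustrative only, not real cluster output).
sample_hit = {
    "_id": "1",
    "highlight": {
        "main_text": [
            "The <em>car</em> hit a <em>cart</em> near the <em>carpet</em> shop."
        ]
    },
}

print(count_highlights(sample_hit, "main_text"))  # prints 3
```

For multi-word phrases a highlighter may tag each word separately, so a raw tag count could overstate the number of phrase occurrences; in that case the counting logic would need to group adjacent tags.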

And as @Mark_Harwood pointed out, the explain API also exposes frequencies.
