Custom Term Vectors

Hello,

I'm trying to assign custom term frequencies to the terms in my documents. These should be independent of the actual term frequencies in the document and should include expansion terms that do not appear in the original text. Position and offset data are not needed. What would be a good way to do this in Elasticsearch?

For example, if I have the document "quick brown fox", instead of the frequencies

{"quick":1, "brown":1, "fox":1}

I want to have frequencies such as

{"quick":1, "brown":2, "fox":3, "mammal":2, "red":1}

One way I could think of is to add a second field that uses the whitespace analyzer and contains text in which each term is repeated exactly as many times as the frequency I want:

{
    "_source": {
        "text": "quick brown fox",
        "custom_tf": "quick brown brown fox fox fox mammal mammal red"
    }
}

But this seems a bit messy. Any ideas for a better solution?
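
For reference, here is a minimal Python sketch of how the repeated-term field could be generated from a frequency map. The helper name expand_term_frequencies is hypothetical, and the mapping dict is only my assumption of what the field definition might look like: it uses the built-in whitespace analyzer and sets index_options to "freqs", since positions and offsets are not needed. It assumes terms contain no whitespace, because the whitespace analyzer would otherwise split them.

```python
def expand_term_frequencies(freqs):
    """Repeat each term as many times as its desired frequency.

    Hypothetical helper: `freqs` maps term -> non-negative integer count.
    Terms must not contain whitespace, since the target field is assumed
    to be tokenized by a whitespace analyzer.
    """
    tokens = []
    for term, count in freqs.items():
        if count < 0:
            raise ValueError(f"negative frequency for term {term!r}")
        if not term or any(ch.isspace() for ch in term):
            raise ValueError(f"term must be non-empty without whitespace: {term!r}")
        tokens.extend([term] * count)
    return " ".join(tokens)


# Desired frequencies from the example above, including expansion terms.
custom_freqs = {"quick": 1, "brown": 2, "fox": 3, "mammal": 2, "red": 1}

# Document body as it would be sent for indexing.
doc = {
    "text": "quick brown fox",
    "custom_tf": expand_term_frequencies(custom_freqs),
}

# Assumed mapping sketch: whitespace analysis, frequencies only
# (no positions or offsets stored in the index).
mapping = {
    "properties": {
        "text": {"type": "text"},
        "custom_tf": {
            "type": "text",
            "analyzer": "whitespace",
            "index_options": "freqs",
        },
    }
}

print(doc["custom_tf"])
```

This only automates the workaround above; the field still grows linearly with the sum of the frequencies, which is part of why it feels messy to me.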
