Indexing weighted terms (or fake term frequency)

Hannes_Korte · September 9, 2020, 3:21pm

Hi,

I am looking for a way to fake the frequency of terms in a field at index time. It's kind of a boost for a specific term in a specific document.

Imagine your docs are people, each with a map of weighted skills, e.g.:

{
    "id": "A1",
    "name": "John",
    "skills": {
        "programming/kotlin": 0.8,
        "programming/java": 0.5,
        "sports/handball": 0.2,
        "sports/climbing": 0.1
    }
}

The list of skills is quite long, so it is impossible to store each one in a separate field. My naive solution is to discretize the skill terms, like:

{
    "id": "A1",
    "name": "John",
    "skillTerms": [
        "programming/kotlin", "programming/kotlin", ... (8x)
        "programming/java", "programming/java", ... (5x)
        "sports/handball", "sports/handball",
        "sports/climbing"
    ]
}

This way I can simply do a term query for "programming/kotlin" to get this person ranked higher than another one with a lower weight for this skill (at least approximately). The downside is the unnecessarily large size of the documents due to the obvious redundancy.
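
To make the discretization concrete, here is a minimal sketch in Python. The helper name `discretize_skills` and the `scale` of 10 repetitions for a weight of 1.0 are assumptions chosen to match the example above, not part of any library:

```python
def discretize_skills(skills, scale=10):
    """Repeat each skill term roughly weight * scale times.

    `skills` maps a term to a weight in [0, 1]. `scale` (hypothetical
    parameter) controls how many repetitions a weight of 1.0 produces.
    """
    terms = []
    for term, weight in skills.items():
        # Round to the nearest whole number of repetitions.
        repetitions = round(weight * scale)
        terms.extend([term] * repetitions)
    return terms


doc = {
    "id": "A1",
    "name": "John",
    "skills": {
        "programming/kotlin": 0.8,
        "programming/java": 0.5,
        "sports/handball": 0.2,
        "sports/climbing": 0.1,
    },
}

# Build the document that would actually be indexed.
indexed_doc = {
    "id": doc["id"],
    "name": doc["name"],
    "skillTerms": discretize_skills(doc["skills"]),
}
```

With this rounding, weights below half a step (e.g. 0.04 at a scale of 10) produce no term at all, and a larger `scale` gives finer resolution at the cost of even bigger documents.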

Can you think of a less hacky solution? Maybe something like the copy_to mapping option with a count parameter? Or is there a plugin to fake the term frequency?

Thanks
