I've been wrestling with this problem for days.
For example, suppose I have these three documents in Elasticsearch:
doc_id1:"thor marvel",
doc_id2:"spiderman thor",
doc_id3:"the avengers captain america ironman thor"
If I run a search query for "thor", I want it to tell me where the keyword "thor" is found in each document, with { doc_id1: 1, doc_id2: 2, doc_id3: 6 } as the desired result.
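To make the desired output concrete, here is a minimal sketch in plain Python, not Elasticsearch. It assumes simple lowercase whitespace tokenization and 1-based positions, and `keyword_positions` is just a name I made up for illustration; a real analyzer may split tokens differently.

```python
def keyword_positions(docs, keyword):
    """Return {doc_id: 1-based position of the first occurrence of keyword}."""
    result = {}
    for doc_id, text in docs.items():
        # Assumption: plain whitespace tokenization, not an Elasticsearch analyzer.
        tokens = text.lower().split()
        if keyword in tokens:
            result[doc_id] = tokens.index(keyword) + 1
    return result


docs = {
    "doc_id1": "thor marvel",
    "doc_id2": "spiderman thor",
    "doc_id3": "the avengers captain america ironman thor",
}

print(keyword_positions(docs, "thor"))
# {'doc_id1': 1, 'doc_id2': 2, 'doc_id3': 6}
```

Documents that do not contain the keyword are simply left out of the result.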
I have two possible solutions off the top of my head:
- Figure out a way to put the _termvector info (which includes all the positions/offsets for each token/word of the document) into _source, so that I can directly access it in my normal search results. I can then construct the (doc, position) list outside Elasticsearch. Normally, you can only access _termvector info for a single document at a time, given the index/type/id, which is why it's tricky. This would be the ideal way to achieve the goal.
- Figure out a way to trigger an action whenever a new document is added. This action would scan through all the tokens/words in the new document, create a new "token" index (if it doesn't exist), and append a (doc_id, position) pair to it, like
{ keyword:"thor" [ doc_id1:1, doc_id2:2, doc_id3:6 ] }
so that I only need to search for "thor" among the keyword indexes to get the (doc, position) list. This seems even harder and less optimal.
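To illustrate the data structure I have in mind for the second option, here is a rough plain-Python sketch of a positional "token" index. It only shows the shape of the data: `add_document` and `token_index` are hypothetical names, tokenization is plain whitespace splitting, and it does not show how to trigger this from inside Elasticsearch when a document is added.

```python
from collections import defaultdict


def add_document(index, doc_id, text):
    """Append a (doc_id, position) pair to the index for every token in text."""
    # Assumption: whitespace tokenization with 1-based positions.
    for position, token in enumerate(text.lower().split(), start=1):
        index[token].append((doc_id, position))


# Maps each keyword to a list of (doc_id, position) pairs.
token_index = defaultdict(list)

add_document(token_index, "doc_id1", "thor marvel")
add_document(token_index, "doc_id2", "spiderman thor")
add_document(token_index, "doc_id3", "the avengers captain america ironman thor")

print(token_index["thor"])
# [('doc_id1', 1), ('doc_id2', 2), ('doc_id3', 6)]
```

Looking up a keyword is then a single dictionary access, at the cost of maintaining a second structure alongside the original documents.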
Sadly, I don't know how to do either one. I'd appreciate it if someone could give me some help with this. Many thanks!