Return matching analyzed field when querying

Hi everybody,

Context:
My real product is a bit complex, so I'll explain it using blog posts as an example.
I have blog posts which are stored in ES.
I have an analyzer that turns each post's content into tokens. The analyzer will change and may become more complex as the product grows.

Every time a user searches for posts, say for "multi thread programming java", I would like to know which analyzed tokens (token, start_offset, end_offset) of the field "post.content.analyzed_my_way" match the user's search string, so that I can make these words bold (like Google search does).
Ideally, this information would be returned along with each hit in the ES query response.
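
To make the goal concrete, here is a minimal Python sketch of the bolding step, assuming the matching tokens are already available as (token, start_offset, end_offset) tuples. The function name bold_matches and the <b> markup are only illustrative and are not part of ES.

```python
from typing import Iterable, Tuple

# (token, start_offset, end_offset), offsets are character positions in the content
Token = Tuple[str, int, int]


def bold_matches(content: str, matches: Iterable[Token]) -> str:
    """Wrap every matched span of `content` in <b>...</b> (hypothetical helper)."""
    out = []
    cursor = 0
    # Walk the spans from left to right so earlier offsets stay valid.
    for _token, start, end in sorted(matches, key=lambda m: m[1]):
        if start < cursor:
            continue  # skip spans that overlap one already emitted
        out.append(content[cursor:start])
        out.append("<b>" + content[start:end] + "</b>")
        cursor = end
    out.append(content[cursor:])
    return "".join(out)


if __name__ == "__main__":
    text = "Multi thread programming in Java"
    print(bold_matches(text, [("multi", 0, 5), ("java", 28, 32)]))
```

The offsets refer to the original text, so the original casing ("Multi", "Java") is preserved even though the analyzed tokens are lowercased.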

Problems:

  1. "post.content.analyzed_my_way" is not stored by default, so it's not returned through "stored_fields". Is it good practice to store the analyzed field?

  2. The analyzer's logic is complex and will change often, so I cannot replicate it outside of ES. That would violate DRY, too.

  3. Calling POST /post/_analyze to analyze the content on the fly doesn't seem like a good choice, because it repeats work that ES already did when it analyzed and indexed the post documents.
    GET /post/_doc/post-id/_termvectors seems a bit better, but it still doesn't solve problem #4.

  4. When querying, ES doesn't return the positions of the matching tokens in its explanations.

  5. If "post.content" becomes "post.contents" (an array), the approach in #3 doesn't indicate which content element an analyzed token belongs to.
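
For reference, the _termvectors approach from #3 could be sketched roughly as below (Python, operating on already-fetched JSON responses, so no HTTP client is shown). It assumes the search string is analyzed once with POST /post/_analyze and that term vectors are requested per hit with offsets enabled. The field name "content.analyzed_my_way" and the helper names are assumptions for illustration. It does not address #5.

```python
from typing import Any, Dict, List, Set

# Assumed field path inside the "post" index; adjust to the real mapping.
FIELD = "content.analyzed_my_way"


def termvectors_body(field: str = FIELD) -> Dict[str, Any]:
    """Request body for _termvectors asking only for positions and offsets."""
    return {
        "fields": [field],
        "offsets": True,
        "positions": True,
        "term_statistics": False,
        "field_statistics": False,
    }


def query_terms(analyze_response: Dict[str, Any]) -> Set[str]:
    """Collect the distinct tokens from an _analyze response for the search string."""
    return {t["token"] for t in analyze_response.get("tokens", [])}


def matching_tokens(
    termvectors_response: Dict[str, Any], terms: Set[str], field: str = FIELD
) -> List[Dict[str, Any]]:
    """Return the document's token occurrences whose term is in `terms`."""
    field_terms = (
        termvectors_response.get("term_vectors", {}).get(field, {}).get("terms", {})
    )
    hits = []
    for term in terms:
        for occ in field_terms.get(term, {}).get("tokens", []):
            hits.append(
                {
                    "token": term,
                    "start_offset": occ["start_offset"],
                    "end_offset": occ["end_offset"],
                }
            )
    # Sort by position in the text so callers can process left to right.
    return sorted(hits, key=lambda h: h["start_offset"])
```

This keeps the analyzer logic inside ES (no duplication, see #2), but it costs one extra _termvectors request per hit, which is part of why I'm asking whether the search response itself can carry this information.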

Any help would be appreciated.
Thank you for reading.
