My real product is a bit complex, so I'll explain the problem in terms of blog posts instead.
I have blog posts stored in ES.
I have an analyzer that breaks each post's content into tokens. The analyzer will change and might become more complex as the product grows.

Every time a user searches for posts, say for "multi thread programming java", I would like to know which analyzed tokens (token, start_offset, end_offset) of the field "post.content.analyzed_my_way" match the user's search string, so that I can make those words bold (like Google search does).
Ideally, this information would be returned along with each hit in the ES query response.
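
To make the goal concrete, here is a minimal sketch of what I would do on the client once I have the matching tokens for a hit. The function name `bold_matches` and the sample data are made up for illustration, and the sketch assumes the offsets index directly into the original content string.

```python
def bold_matches(content, matches):
    """Wrap every matched token span of `content` in <b> tags.

    `matches` is a list of dicts with "start_offset" and "end_offset"
    keys (hypothetical shape, mirroring the tuple described above).
    """
    spans = sorted((m["start_offset"], m["end_offset"]) for m in matches)
    out = []
    cursor = 0
    for start, end in spans:
        if start < cursor:
            # Skip spans that overlap one already emitted.
            continue
        out.append(content[cursor:start])
        out.append("<b>" + content[start:end] + "</b>")
        cursor = end
    out.append(content[cursor:])
    return "".join(out)


# Hypothetical post content and matching tokens for one hit.
content = "Multi-threaded programming in Java"
matches = [
    {"token": "multi", "start_offset": 0, "end_offset": 5},
    {"token": "java", "start_offset": 30, "end_offset": 34},
]
highlighted = bold_matches(content, matches)
```

The hard part is not this step but getting `matches` out of ES in the first place.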


  1. "post.content.analyzed_my_way" is not stored by default, so it's not returned through "stored_fields". Is it good practice to store the analyzed field?

  2. The analyzer's logic is complex and will change often, so I cannot mimic it outside of ES. Doing so would violate DRY, too.

  3. Calling POST /post/_analyze to analyze the content on the fly doesn't seem like a good choice, because it repeats work that was already done at index time (ES has already analyzed and indexed the post documents).
    GET /post/_doc/post-id/_termvectors seems a bit better, but it still doesn't solve problem #4.

  4. When querying, ES doesn't return the positions of the matching tokens in its explanations.

  5. If "post.content" becomes "post.contents" (an array), the approach in #3 doesn't indicate which content element an analyzed token belongs to.
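
For reference, the workaround from #3 would look roughly like the sketch below. It assumes the search string has been run through /post/_analyze with the same analyzer and that the term vectors for the hit were fetched with offsets included. The helper names and the sample responses are hypothetical; only the general response shapes (a "tokens" list from _analyze, and "term_vectors" / "terms" / "tokens" from _termvectors) are meant to follow the ES APIs.

```python
FIELD = "post.content.analyzed_my_way"


def query_terms(analyze_response):
    """Collect the distinct tokens ES produced for the search string."""
    return {tok["token"] for tok in analyze_response["tokens"]}


def matching_tokens(termvectors_response, field, terms):
    """Return (token, start_offset, end_offset) entries for every term
    of `field` that also appears in the analyzed search string."""
    field_terms = (
        termvectors_response.get("term_vectors", {})
        .get(field, {})
        .get("terms", {})
    )
    matches = []
    for term in terms:
        info = field_terms.get(term)
        if info is None:
            continue  # this search term does not occur in the document
        for tok in info.get("tokens", []):
            matches.append({
                "token": term,
                "start_offset": tok["start_offset"],
                "end_offset": tok["end_offset"],
            })
    return sorted(matches, key=lambda m: m["start_offset"])


# Hypothetical, trimmed-down responses for illustration only.
analyze_response = {"tokens": [
    {"token": "multi", "start_offset": 0, "end_offset": 5, "position": 0},
    {"token": "java", "start_offset": 25, "end_offset": 29, "position": 3},
]}
termvectors_response = {"term_vectors": {FIELD: {"terms": {
    "multi": {"term_freq": 1, "tokens": [
        {"position": 0, "start_offset": 0, "end_offset": 5}]},
    "java": {"term_freq": 1, "tokens": [
        {"position": 4, "start_offset": 30, "end_offset": 34}]},
    "in": {"term_freq": 1, "tokens": [
        {"position": 3, "start_offset": 27, "end_offset": 29}]},
}}}}

found = matching_tokens(termvectors_response, FIELD, query_terms(analyze_response))
```

This costs one extra request per hit on top of the search itself, only matches terms by exact equality rather than by the actual query semantics, and has the array problem described in #5, which is why I am looking for something better.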

Any help would be appreciated.
