Hi everybody,
Context:
My real product is a bit complex, so I'll explain the problem using blog posts as an example.
I have blog posts which are stored in ES.
I have an analyzer that breaks a post's content into tokens. The analyzer will change and may become more complex as the product grows.
Every time a user searches for posts, say with the query "multi thread programming java", I would like to know which analyzed tokens (token, start_offset, end_offset) of the field "post.content.analyzed_my_way" match the user's search string, so that I can make those words bold (like Google search does).
Ideally, this information would be returned along with each hit in the ES query response.
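To make the goal concrete, here is a minimal sketch of what I would do on the client side once I have the matching tokens. The function name `bold_matches`, the sample content, and the sample offsets are made up for illustration; only the (token, start_offset, end_offset) shape comes from what the analyzer produces.

```python
from typing import Dict, List


def bold_matches(content: str, matches: List[Dict]) -> str:
    """Wrap every matched span of `content` in <b>...</b>.

    `matches` is a list of dicts with "start_offset" and "end_offset"
    (character offsets into `content`), in any order.
    """
    out = []
    cursor = 0
    for m in sorted(matches, key=lambda m: m["start_offset"]):
        start, end = m["start_offset"], m["end_offset"]
        if start < cursor:
            continue  # skip spans that overlap one already emitted
        out.append(content[cursor:start])
        out.append("<b>" + content[start:end] + "</b>")
        cursor = end
    out.append(content[cursor:])
    return "".join(out)


# Hypothetical post content and the tokens I would like ES to give me back.
content = "Multi thread programming in Java"
matches = [
    {"token": "java", "start_offset": 28, "end_offset": 32},
    {"token": "thread", "start_offset": 6, "end_offset": 12},
]
print(bold_matches(content, matches))
```

The missing piece is getting `matches` out of ES without re-implementing the analyzer.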
Problems:
1. "post.content.analyzed_my_way" is not stored by default, so it is not returned through "stored_fields". Is it good practice to store the analyzed field?
2. The analyzer's logic is complex and will change often, so I cannot mimic it outside of ES. That would violate DRY, too.
3. Calling POST /post/_analyze to analyze the content on the fly doesn't seem like a good choice, because it repeats work already done at index time (ES has already analyzed and indexed the post documents). GET /post/_doc/post-id/_termvectors seems a bit better, but it still doesn't solve my #4 problem.
4. When querying, ES doesn't return the positions of the matching tokens in its explanations.
5. If "post.content" becomes "post.contents" (an array), the #3 solution doesn't tell me which content an analyzed token belongs to.
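For reference, this is roughly how I would post-process the `_termvectors` approach mentioned above. It is only a sketch: the helper name `matching_offsets` and the sample response are hypothetical, and it assumes the response has the usual `term_vectors` / `terms` / `tokens` layout with offsets enabled. It also assumes I already have the analyzed query tokens (for example from running the short search string through `_analyze`), which is far cheaper than re-analyzing every post.

```python
from typing import Dict, Iterable, List, Tuple


def matching_offsets(
    termvectors: Dict, field: str, query_tokens: Iterable[str]
) -> List[Tuple[str, int, int]]:
    """Return (token, start_offset, end_offset) for every indexed token of
    `field` whose term equals one of `query_tokens`, sorted by start offset."""
    terms = termvectors.get("term_vectors", {}).get(field, {}).get("terms", {})
    wanted = set(query_tokens)
    hits = []
    for term, info in terms.items():
        if term not in wanted:
            continue
        for tok in info.get("tokens", []):
            hits.append((term, tok["start_offset"], tok["end_offset"]))
    return sorted(hits, key=lambda h: h[1])


FIELD = "post.content.analyzed_my_way"

# Hand-written sample shaped like a _termvectors response (not real output).
sample_response = {
    "found": True,
    "term_vectors": {
        FIELD: {
            "terms": {
                "java": {
                    "term_freq": 1,
                    "tokens": [{"position": 4, "start_offset": 28, "end_offset": 32}],
                },
                "thread": {
                    "term_freq": 1,
                    "tokens": [{"position": 1, "start_offset": 6, "end_offset": 12}],
                },
                "programming": {
                    "term_freq": 1,
                    "tokens": [{"position": 2, "start_offset": 13, "end_offset": 24}],
                },
            }
        }
    },
}

print(matching_offsets(sample_response, FIELD, ["java", "thread", "multi"]))
```

This still costs one extra request per hit, and as far as I can tell it does not say which element of an array field a token came from.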
Any help would be appreciated.
Thank you for reading.