I'm trying to highlight the searched keyword in matching documents with some custom work, so I need to know the position (or offset) of that keyword in the document. However, I haven't found any documentation that clearly shows how to do that. I know that when "index_options" is set to "offsets" or "term_vector" is set to "with_positions_offsets", positions and offsets are generated for each token and stored with it, but I don't know how to fetch those values.
Please give me some suggestions. Any help would be appreciated!
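For reference, a mapping that stores term vectors with offsets, plus a term vectors API request body that asks for them, might look like the following. This is a sketch written as Python dicts. The field name `content` is an assumption, and the mapping uses the typeless format of Elasticsearch 7.x and later (older versions nest `properties` under a mapping type).

```python
# Hypothetical index mapping: the field name "content" is an assumption.
# "term_vector": "with_positions_offsets" stores, per document, each term
# together with its positions and character offsets.
mapping = {
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "term_vector": "with_positions_offsets",
            }
        }
    }
}

# Body for GET /<index>/_termvectors/<id> (step 1), requesting only the
# "content" field with positions and offsets. With the Python client this
# would be sent through its termvectors() method; it is not called here.
termvectors_body = {
    "fields": ["content"],
    "positions": True,
    "offsets": True,
    "term_statistics": False,
    "field_statistics": False,
}
```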
Thank you, Mark, for your support. It's useful, but it seems we need several steps to get that information:
Step 1: Get the term vectors of the document
Step 2: Filter the search keyword out of the step 1 result
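Step 2 amounts to a lookup in the term vectors response. A minimal sketch, assuming the response JSON has already been parsed into a dict, that the field is named `content` (hypothetical), and that the keyword is given in its analyzed form (for example lowercased by the standard analyzer). The sample response is made up but follows the documented `term_vectors` → field → `terms` → `tokens` shape.

```python
def keyword_offsets(tv_response, field, keyword):
    """Return (start_offset, end_offset) pairs for `keyword` in `field`.

    Assumes the term vectors were requested with offsets enabled and that
    `keyword` is already analyzed (e.g. lowercased).
    """
    terms = tv_response.get("term_vectors", {}).get(field, {}).get("terms", {})
    entry = terms.get(keyword)
    if entry is None:
        return []  # keyword does not occur in this field
    return [(tok["start_offset"], tok["end_offset"]) for tok in entry.get("tokens", [])]


# Made-up example data in the shape returned by the _termvectors API.
text = "Quick search: search the index"
sample_response = {
    "_id": "1",
    "found": True,
    "term_vectors": {
        "content": {
            "terms": {
                "search": {
                    "term_freq": 2,
                    "tokens": [
                        {"position": 1, "start_offset": 6, "end_offset": 12},
                        {"position": 2, "start_offset": 14, "end_offset": 20},
                    ],
                }
            }
        }
    },
}

for start, end in keyword_offsets(sample_response, "content", "search"):
    print(start, end, text[start:end])
```

The offsets index into the original field value, so `text[start:end]` recovers the matched word. Lucene counts offsets in Java UTF-16 code units, so text with characters outside the Basic Multilingual Plane (such as emoji) may need conversion before slicing in Python.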
I would like to know whether we can get the offsets directly in the search result (for example in the highlight: since Elasticsearch can return highlighted text with pre and post tags, I assume it knows the text positions), so we don't need the extra step.
You can supply custom markup tags, e.g. instead of <em> you could use <somethingMyAppUnderstands>, but this won't return the offsets.
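Custom tags alone don't expose offsets, but one possible client-side workaround (not something the highlighter itself provides) is to request the whole field with `number_of_fragments: 0`, so the highlight contains the complete field value, and then compute offsets by stripping the tags. A sketch, assuming the hypothetical tags `<kw>`/`</kw>` never occur in the text itself, the default encoder (no HTML escaping), and a single-valued field named `content` (also hypothetical):

```python
PRE, POST = "<kw>", "</kw>"  # hypothetical custom tags

# Search body: number_of_fragments = 0 returns the whole field, tagged.
highlight_body = {
    "query": {"match": {"content": "search"}},
    "highlight": {
        "pre_tags": [PRE],
        "post_tags": [POST],
        "fields": {"content": {"number_of_fragments": 0}},
    },
}


def offsets_from_tagged(tagged, pre=PRE, post=POST):
    """Strip pre/post tags; return (plain_text, [(start, end), ...])."""
    plain = []    # characters of the original text, tags removed
    offsets = []  # (start, end) of each tagged span within the plain text
    start = None
    i = 0
    while i < len(tagged):
        if tagged.startswith(pre, i):
            start = len(plain)
            i += len(pre)
        elif tagged.startswith(post, i) and start is not None:
            offsets.append((start, len(plain)))
            start = None
            i += len(post)
        else:
            plain.append(tagged[i])
            i += 1
    return "".join(plain), offsets


plain, offs = offsets_from_tagged("Quick <kw>search</kw>: <kw>search</kw> the index")
print(plain, offs)
```

The tagged string would come from `hits.hits[i].highlight.content[0]` in the search response. Because the offsets are recomputed from whatever the highlighter chose to tag, this approximates the term vector approach rather than replacing it.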
I expect the most likely solution would be to implement a custom highlighter plugin (see example), because highlighters are given the resources you need to get hold of the query tokens and the document contents. With some custom code you could return the required output.
Thanks, Mark, for mentioning the plugin. I'm going to try it. By the way, I found this issue, https://github.com/elastic/elasticsearch/issues/5736, which has been open for more than 3 years; I hope a future release implements this helpful feature.