I just started to play with elasticsearch and must say the usability is
quite nice. However, I really need a feature that does not seem trivial to
access. Given my mapping (the details are not important, only that I set
"term_vector": "with_positions_offsets" on the body field), I would like to
access the explicit offsets of search terms in a string field to calculate
a score. Please also note the related Stack Overflow questions on getting
the offsets of highlighted snippets.
Please tell me if I am missing something obvious. If this is something
people routinely hack into elasticsearch themselves, could someone point me
in the right direction?
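To make the goal concrete, here is a minimal sketch of the kind of offset-based score meant above. It does not use elasticsearch or Lucene at all: it re-tokenizes the text on the client with a naive word pattern, which only stands in for the offsets that stored term vectors would provide. The class name, the method names, and the scoring formula (earlier occurrences count more) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: a client-side stand-in for term vector offsets.
// All names and the scoring formula are hypothetical.
class TermOffsetSketch {

    // Returns {startOffset, endOffset} pairs (end exclusive) for every
    // occurrence of `term` in `text`, using a naive case-insensitive
    // word tokenizer (not an elasticsearch analyzer).
    static List<int[]> offsetsOf(String text, String term) {
        List<int[]> result = new ArrayList<>();
        String wanted = term.toLowerCase(Locale.ROOT);
        Matcher m = Pattern.compile("\\w+").matcher(text);
        while (m.find()) {
            if (m.group().toLowerCase(Locale.ROOT).equals(wanted)) {
                result.add(new int[] {m.start(), m.end()});
            }
        }
        return result;
    }

    // Hypothetical score: each occurrence contributes 1 / (1 + startOffset),
    // so matches near the beginning of the field weigh more.
    static double earlyOccurrenceScore(List<int[]> offsets) {
        double score = 0.0;
        for (int[] o : offsets) {
            score += 1.0 / (1.0 + o[0]);
        }
        return score;
    }

    public static void main(String[] args) {
        String body = "Quick brown fox. The quick dog.";
        List<int[]> offsets = offsetsOf(body, "quick");
        for (int[] o : offsets) {
            System.out.println(o[0] + "-" + o[1]);
        }
        System.out.println(earlyOccurrenceScore(offsets));
    }
}
```

The point of the question is to obtain the same start/end pairs from the index instead of recomputing them like this, since the analyzer used at index time may tokenize differently from the pattern above.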
I made some progress here by writing a plugin. I found [1] and [2] very
useful. I would like to share it once I have cleaned it up a bit. My only
remaining problem is that the offsets from newly indexed documents do not
show up until I restart elasticsearch (i.e. reopen all readers). Since I
have only little experience with elasticsearch: could someone please
describe whether it is possible to get real-time or near-real-time term
vector/offset information within the plugin scope, in a similarly automatic
way as it works for search/get?
On Friday, 10 May 2013 16:12:27 UTC+2, Max Hoffmann wrote:
Dear All,
I just started to play with elasticsearch and must say the usability is
quite nice. However, I really need a feature that does not seem trivial to
access. Given my mapping (the details are not important, only that I set
"term_vector": "with_positions_offsets" on the body field), I would like to
access the explicit offsets of search terms in a string field to calculate
a score. Please also note the Stack Overflow questions on this topic
("ElasticSearch get offsets of highlighted snippets").
I think marking the raw text with a special token as pre_tag/post_tag to
determine the position on the client is rather backwards if all I need
is the explicit offset. Ideally one could access it in a script_field
evaluation.
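For comparison, the marker-based workaround described above could look roughly like the following on the client. This is only a sketch under assumptions: the highlighted string is assumed to cover the whole field, the marker strings are assumed never to occur in the raw text, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: recovers match offsets from a highlighted string by
// locating the pre/post markers. All names are hypothetical.
class MarkerOffsetSketch {

    // Returns {start, end} pairs (end exclusive) relative to the text with
    // all markers removed. Stops at the first unbalanced marker.
    static List<int[]> offsetsFromMarkers(String highlighted, String pre, String post) {
        List<int[]> result = new ArrayList<>();
        int removed = 0; // number of marker characters seen so far
        int pos = 0;
        while (true) {
            int open = highlighted.indexOf(pre, pos);
            if (open < 0) {
                break;
            }
            int close = highlighted.indexOf(post, open + pre.length());
            if (close < 0) {
                break;
            }
            int start = open - removed;
            removed += pre.length();
            int end = close - removed;
            removed += post.length();
            result.add(new int[] {start, end});
            pos = close + post.length();
        }
        return result;
    }

    public static void main(String[] args) {
        String highlighted = "The <em>quick</em> fox is <em>quick</em>";
        for (int[] o : offsetsFromMarkers(highlighted, "<em>", "</em>")) {
            System.out.println(o[0] + "-" + o[1]);
        }
    }
}
```

It works, but it means shipping the whole field to the client and scanning it again just to learn numbers the index already stores, which is why it feels backwards.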
Please tell me if I am missing something obvious. If this is something
people routinely hack into elasticsearch themselves, could someone point me
in the right direction?
So, as far as I understand the documentation, reader.getLiveDocs() only
helps to filter deleted documents out of a reader; it cannot add newly
indexed documents to an existing reader. It also seems that elasticsearch
0.90 uses Lucene 4.2, where the interface for termDocsEnum has changed [1].
Regardless, what I am mostly wondering right now is whether it is possible
to reopen a reader (if the index has changed) from the environment where
the Transport*Action.java is run.
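The behaviour described here matches point-in-time reader semantics: an open reader never sees later additions, so a new reader has to be obtained. In Lucene 4.x the static method DirectoryReader.openIfChanged(oldReader) does this and returns null when nothing has changed. The sketch below is a toy model of that pattern in plain Java, not Lucene or elasticsearch code; every class and method name in it is hypothetical, and it says nothing about how a plugin should get hold of the shard's current reader.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model, not Lucene or elasticsearch code: all names are hypothetical
// and only mimic point-in-time reader semantics.
class ReaderRefreshSketch {

    static final class ToyIndex {
        private final List<String> docs = new ArrayList<>();
        private long version = 0;

        synchronized void add(String doc) {
            docs.add(doc);
            version++;
        }

        // Opens a snapshot of the documents as they are right now.
        synchronized ToyReader open() {
            return new ToyReader(new ArrayList<>(docs), version);
        }

        // Returns null when nothing changed since oldReader was opened,
        // otherwise a fresh snapshot.
        synchronized ToyReader openIfChanged(ToyReader oldReader) {
            return oldReader.version == version ? null : open();
        }
    }

    static final class ToyReader {
        final List<String> docs;
        final long version;

        ToyReader(List<String> docs, long version) {
            this.docs = Collections.unmodifiableList(docs);
            this.version = version;
        }

        int numDocs() {
            return docs.size();
        }
    }

    // Swaps in a new reader only if the index has changed.
    static ToyReader refresh(ToyIndex index, ToyReader current) {
        ToyReader newer = index.openIfChanged(current);
        return newer != null ? newer : current;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("first document");
        ToyReader reader = index.open();
        index.add("second document");
        System.out.println("old reader sees " + reader.numDocs());
        reader = refresh(index, reader);
        System.out.println("refreshed reader sees " + reader.numDocs());
    }
}
```

A plugin that caches a reader would need to call something like refresh before each lookup, or ask for the current reader per request instead of caching one.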