Start and end offset of a token in Elasticsearch


(reza) #1

Hi friends.

I want to obtain the offsets of a token within its own document in Elasticsearch.

The term vectors API can give this information, but I don't want to fetch the term vectors of a whole document, because they may be very large to transfer through a RESTful service.

Thanks.


Offset of only one term in a document, not document vector
(Adrien Grand) #2

I don't think there are other ways. For the record, working with offsets tends to be error-prone, because they depend on the encoding being used, so if your text is not pure ASCII you might get surprises. The offsets that Elasticsearch returns are computed on the UTF-16-encoded string (UTF-16 being Java's internal string encoding).
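
To illustrate the encoding point, here is a minimal Python sketch of how offsets counted in UTF-16 code units can disagree with plain string indices. The helper name `slice_by_utf16_offsets` is made up for this example, and the offsets 3 and 9 are worked out by hand for the sample text; they are not taken from an actual Elasticsearch response.

```python
def slice_by_utf16_offsets(text: str, start_offset: int, end_offset: int) -> str:
    """Return the substring addressed by offsets counted in UTF-16 code units.

    Hypothetical helper: assumes the offsets do not fall in the middle of a
    surrogate pair (decoding would raise UnicodeDecodeError if they did).
    """
    # Little-endian UTF-16 without a BOM: every code unit is exactly 2 bytes.
    encoded = text.encode("utf-16-le")
    return encoded[2 * start_offset:2 * end_offset].decode("utf-16-le")


text = "😀 search engine"

# The emoji lies outside the Basic Multilingual Plane, so it occupies two
# UTF-16 code units. Counted that way, "search" spans offsets 3 to 9.
print(slice_by_utf16_offsets(text, 3, 9))

# Python indexes strings by code point, so using the same numbers directly
# lands one character too far to the right.
print(text[3:9])
```

For pure ASCII text both ways of slicing agree, which is why the mismatch usually only shows up once documents contain emoji or other characters outside the Basic Multilingual Plane.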

