Recommended way to index HTML documents


(David Croé) #1

hi,
i'm new to Elasticsearch and i have a simple question after reading the beginner's guide.

my standard use case is:

full-text search in HTML documents.

after searching, a click on a result should open the document and jump to the occurrence of the hit.

as far as i understand, these steps have to be done:

add the document to the index
the analyzer uses a character filter to remove the HTML tags
the tokenizer splits the text into words

but then the information about where exactly the search hit resides in the original document is lost.

could anyone please give me a hint on which documentation to read, or recommend a way to solve my standard use case?

best regards
david
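
To make the concern in the steps above concrete, here is a minimal Python sketch of the idea that stripping tags does not have to throw away positions. It is not Elasticsearch code and does not reflect how Elasticsearch's analyzers are implemented. The names `tokenize_with_offsets` and `find_hits` are hypothetical, and the tag regex is deliberately naive (it assumes well-formed tags and ignores comments, scripts, and entities). Each token keeps the start and end offset it had in the original HTML, so a viewer could scroll to that position.

```python
import re

# Naive assumptions for illustration only: a tag is anything between
# "<" and ">", and a word is a run of word characters.
TAG = re.compile(r"<[^>]*>")
WORD = re.compile(r"\w+")


def tokenize_with_offsets(html):
    """Return (token, start, end) tuples; offsets index into the ORIGINAL html."""
    tokens = []
    pos = 0
    # Walk the text segments that lie between consecutive tags.
    for tag in list(TAG.finditer(html)) + [None]:
        segment_end = tag.start() if tag else len(html)
        # Searching with pos/endpos keeps match offsets relative to the full string.
        for m in WORD.finditer(html, pos, segment_end):
            tokens.append((m.group().lower(), m.start(), m.end()))
        pos = tag.end() if tag else segment_end
    return tokens


def find_hits(html, term):
    """Return (start, end) offsets in the original html for every exact token match."""
    wanted = term.lower()
    return [(s, e) for tok, s, e in tokenize_with_offsets(html) if tok == wanted]


if __name__ == "__main__":
    doc = "<p>Hello <b>world</b>, hello again</p>"
    for start, end in find_hits(doc, "hello"):
        print(start, end, doc[start:end])
```

The design choice here is to tokenize the original string in place rather than build a stripped copy first, so no separate offset-mapping table is needed. Whether and how Elasticsearch exposes comparable offsets for a hit is something to check in its documentation on character filters, term vectors, and highlighting.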

