Recommended way to index HTML documents

dcroe · December 14, 2016, 12:39pm

hi,
i'm new to Elasticsearch and i have a simple question after reading the beginner's guide.

my standard use case is:

full-text search in HTML documents.

after searching, a click on a result should open the document and jump to the occurrence of the hit.
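One possible starting point for the "jump to the hit" part is Elasticsearch's highlighting feature, which returns marked-up fragments of the matching field alongside each result. The sketch below only builds the JSON request body. The field name `content` and the helper `build_search_body` are assumptions made for illustration, not anything from the guide.

```python
import json


def build_search_body(user_query, field="content"):
    """Build a search request body that asks for highlighted fragments.

    `field` is a hypothetical field name holding the document text.
    """
    return {
        # Plain full-text match query on the chosen field.
        "query": {"match": {field: user_query}},
        # Ask for highlighted fragments of the same field.
        "highlight": {"fields": {field: {}}},
    }


if __name__ == "__main__":
    # The printed JSON would be sent as the body of a search request.
    print(json.dumps(build_search_body("elasticsearch"), indent=2))
```

The highlighted fragments tell you which text matched. Turning that into a scroll position inside the opened page would still have to be done on the client side.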

as far as i understand, these steps are needed:

add document to index
the analyzer uses a character filter to remove HTML tags
the tokenizer splits the text into words
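The steps above could be expressed as index settings roughly like this. It is a minimal sketch, assuming a recent Elasticsearch version with typeless mappings. `html_strip`, `standard` and `lowercase` are built-in names, while the analyzer name `html_text` and the field name `content` are made up for the example.

```python
import json


def build_index_body():
    """Index settings with a custom analyzer that strips HTML before tokenizing."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # "html_text" is a hypothetical analyzer name.
                    "html_text": {
                        "type": "custom",
                        # Character filter: removes HTML tags from the text.
                        "char_filter": ["html_strip"],
                        # Tokenizer: splits the remaining text into words.
                        "tokenizer": "standard",
                        # Token filter: lowercases each token.
                        "filter": ["lowercase"],
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                # "content" is a hypothetical field holding the HTML document.
                "content": {"type": "text", "analyzer": "html_text"}
            }
        },
    }


if __name__ == "__main__":
    # The printed JSON would be the body of a create-index request.
    print(json.dumps(build_index_body(), indent=2))
```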

but then the information about where exactly the search hit resides in the document is lost.
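Stripping tags does not have to throw the position away, as long as each token remembers its character offsets in the original HTML. The pure-Python sketch below illustrates only that idea and is not Elasticsearch's implementation. The function name `tokens_with_offsets` and the simple regular expressions are assumptions, and it ignores entities, comments and scripts. As far as I know, Elasticsearch's analyze API reports a `start_offset` and `end_offset` per token in a similar spirit, but check that against the docs for your version.

```python
import re

# Very rough patterns: anything between < and > is a tag, \w+ is a word.
TAG = re.compile(r"<[^>]*>")
WORD = re.compile(r"\w+")


def tokens_with_offsets(html):
    """Return (token, start, end) tuples; offsets index into the ORIGINAL html."""
    tokens = []
    pos = 0
    for tag in TAG.finditer(html):
        # Tokenize only the text between the previous tag and this one.
        for m in WORD.finditer(html, pos, tag.start()):
            tokens.append((m.group().lower(), m.start(), m.end()))
        pos = tag.end()
    # Text after the last tag, if any.
    for m in WORD.finditer(html, pos):
        tokens.append((m.group().lower(), m.start(), m.end()))
    return tokens


if __name__ == "__main__":
    doc = "<p>Hello <b>world</b></p>"
    for token, start, end in tokens_with_offsets(doc):
        # doc[start:end] is the matching text inside the original markup.
        print(token, start, end, doc[start:end])
```

With offsets like these, a viewer could wrap the matching span in an anchor element and scroll to it when the result is clicked.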

could anyone please give me a hint about which documentation to read, or recommend a way to solve my
standard use case?

best regards
david
