Create a single index from multiple HTML files


(Amit) #1

Hi,
I am very new to Elasticsearch. I want to create a single index from multiple HTML files. How can I do this? Please let me know the proper way to do it in Elasticsearch.
I have uploaded an image showing the type of search I want to implement using the HTML files and their content.


(David Pilato) #2

You need to crawl the web pages and, for each one, create a JSON document that contains the page URL and content.
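
As a rough sketch of that step, assuming the pages are already saved as local `.html` files rather than fetched by a crawler, something like the following could build one JSON-ready document per file. The helper name `build_documents`, the `base_url` parameter, and the way the URL is derived from the file name are all illustrative assumptions; only the `url` and `content` fields come from the description above.

```python
from pathlib import Path


def build_documents(html_dir, base_url):
    """Return one dict per .html file found directly under html_dir.

    Hypothetical helper: the URL is assumed to be base_url + file name.
    """
    docs = []
    for path in sorted(Path(html_dir).glob("*.html")):
        docs.append({
            "url": base_url.rstrip("/") + "/" + path.name,
            # Raw HTML is kept as-is here; stripping tags is a separate step.
            "content": path.read_text(encoding="utf-8"),
        })
    return docs
```

Each returned dict can be serialized with `json.dumps` and sent to the same index, so all pages end up as separate documents in one index.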

Then look at the analysis documentation in the guide. The HTML strip character filter might help you: https://www.elastic.co/guide/en/elasticsearch/reference/5.4/analysis-htmlstrip-charfilter.html
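
For illustration, an index-creation body that wires the `html_strip` character filter into a custom analyzer might look like the sketch below, written as a Python dict so it can be dumped to JSON. The analyzer name `html_text` and the mapping type name `page` are made-up names; the type level is included because the linked guide is for 5.4, and newer versions drop it. Note that a character filter only changes what gets indexed for searching; the stored `_source` still contains the original HTML.

```python
import json

# Sketch of a request body for creating the index (assumed names: html_text, page).
INDEX_BODY = {
    "settings": {
        "analysis": {
            "analyzer": {
                "html_text": {
                    "type": "custom",
                    "char_filter": ["html_strip"],  # removes tags before tokenizing
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "page": {  # mapping type, needed on 5.x only
            "properties": {
                "url": {"type": "keyword"},
                "content": {"type": "text", "analyzer": "html_text"},
            }
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(INDEX_BODY, indent=2))
```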


(Amit) #3

Thanks for answering. I tried it, but I am not able to achieve my goal this way.

My requirement is:

First, I need to create a single index in Elasticsearch for all the HTML files. After that, I want to show the matching HTML files and their content on a web page when I search.

For example, if I search for "content1" and it appears in file_1 and file_2 in index_1, both files and their content should be shown on the web page, like this:
search bar: content1
file_1 as link
its content
file_2 as link
its content

1. The attachment shows an example: the multiple links would be my HTML files, and the text below each link would be that file's content.
2. When I click an HTML file link, it should show the content of the clicked file.
3. How can I show only 2 or 3 lines of the content?

Note: all of the data I want to show comes from Elasticsearch.


(David Pilato) #4

Not sure I'm following. But in short, create one document per HTML page, like this:

{
  "content": "BASE64 content here",
  "url": "http://link.to.your.page/"
}
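
A minimal sketch of producing such a document in Python follows. The "BASE64 content here" placeholder suggests the page is meant to go through the ingest attachment processor, which reads a base64-encoded field; the reply does not name that processor, so the pipeline definition below is an assumption, and `to_document` is a hypothetical helper name.

```python
import base64

# Assumed ingest pipeline body: the attachment processor decodes the
# base64 "content" field and writes extracted text under "attachment".
ATTACHMENT_PIPELINE = {
    "description": "Extract text from base64-encoded HTML pages",
    "processors": [{"attachment": {"field": "content"}}],
}


def to_document(html, url):
    """Build the document shape shown above from raw HTML text."""
    encoded = base64.b64encode(html.encode("utf-8")).decode("ascii")
    return {"content": encoded, "url": url}
```

On 5.x the attachment processor is provided by the ingest-attachment plugin, so that plugin would need to be installed for the pipeline to be accepted.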

Then, when you get results, you will see something like:

{
  "attachment": {
      "content_type": "text/html",
      "content": "Lorem ipsum dolor sit amet",
      "content_length": 28
  },
  "url": "http://link.to.your.page/"
}

Just use that result to display some of the content of your document. Even better, use the highlighting feature to extract only the lines of text matching your query.
And use the url field to display a link to your document.
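
To make that concrete, here is a hedged sketch of a search body that asks for a few highlighted fragments, plus a small function that turns a response into "link, then snippets" lines. The field name `attachment.content` follows the result shape above; `build_search_body`, `render_hits`, and the hand-written `SAMPLE_RESPONSE` are illustrative and not real server output.

```python
def build_search_body(text, field="attachment.content", fragments=3):
    """Match query plus highlighting limited to a few short fragments."""
    return {
        "query": {"match": {field: text}},
        "highlight": {
            "fields": {
                field: {"fragment_size": 150, "number_of_fragments": fragments}
            }
        },
    }


def render_hits(response, field="attachment.content"):
    """Return display lines: each hit's url, then its indented snippets."""
    lines = []
    for hit in response["hits"]["hits"]:
        lines.append(hit["_source"]["url"])
        for fragment in hit.get("highlight", {}).get(field, []):
            lines.append("    " + fragment)
    return lines


# Hand-written example of the relevant part of a search response.
SAMPLE_RESPONSE = {
    "hits": {
        "hits": [
            {
                "_source": {"url": "http://link.to.your.page/"},
                "highlight": {
                    "attachment.content": ["Lorem <em>ipsum</em> dolor sit amet"]
                },
            }
        ]
    }
}
```

Setting `number_of_fragments` to 2 or 3 is one way to show only a couple of lines per file; the full content can be loaded when the user follows the link.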

