Website crawl and index into Elasticsearch

(Venture Misquitta) #1


I am quite new to Elasticsearch, but have prior experience with Solr.
My use case is that we have an intranet website with documents attached as links in the web pages.
I have been instructed to create a search architecture that can search both the web pages and the attached documents.
The problem is that I have not been able to find a way to index this website data into Elasticsearch.
Can someone please point me in the right direction?

Venture M.

(Mark Walkom) #2

A lot of people use Apache Nutch for this.
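
For a quick proof of concept without Nutch, the basic crawl-then-index flow can also be sketched by hand. The snippet below is only a rough illustration, not how Nutch works internally: it parses one HTML page into a title, body text, and outgoing links, then builds a request body for Elasticsearch's `_bulk` endpoint. The names `PageParser`, `extract_page`, `to_bulk_body`, and `send_bulk`, the index name, and the URLs are all made up for this example. Linked documents such as PDFs would need a separate text-extraction step, which is not shown.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class PageParser(HTMLParser):
    """Collects the title, visible text, and outgoing links of one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.text_parts = []
        self.links = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_title:
            self.title += data
        elif data.strip():
            self.text_parts.append(data.strip())


def extract_page(url, html):
    """Turn raw HTML into a dict that can be indexed as one document."""
    parser = PageParser(url)
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": parser.title.strip(),
        "body": " ".join(parser.text_parts),
        "links": parser.links,  # candidates for further crawling
    }


def to_bulk_body(docs, index_name):
    """Build a newline-delimited JSON body: one action line plus one source line per doc."""
    lines = []
    for doc in docs:
        # Assumption: the page URL is used as the document id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc["url"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


def send_bulk(es_url, body):
    """POST the bulk body to Elasticsearch (es_url is an assumed base URL)."""
    req = Request(
        es_url.rstrip("/") + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urlopen(req) as resp:
        return json.load(resp)
```

To try it, fetch a page, pass its HTML to `extract_page`, and send the result of `to_bulk_body([doc], "intranet")` to `send_bulk("http://localhost:9200", ...)`, assuming a local, unsecured Elasticsearch node. A real crawler would also need to follow `links`, respect robots rules, and avoid revisiting pages, which is where a tool like Nutch earns its keep.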

(Venture Misquitta) #3

Thanks for your response.
Is there a tutorial that I can use for a POC?

Best Regards,

(Mark Walkom) #4

Probably. Your favourite search engine can find it for you.
