Website crawl and index into Elasticsearch


(Venture Misquitta) #1

Hi,

I am quite new to Elasticsearch, but I have prior experience with Solr.
My use case is that we have an intranet website with documents attached as links in the web pages.
I have been asked to create a search architecture that can search through the web pages as well as the attached documents.
The problem is that I have not been able to find a way to index this website data into Elasticsearch.
Could someone please guide me in the right direction?

Thanks,
Venture M.


(Mark Walkom) #2

A lot of people use Apache Nutch for this.
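
To make the crawl-then-index idea concrete, here is a minimal sketch in Python. It is not Nutch and not a full crawler: it only shows how one fetched page could be turned into a document and how several documents could be packed into a request body for the Elasticsearch `_bulk` API. The names `PageParser`, `page_to_doc`, `to_bulk_body`, the index name `intranet`, and the field names `url`, `body`, and `links` are all assumptions made up for illustration.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects visible text and absolute link targets from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def page_to_doc(url, html):
    """Turn one page into a dict with hypothetical fields url, body, links."""
    parser = PageParser(url)
    parser.feed(html)
    parser.close()
    return {"url": url, "body": " ".join(parser.text_parts), "links": parser.links}


def to_bulk_body(docs, index_name):
    """Build an NDJSON body for the _bulk API: one action line, then one source line."""
    lines = []
    for doc in docs:
        # Using the URL as _id is an assumption; re-crawling a page then overwrites it.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc["url"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline
```

The resulting string would be sent with `POST /_bulk` and the header `Content-Type: application/x-ndjson`. The `links` list is where a crawler would pick up the next pages to fetch and the attached documents. Those attachments (PDF, Word, and so on) need text extraction before indexing, for example with Apache Tika or the Elasticsearch ingest attachment processor.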


(Venture Misquitta) #3

Thanks for your response.
Is there a tutorial that I can use for a POC?

Best Regards,
Venture


(Mark Walkom) #4

Probably. Your favourite search engine should be able to find one for you.

