How to set up a website crawl with Elasticsearch

Arul_Krishnamoorthy · January 23, 2017, 4:08pm

My understanding is that even the latest version of Elasticsearch (5.1.2) has no built-in functionality for crawling a website.

What are the options, and which of them would you recommend?

dadoonet · January 23, 2017, 4:23pm

Is it a public website with static pages or a private one built from database data?

Arul_Krishnamoorthy · January 23, 2017, 4:34pm

Thanks, David, for your response. It's a public website, with a section of the pages containing dynamic data.

dadoonet · January 23, 2017, 5:07pm

So I guess you don't have access to the data source? The structured data, I mean?

Maybe Nutch could help? I know there are some recipes on the web about connecting Nutch and Elasticsearch.

Arul_Krishnamoorthy · January 24, 2017, 9:19am

Thanks again, David. I was thinking that a Nutch implementation would itself be heavyweight, given its dependency on an underlying store. Are there any lightweight options?

dadoonet · January 24, 2017, 9:32am

I don't know. I've never done any web crawling.
I always prefer indexing from the data source rather than from the rendered pages. But maybe that's not possible for you.

Arul_Krishnamoorthy · January 24, 2017, 9:43am

No problem. Thank you David.
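
For anyone landing here with the same question: the thread does not settle on a lightweight option, so what follows is only a minimal sketch of the "index from the rendered pages" route, using nothing but the Python standard library. It assumes the pages have already been fetched as HTML strings. The names `PageParser`, `page_to_doc` and `bulk_body`, the index name `website` and the type name `page` are all hypothetical, chosen for illustration. The sketch turns each page into a small document and builds the newline-delimited body that the Elasticsearch `_bulk` endpoint expects.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class PageParser(HTMLParser):
    """Collects the title, visible text, and outgoing links of one HTML page."""

    SKIPPED = {"script", "style"}  # content that should not be indexed

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.text_parts = []
        self.links = []
        self._stack = []  # currently open <title>/<script>/<style> tags

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED or tag == "title":
            self._stack.append(tag)
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links and drop the #fragment part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, href))
                self.links.append(absolute)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        current = self._stack[-1] if self._stack else None
        if current in self.SKIPPED:
            return
        if current == "title":
            self.title += data.strip()
        elif data.strip():
            self.text_parts.append(data.strip())


def page_to_doc(url, html):
    """Turn one fetched page into a plain dict ready for indexing."""
    parser = PageParser(url)
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": parser.title,
        "body": " ".join(parser.text_parts),
        "links": parser.links,  # could also feed the list of pages to visit next
    }


def bulk_body(docs, index="website", doc_type="page"):
    """Build a newline-delimited _bulk request body (hypothetical index/type names)."""
    lines = []
    for doc in docs:
        # "_type" is expected by Elasticsearch 5.x; later major versions
        # deprecate and then remove it, so drop it there.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc["url"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string would be sent as the body of a POST to the cluster's `_bulk` endpoint. Fetching the pages, honouring robots.txt, rate limiting, and rendering the dynamic sections are all left out, and those are the parts where a dedicated crawler such as Nutch does most of its work.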

system · February 21, 2017, 9:43am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
