My understanding is that even the latest version of Elasticsearch (5.1.2) has no built-in functionality for crawling a website.
What are the options, and are there any recommendations among them?
Is it a public website with static pages or a private one built from database data?
Thanks, David, for your response. It's a public website, and some sections of the pages contain dynamic data.
So I guess you don't have access to the datasource? The structured data I mean?
Maybe using Nutch could help? I know there are some recipes on the web about connecting Nutch and Elasticsearch.
Thanks again, David. I was thinking a Nutch implementation would itself be heavyweight, given its dependency on an underlying store. Are there any lightweight options?
I don't know. Never did any web crawling in the past.
I always prefer indexing from the data source rather than from the rendered pages. But maybe that's not possible for you.
No problem. Thank you David.
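For later readers: one lightweight route, in the sense asked about above, might be a small script that parses fetched pages and turns them into a request body for Elasticsearch's `_bulk` API. The sketch below is not from this thread and is only an illustration using the Python standard library. The helper names (`PageParser`, `parse_page`, `to_bulk_body`), the index name `website`, and the type name `page` are all assumptions. The `_type` field is included because 5.x expects one.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects the title, visible text, and outgoing links of one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.text_parts = []
        self.links = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script and style contents
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title += text
        else:
            self.text_parts.append(text)


def parse_page(url, html):
    """Turn raw HTML into a flat document plus the links found on the page."""
    parser = PageParser(url)
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": parser.title,
        "body": " ".join(parser.text_parts),
        "links": parser.links,
    }


def to_bulk_body(docs, index="website", doc_type="page"):
    """Build a newline-delimited JSON body for the _bulk API (index name assumed)."""
    lines = []
    for doc in docs:
        # Action line: use the page URL as the document id so re-crawls overwrite.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc["url"]}}
        source = {"url": doc["url"], "title": doc["title"], "body": doc["body"]}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"
```

To use it, each page could be fetched with `urllib.request.urlopen`, passed through `parse_page`, and the result of `to_bulk_body` sent as a POST to the cluster's `_bulk` endpoint (for example `http://localhost:9200/_bulk`, assuming a local node). The `links` list can feed a queue of pages still to visit. A real crawl would also need to honour `robots.txt`, rate-limit requests, and handle pages whose dynamic sections are rendered by JavaScript, which this sketch does not attempt.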