Hi All,
I want to use Elasticsearch 6.4 to replace the Google Search Appliance (GSA) in my application. The solution needs to crawl 60+ websites, including their child links.
Since we are not going to use a cloud-based solution:
Can I use the Nutch 2.3.1 crawler and MySQL with Elasticsearch 6.4?
What other software do I need to replace GSA with Elasticsearch?
Depending on your needs and your current GSA configuration, there really aren't any open-source web crawlers that cover everything GSA does, or everything modern websites throw at a crawler. Handling things like modern JavaScript frameworks and complex authentication is incredibly difficult, and many commercial crawlers still struggle with those technologies. That being said, commercial offerings are probably your best bet if you really are in a hurry to solve the problem.
If you do want to build your own, Nutch is a decent place to start, but be prepared to spend a long time making it do everything the GSA does today.
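To give a sense of the core loop a crawler runs, here is a minimal breadth-first sketch in Python that follows child links within a single site. The `fetch` callable is a hypothetical stand-in for real HTTP retrieval. The sketch deliberately ignores robots.txt, politeness delays, retries, authentication, and JavaScript-rendered pages, which is where most of the effort described above goes.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl of `seed` and its child links on the same host.

    `fetch(url)` is a hypothetical callable that returns the page HTML,
    or None if the page cannot be retrieved.
    Returns a dict mapping each visited URL to its HTML.
    """
    host = urlparse(seed).netloc
    queue = deque([seed])
    seen = {seed}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments.
            link, _ = urldefrag(urljoin(url, href))
            # Stay on the seed's host and skip URLs already queued.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

For a quick offline check, a dict's `get` method can serve as `fetch`: with pages for `https://example.com/`, `/a` and `/b` linked together (plus one off-site link), `crawl("https://example.com/", site.get)` visits the three on-site pages and skips the external one. A production crawler would run this loop once per seed site (60+ here) and persist its frontier, which is what Nutch's storage backend is for.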
As for other software, you could conceivably cobble everything together yourself, including document parsers, linguistic packages, etc., but again, it can take a while to make it all work together.
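Whatever crawler and parsers you end up with, the extracted text eventually has to be sent to Elasticsearch. The following is a minimal sketch that assumes pages are already parsed into plain text and builds a newline-delimited request body for the `_bulk` API. The index name `webpages`, the `url`/`content` field names, and the `localhost:9200` address are hypothetical. `_type` is set to `_doc` because Elasticsearch 6.x still requires a mapping type.

```python
import hashlib
import json
import urllib.request


def build_bulk_body(pages, index="webpages"):
    """Build an NDJSON body for the Elasticsearch _bulk API.

    `pages` maps URL -> extracted text. The index and field names are
    hypothetical placeholders.
    """
    lines = []
    for url, text in pages.items():
        # Hash the URL so the _id stays short (Elasticsearch caps _id at 512 bytes).
        doc_id = hashlib.sha1(url.encode("utf-8")).hexdigest()
        # Action line: index (create or overwrite) this document.
        lines.append(json.dumps({"index": {"_index": index, "_type": "_doc", "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps({"url": url, "content": text}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def send_bulk(body, es_url="http://localhost:9200"):
    """POST the body to a (hypothetical) local cluster and fail on item errors."""
    req = urllib.request.Request(
        es_url + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    if result.get("errors"):
        raise RuntimeError("some bulk items failed")
    return result
```

Hashing the URL into the `_id` means re-crawling a page overwrites its previous version instead of creating a duplicate.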