I am evaluating which crawler to use with ES. Does anyone have experience or suggestions? From some initial research, the choices include Nutch and River Web. Personally, I don't want to bring in another piece of software such as HBase.
I ended up using Nutch. And yes, Nutch only works with 2.3 at this point. Since 2.3 has all the functions I need, I'm fine with it. I recently found Scrapy to be very powerful. It may be worth a try, though you would have to write some indexing code on your own.
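For a rough idea of what that indexing code might look like, here is a minimal sketch that turns crawled pages into a request body for the Elasticsearch bulk API. The function name `build_bulk_body`, the index name `webpages`, and the `url`/`title`/`body` fields are all assumptions made for illustration; they do not come from Scrapy or Nutch, and your crawler's item fields will likely differ.

```python
import hashlib
import json


def build_bulk_body(pages, index_name="webpages"):
    """Build a newline-delimited JSON body for the Elasticsearch _bulk endpoint.

    `pages` is an iterable of dicts with a "url" key and optional "title"
    and "body" keys (hypothetical field names chosen for this sketch).
    """
    lines = []
    for page in pages:
        # Derive a stable document ID from the URL so that re-crawling a
        # page overwrites the earlier copy instead of adding a duplicate.
        doc_id = hashlib.sha1(page["url"].encode("utf-8")).hexdigest()
        # Action line: says which index and ID the next line belongs to.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps({
            "url": page["url"],
            "title": page.get("title", ""),
            "body": page.get("body", ""),
        }))
    if not lines:
        return ""
    # The bulk API expects every line, including the last, to end in a newline.
    return "\n".join(lines) + "\n"
```

The resulting string would be sent as the body of a POST to the cluster's `_bulk` endpoint with a newline-delimited JSON content type. Older Elasticsearch versions may also expect a `_type` entry in the action line, so check this against the version you are running.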