Web Crawler


(dhturner14) #1

Does anyone actually use Nutch with ElasticSearch? It appears that the
integration between the two products does not work. Is there something
better to use than Nutch for crawling intranet sites that use ElasticSearch
for indexing?

Thanks,
DH

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(dhturner14) #2

I failed to mention the error I get...

bin/nutch elasticindex esc1 -all

Exception in thread "main" java.lang.RuntimeException: job failed: name=elastic-index [esc1], jobid=job_local1244330097_0001
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
    at org.apache.nutch.indexer.elastic.ElasticIndexerJob.run(ElasticIndexerJob.java:52)
    at org.apache.nutch.indexer.elastic.ElasticIndexerJob.indexElastic(ElasticIndexerJob.java:60)
    at org.apache.nutch.indexer.elastic.ElasticIndexerJob.run(ElasticIndexerJob.java:73)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.indexer.elastic.ElasticIndexerJob.main(ElasticIndexerJob.java:78)



(Martijn Van Groningen) #3

I don't see any ES-specific error. I think this question should go to the
Nutch mailing list. Have you tried to get answers there?

On 9 September 2013 19:06, dhturner14@gmail.com wrote:

I failed to mention the error I get...



--
Kind regards,

Martijn van Groningen



(Jörg Prante) #4

Increase the Nutch log level so you can see more information. My wild guess
is that you were running out of memory or out of file descriptors.

Jörg
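
One way to check both guesses is to scan the Nutch log for the usual JVM messages and to look at the open-file limit of the current shell. The sketch below is only an illustration: the function names `classify_log` and `open_file_limits` are made up for this example, the log location `logs/hadoop.log` is an assumption about a default local Nutch setup, and the two message patterns are the standard JVM wordings rather than anything taken from the trace above.

```python
import re
import resource  # Unix-only standard library module

# Standard JVM messages for the two suspected failure modes.
# These are assumptions; neither string appears in the stack trace posted above.
PATTERNS = {
    "out_of_memory": re.compile(r"java\.lang\.OutOfMemoryError"),
    "file_descriptors": re.compile(r"Too many open files"),
}


def classify_log(lines):
    """Return {label: [matching lines]} for each known failure pattern."""
    hits = {label: [] for label in PATTERNS}
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[label].append(line.rstrip("\n"))
    return hits


def open_file_limits():
    """Return the (soft, hard) open-file-descriptor limits of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

To try it, pass an open file handle for the log (for example `open("logs/hadoop.log", errors="replace")`, if that is where your Nutch installation writes) to `classify_log`, and compare the soft limit from `open_file_limits()` with the number of files the job is likely to open.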


