Issue when indexing to Elasticsearch from Apache Nutch

I was trying to index from Apache Nutch into a single-node ES cluster and got this error:

org.elasticsearch.transport.RemoteTransportException: Failed to deserialize exception response from stream
Caused by: org.elasticsearch.transport.TransportSerializationException: Failed to deserialize exception response from stream
    at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(
    at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(
    at $DefaultChannelHandlerContext.sendUpstream(
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$
    at java.util.concurrent.ThreadPoolExecutor.runWorker(
    at java.util.concurrent.ThreadPoolExecutor$
Caused by: Unsupported version: 1
    at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(
    ... 23 more

From further research, I learned that the client and the ES server should run the same JVM version. Reference:

I'm using ES 2.3.2 and my JVM version is 1.8.0_91. When I checked /plugins/indexer-elastic/plugin.xml, the version specified is 1.4.1. Could this be the issue? And is there a solution other than downgrading the ES cluster? I would like to stay on ES 2.3.2. Please help me with this.
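
As a general rule, an ES transport client is expected to share the server's major version, so a 1.4.1 client jar talking to a 2.3.2 node is worth checking first. A minimal POSIX sh sketch that compares the two; the helper names are hypothetical, and it assumes plugin.xml lists the client as `<library name="elasticsearch-X.Y.Z.jar"/>` and that the node's HTTP endpoint answers on port 9200.

```shell
#!/bin/sh
# Sketch: compare the ES client jar bundled with Nutch's indexer-elastic
# plugin against the version reported by the ES server.
# Assumption: plugin.xml lists the client as <library name="elasticsearch-X.Y.Z.jar"/>.

plugin_es_version() {
  # $1 = path to plugins/indexer-elastic/plugin.xml
  grep -o 'elasticsearch-[0-9][0-9.]*\.jar' "$1" | head -n 1 |
    sed -e 's/^elasticsearch-//' -e 's/\.jar$//'
}

server_es_version() {
  # $1 = ES HTTP endpoint, e.g. http://localhost:9200
  # The root endpoint returns JSON containing "number" : "X.Y.Z"
  curl -s "$1" | grep -o '"number" *: *"[^"]*"' | head -n 1 |
    sed 's/.*"\([^"]*\)"$/\1/'
}

same_major() {
  # Succeeds if $1 and $2 share the same major version number
  [ "${1%%.*}" = "${2%%.*}" ]
}

# Example:
#   client=$(plugin_es_version plugins/indexer-elastic/plugin.xml)
#   server=$(server_es_version http://localhost:9200)
#   same_major "$client" "$server" || echo "client $client vs server $server: major versions differ"
```

With the versions from this post, `same_major 1.4.1 2.3.2` fails, which points at the plugin rather than the JVM.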

PS: The command I used for indexing is bin/nutch index crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20160801174223/
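
Segment directories are named by their creation timestamp, so the hard-coded segment in that command can be picked automatically. A minimal POSIX sh sketch; `latest_segment` is a hypothetical helper, not part of Nutch, and it assumes the `crawl/segments` layout used above.

```shell
#!/bin/sh
# Segment names are timestamps (yyyyMMddHHmmss), and glob expansion is
# sorted, so the last directory matched is the newest segment.
latest_segment() {
  last=
  for d in "$1"/*/; do
    [ -d "$d" ] && last=${d%/}   # strip the trailing slash from the glob
  done
  [ -n "$last" ] && printf '%s\n' "$last"
}

# Example (run from the Nutch runtime directory):
#   seg=$(latest_segment crawl/segments)
#   bin/nutch index crawl/crawldb/ -linkdb crawl/linkdb/ "$seg"
```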

This is a question for the Nutch community.

You have to build Nutch from source (master branch). It supports ES 2.3.3.
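
A sketch of that build, assuming git and Apache Ant are installed and the Apache Nutch GitHub mirror is reachable. `build_nutch_master` is a hypothetical wrapper; the final `grep` only reports which Elasticsearch client jar the rebuilt plugin bundles.

```shell
#!/bin/sh
# Hypothetical helper: build Nutch 1.x from the master branch into $1.
# Assumes git and Apache Ant are on the PATH.
build_nutch_master() {
  dest=${1:-nutch}
  git clone --branch master https://github.com/apache/nutch.git "$dest" || return 1
  # 'ant runtime' assembles runtime/local (bin/, conf/, plugins/)
  (cd "$dest" && ant runtime) || return 1
  # Report the ES client jar the rebuilt indexer-elastic plugin now ships
  grep -o 'elasticsearch-[0-9][0-9.]*\.jar' \
    "$dest/runtime/local/plugins/indexer-elastic/plugin.xml"
}
```

After `build_nutch_master nutch`, the runnable Nutch is under `nutch/runtime/local`, and its `conf/nutch-site.xml` needs the same edits as the old install.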


I asked the question here because the exception was specific to Elasticsearch. Thanks for your reply :slight_smile:

It worked. Thanks @jprante :slight_smile:

Hi Sachin,
Do you have any info on how you made it work? I am trying to hook up Nutch 1.12 with Elasticsearch 2.4. My website is crawled and I edited nutch-site.xml. I can see info on port 9200, but I don't know how to see the data or how to configure which fields are displayed. Any examples?
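
One way to look at the indexed data is the ES HTTP API on port 9200. A minimal POSIX sh sketch; the helper names are hypothetical, `ES_INDEX` defaulting to `nutch` is an assumption (set it to whatever `elastic.index` is in nutch-site.xml), and the `url`/`title` fields are assumed to be present in the Nutch documents.

```shell
#!/bin/sh
# Sketch: inspect what Nutch wrote to Elasticsearch over HTTP.
# Assumptions: ES on localhost:9200, index name "nutch".
ES_URL=${ES_URL:-http://localhost:9200}
ES_INDEX=${ES_INDEX:-nutch}

search_body() {
  # $1 = comma-separated field list, $2 = number of hits
  # Turns "url,title" into "url","title" for _source filtering
  quoted=$(printf '%s' "$1" | sed 's/[^,][^,]*/"&"/g')
  printf '{"query":{"match_all":{}},"_source":[%s],"size":%s}' "$quoted" "$2"
}

list_indices() {
  # One line per index with doc count and size
  curl -s "$ES_URL/_cat/indices?v"
}

show_docs() {
  # First $2 documents, limited to the fields in $1
  curl -s -XGET "$ES_URL/$ES_INDEX/_search?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(search_body "$1" "$2")"
}

show_mapping() {
  # Fields ES created for the index
  curl -s "$ES_URL/$ES_INDEX/_mapping?pretty"
}
```

For example, `list_indices` confirms the index exists, `show_mapping` lists the available fields, and `show_docs url,title 5` prints five documents with just those two fields.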



Have you tried the crawl script in Nutch, e.g. bin/crawl -i urls/ CrawlDir/ 1, to crawl and index a site?
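
For reference, a sketch of preparing the seed directory that command expects, assuming it is run from the Nutch runtime directory; `make_seed_dir` and the example URL are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write seed URLs, one per line, into $1/seed.txt.
# The injector reads seed URLs from the files in the seed directory.
make_seed_dir() {
  dir=$1; shift
  mkdir -p "$dir" || return 1
  : > "$dir/seed.txt"            # start with an empty seed list
  for url in "$@"; do
    printf '%s\n' "$url" >> "$dir/seed.txt"
  done
}

# Example:
#   make_seed_dir urls http://example.com/
#   bin/crawl -i urls/ CrawlDir/ 1
# -i = also index the crawl results, urls/ = seed dir,
# CrawlDir/ = output dir (crawldb, linkdb, segments), 1 = number of rounds.
```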