Issue when indexing to Elasticsearch from Apache Nutch

Sachin_Shaju · August 2, 2016, 6:44am

I was trying to index from Apache Nutch to a single-node ES cluster and got this error:

org.elasticsearch.transport.RemoteTransportException: Failed to deserialize exception response from stream
Caused by: org.elasticsearch.transport.TransportSerializationException: Failed to deserialize exception response from stream
    at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:173)
    at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:125)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.StreamCorruptedException: Unsupported version: 1
    at org.elasticsearch.common.io.ThrowableObjectInputStream.readStreamHeader(ThrowableObjectInputStream.java:46)
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:301)
    at org.elasticsearch.common.io.ThrowableObjectInputStream.<init>(ThrowableObjectInputStream.java:38)
    at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:170)
    ... 23 more

From further research, I learned that the client and the ES server should run the same JVM version. Reference: http://jontai.me/blog/2013/06/elasticsearch-remotetransportexception-failed-to-deserialize-exception-response-from-stream/

I'm using ES version 2.3.2 and my JVM version is "1.8.0_91". When I checked /plugins/indexer-elastic/plugin.xml, the version specified there is 1.4.1. I would like to know whether this could be the issue, and whether there is a solution other than downgrading the ES cluster. I would like to continue with ES 2.3.2. Please help me with this.

PS: The command I used for indexing is bin/nutch index crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20160801174223/
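
For anyone wanting to repeat the version check described in this post, here is a minimal sketch. It assumes the plugin descriptor lists the bundled client as a `<library name="elasticsearch-X.Y.Z.jar"/>` entry; the function name `es_client_version` is made up for illustration.

```shell
# Print the Elasticsearch client version declared in a Nutch plugin.xml.
# Assumption: the client jar appears as elasticsearch-X.Y.Z.jar in the file.
es_client_version() {
  grep -o 'elasticsearch-[0-9][0-9.]*\.jar' "$1" \
    | head -n 1 \
    | sed -e 's/^elasticsearch-//' -e 's/\.jar$//'
}
```

Calling `es_client_version plugins/indexer-elastic/plugin.xml` prints the bundled client version, which can then be compared with the `version.number` field the server returns from `curl -s localhost:9200`.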

jprante · August 2, 2016, 7:33am

This is a question for the Nutch community.

You have to build Nutch from source (master branch). It supports ES 2.3.3: https://github.com/apache/nutch/blob/master/src/plugin/indexer-elastic/plugin.xml
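
A rough sketch of what building from source could look like, assuming `git` and Apache Ant are installed and that the master branch builds with the usual Nutch 1.x `ant runtime` target. The function name `build_nutch` and the default directory are hypothetical, and master has moved on since this thread, so check plugin.xml for the client version it currently bundles.

```shell
# Clone the Nutch master branch and build a local runtime.
# Nothing runs until build_nutch is called; the destination defaults to ./nutch.
build_nutch() {
  dest="${1:-nutch}"
  git clone --branch master --depth 1 https://github.com/apache/nutch.git "$dest" || return 1
  # Build in a subshell so the caller's working directory is left unchanged.
  ( cd "$dest" && ant runtime )
}
```

In a 1.x build the result normally ends up under `runtime/local`, with `bin/nutch` and the `plugins/` directory inside it.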

Sachin_Shaju · August 2, 2016, 8:27am

I asked the question here because the exception was specific to Elasticsearch. Thanks for your reply.

Sachin_Shaju · August 2, 2016, 12:02pm

It worked. Thanks @jprante

Nestor · September 27, 2016, 3:35pm

Hi Sachin,
Do you have any info on how you made it work? I am trying to hook up Nutch 1.12 and Elasticsearch 2.4. My website is crawled, and I edited nutch-site.xml. I can see info on port 9200. I just do not know how to see the data or how to configure which fields to display. Any examples?

Thanks,

Nestor

Sachin_Shaju · September 28, 2016, 4:16am

Have you tried the crawl script in Nutch, e.g. bin/crawl -i urls/ CrawlDir/ 1, to crawl and index a site?
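
Once indexing has run, one way to look at the stored documents is a plain search request against the index. This is only a sketch: it assumes ES listens on port 9200 and that the index is called `nutch` (the value of `elastic.index` in nutch-site.xml may differ); the helper names `search_url` and `show_indexed_docs` are made up for illustration.

```shell
# Build a URI-search URL that returns the first 5 documents of an index.
# Arguments: host (default localhost), index name (default nutch, an assumption).
search_url() {
  host="${1:-localhost}"
  index="${2:-nutch}"
  printf 'http://%s:9200/%s/_search?q=*:*&size=5&pretty\n' "$host" "$index"
}

# Fetch and print those documents; requires curl and a running ES node.
show_indexed_docs() {
  curl -s "$(search_url "$@")"
}
```

Running `show_indexed_docs` prints the matching hits as JSON, and the `_source` of each hit shows which fields Nutch actually sent.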
