bulk indexing and TransportSerializationException

John_Bush_2 · February 28, 2013, 9:19pm

We are using ES to index larger content documents, typically pdf, docx,
etc. The average size is typically less than 1MB. Our
clients vary, but typically we will need to index about 100-200
GB of this type of data, and after that everything is real time.

I'm running ES embedded, and once I kick off a reindex all the nodes
participate in digesting the binary content into a string format that
gets fed to ES. The bulk of the work is really in the digesting on
our side. But I'm wondering if maybe I should look into using the
bulk API. After the initial setup we really won't be doing much bulk
loading, but based on my timings the initial load may take 6 hours or
more, so any speedup would be great.
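For readers weighing the same question, here is a minimal sketch of what a bulk request carries, assuming the REST `_bulk` endpoint rather than the embedded Java client used in this thread. Each document becomes one action line plus one source line, both newline-terminated. The index name `sakai_index` comes from the logs in this post; the type `doc`, the field `content`, and the class and method names are hypothetical, and `_type` applies to releases of this era only.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: builds the newline-delimited body expected by the REST _bulk endpoint. */
public class BulkBodySketch {

    /** Escapes the characters that must not appear raw inside a JSON string. */
    public static String jsonEscape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
                    else out.append(c);
            }
        }
        return out.toString();
    }

    /** Emits one action line and one source line per {id, text} pair, each ending in '\n'. */
    public static String bulkBody(String index, String type, List<String[]> idAndText) {
        StringBuilder body = new StringBuilder();
        for (String[] doc : idAndText) {
            body.append("{\"index\":{\"_index\":\"").append(jsonEscape(index))
                .append("\",\"_type\":\"").append(jsonEscape(type))
                .append("\",\"_id\":\"").append(jsonEscape(doc[0])).append("\"}}\n");
            // "content" is a hypothetical field holding the digested text
            body.append("{\"content\":\"").append(jsonEscape(doc[1])).append("\"}\n");
        }
        return body.toString();
    }

    public static void main(String[] args) {
        List<String[]> docs = new ArrayList<>();
        docs.add(new String[] {"1", "text digested from a pdf"});
        docs.add(new String[] {"2", "text digested from a \"docx\""});
        System.out.print(bulkBody("sakai_index", "doc", docs));
    }
}
```

Since the digesting dominates the run time here, batching would mainly cut per-request overhead; the batch size is something to tune by measurement rather than a fixed rule.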

Also, I was seeing a bunch of errors like these:

2013-02-18 12:08:43,443 WARN
elasticsearch[server2_ec2-204-236-163-41][generic][T#1043]
org.elasticsearch.cluster.action.shard - [server2_ec2-204-236-163-41]
sending failed shard for [sakai_index][1],
node[dmWqaCe0S4adiEAD_043qA], [R], s[INITIALIZING], reason [Failed to
start shard, message [RecoveryFailedException[[sakai_index][1]:
Recovery failed from
[server4_ec2-50-18-148-126][BQx-NOWeRG2uxTz0v2xi1w][inet[/10.171.159.235:9300]]{local=false}
into [server2_ec2-204-236-163-41][dmWqaCe0S4adiEAD_043qA][inet[/204.236.163.41:9300]]{local=false}];
nested: RemoteTransportException[Failed to deserialize exception
response from stream]; nested: TransportSerializationException[Failed
to deserialize exception response from stream]; nested:
InvalidClassException[failed to read class descriptor]; nested:
ClassNotFoundException[org.elasticsearch.transport.RemoteTransportException];
]]

2013-02-28 14:17:43,742 WARN
elasticsearch[server4_ec2-50-18-148-126][transport_client_worker][T#4]{New
I/O worker #4} org.elasticsearch.transport.netty -
[server4_ec2-50-18-148-126] Message not fully read (response) for
[17684] handler
future(org.elasticsearch.indices.recovery.RecoveryTarget$4@53a60164),
error [true], resetting

Googling this suggests there is a version mismatch. I've double-checked
that, and it's not the problem. I saw one issue in 0.20.5 that
looks like it might be related to this, so I upgraded, but I'm still having
these issues.

I was doing a bunch of refresh and flush calls during my indexing.
From the research I've done, I gather it's best to just let ES do that
on its own, so I removed those and set these index properties:

"translog.flush_threshold_period" : "5s",
"refresh_interval" : "5s",

Those problems went away for a little longer, but now they are back again.
Would manual refresh cause that? I'm wondering if I was simply
causing so many merges that things were essentially stepping on each
other. Any ideas what might cause this?

--
John Bush
602-490-0470

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jprante · February 28, 2013, 9:46pm

Do you use plugins, and the same plugin versions on all nodes? Also on the
(Transport)Client?
Do you mix Elasticsearch versions?
Are you sure you run the same JVM version on all nodes in the
cluster, and also on the (Transport)Client?

Explanation: "TransportSerializationException[Failed to deserialize
exception response from stream]; nested: InvalidClassException[failed to
read class descriptor]; nested:
ClassNotFoundException[org.elasticsearch.transport.RemoteTransportException]"
is logged when nodes in the cluster fail to read encoded
Java classes on the wire.

Possible reasons:

Elasticsearch version mismatch between cluster nodes; if
exception classes have been refactored between versions, this gives fatal messages

missing plugin code on a node; when plugins throw custom
exceptions, they can't be transported to a node where the plugin is
not installed

or JVM versions that are incompatible with each other;
for example, mixing Java 6 and 7 JVMs will not work when
classes are transported in the object input stream used on the netty layer
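One way to compare nodes against these three causes is to run the same small check on each of them, using the same classpath the node uses, and diff the output. The following is only a sketch using standard JDK calls; the class name `NodeClasspathCheck` and its `describe` method are made up for illustration. It prints the JVM version and vendor, then reports whether a given class can be loaded, which jar it came from, and its serialVersionUID.

```java
import java.io.ObjectStreamClass;
import java.security.CodeSource;

/** Sketch: prints facts that should be identical on every node of the cluster. */
public class NodeClasspathCheck {

    /** Reports whether className loads here, where it came from, and its serialVersionUID. */
    public static String describe(String className) {
        try {
            // 'false' loads the class without running its static initializers
            Class<?> cls = Class.forName(className, false,
                    NodeClasspathCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            ObjectStreamClass osc = ObjectStreamClass.lookup(cls); // null if not Serializable
            return className + " loaded from "
                    + (src == null ? "bootstrap classpath" : String.valueOf(src.getLocation()))
                    + ", serialVersionUID="
                    + (osc == null ? "n/a" : String.valueOf(osc.getSerialVersionUID()));
        } catch (ClassNotFoundException e) {
            return className + " NOT FOUND on this classpath";
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version=" + System.getProperty("java.version")
                + " java.vendor=" + System.getProperty("java.vendor"));
        // Defaults to the class named in the ClassNotFoundException above
        String name = args.length > 0 ? args[0]
                : "org.elasticsearch.transport.RemoteTransportException";
        System.out.println(describe(name));
    }
}
```

If the output differs between nodes in version, jar location, or serialVersionUID, that difference is a candidate for the deserialization failure; identical output does not rule out other causes.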

Flush/refresh actions do not hurt that much, they should not throw
exceptions, although 5s is a little short in my understanding.

Jörg

John_Bush · February 28, 2013, 10:25pm

All those nodes use an NFS share with the exact same elasticsearch jars.
I'm not sure about plugins, but since the nodes all point to that same share, that's
not it. These 4 nodes were cloned from the same instance in AWS, with the same Java
version everywhere. To me it seems like maybe there's some system/network
issue causing problems. Maybe a node is getting partial data, and that is why
the serialization errors occur? It seems to happen consistently after about
15-20 minutes of indexing.

jprante · February 28, 2013, 11:18pm

In case of incomplete network buffers, the error would be an IOException
in netty, something like "connection reset by peer". I think ES can't
handle one of your input documents and creates an internal exception, which
may not be serializable; this breaks the netty transport and is reported
back to your bulk indexing.
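To illustrate the mechanism only: the sketch below uses plain Java serialization, not Elasticsearch's actual transport code, and the `ParseFailure` exception and `serializationProblem` helper are hypothetical. It shows how an exception that carries a non-serializable field cannot be written to an object stream, while an ordinary exception can.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;

/** Sketch: checks whether a Throwable survives plain Java serialization. */
public class ExceptionSerializationCheck {

    /** Hypothetical exception carrying a field that is not Serializable. */
    public static class ParseFailure extends RuntimeException {
        private final Object parserState = new Object(); // java.lang.Object is not Serializable
        public ParseFailure(String message) { super(message); }
    }

    /** Returns null if t can be serialized, otherwise a description of the failure. */
    public static String serializationProblem(Throwable t) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(t);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage(); // the name of the class that could not be written
        } catch (IOException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(serializationProblem(new IllegalStateException("fine")));
        System.out.println(serializationProblem(new ParseFailure("bad document")));
    }
}
```

A check like this could be applied to exceptions thrown by the document-digesting code, to see whether any of them would fail to cross the wire.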

Jörg

On 28.02.13 23:25, John Bush wrote:

Maybe node is getting partial data and that is why the serialization
errors ? It seems to happen consistently after about 15-20 minutes of
indexing.
