Attempt to index a large dataset fails

Hey,

We are attempting to index about 32 GB of raw data, using ES 0.20.4 and Sun
Java 1.6.0_37, on a single-node "cluster".

After a while of bulk uploading (from the same node), we get:

Log4jESLogger.java 389 [main] INFO org.elasticsearch.plugins - [Beta Ray Bill] loaded [], sites []
Log4jESLogger.java 677935 [elasticsearch[Beta Ray Bill][generic][T#2]] INFO org.elasticsearch.client.transport - [Beta Ray Bill] failed to get local cluster state for [Ronan the Accuser][2ybHUuyzRjCTPsJA7YM4cw][inet[/192.168.100.78:9300]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [Ronan the Accuser][inet[/192.168.100.78:9300]][cluster/state] request_id [264] timed out after [5000ms]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:342)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Exception in thread "main" org.elasticsearch.client.transport.NoNodeAvailableException: No node available
    at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:246)
    at org.elasticsearch.action.TransportActionNodeProxy$1.handleException(TransportActionNodeProxy.java:84)
    at org.elasticsearch.transport.TransportService$Adapter$2$1.run(TransportService.java:305)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

I checked the log files, and I see:

[2013-02-06 13:57:52,577][WARN ][cluster.action.shard ] [Lady Mandarin] sending failed shard for [full-sentences--1][0], node[Xhr947gyTjS6T_zVrDbZKw], [P], s[STARTED], reason [master [Lady Mandarin][Xhr947gyTjS6T_zVrDbZKw][inet[/192.168.100.78:9300]] marked shard as started, but shard have not been created, mark shard as failed]
[2013-02-06 13:57:52,577][WARN ][cluster.action.shard ] [Lady Mandarin] received shard failed for [full-sentences--1][0], node[Xhr947gyTjS6T_zVrDbZKw], [P], s[STARTED], reason [master [Lady Mandarin][Xhr947gyTjS6T_zVrDbZKw][inet[/192.168.100.78:9300]] marked shard as started, but shard have not been created, mark shard as failed]

Now, my index has two shards, together reaching 17.8 GB. If I restart the
operation, it works for a while but eventually fails with the same messages.

What am I doing wrong?

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Do you use Java for your bulk requests?
How big is your bulk (how many docs do you inject with each bulk request)?

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 6 Feb 2013, at 13:17, shlomivaknin@gmail.com wrote:


Yes, I am using Java's bulk requests. I push 5,000 each time; each entry is
roughly 150 bytes long.

On Wednesday, February 6, 2013 2:31:58 PM UTC+2, David Pilato wrote:


One common error, though I don't know whether it's the case here, is to reuse the same bulk previously created instead of recreating a new one with prepareBulk().

Do you recreate the bulk after each execution?
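To make the difference concrete, here is a library-free sketch in plain Java (no ES client involved; the class BulkReuseDemo and its counters are invented purely for illustration) of why reusing one bulk container re-sends everything added so far, while recreating it per request does not:

```java
import java.util.ArrayList;
import java.util.List;

// Library-free illustration of why a bulk container must be recreated
// per request: reusing the same batch re-sends every document added so far.
public class BulkReuseDemo {

    // The bug: one bulk created up front and reused across requests.
    static int sentWithReuse(int batches, int perBatch) {
        List<String> bulk = new ArrayList<>();  // created once, never reset
        int sent = 0;
        for (int b = 0; b < batches; b++) {
            for (int i = 0; i < perBatch; i++) bulk.add("doc");
            sent += bulk.size();                // old docs go out again
        }
        return sent;
    }

    // The fix: a fresh bulk for each request.
    static int sentWithFreshBulk(int batches, int perBatch) {
        int sent = 0;
        for (int b = 0; b < batches; b++) {
            List<String> bulk = new ArrayList<>();  // fresh per request
            for (int i = 0; i < perBatch; i++) bulk.add("doc");
            sent += bulk.size();
        }
        return sent;
    }

    public static void main(String[] args) {
        System.out.println(sentWithReuse(3, 5000));     // 5000+10000+15000 = 30000
        System.out.println(sentWithFreshBulk(3, 5000)); // 15000
    }
}
```

With the real client the same shape applies: the fresh-per-request object would come from prepareBulk() each iteration.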

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 6 Feb 2013, at 13:35, Shlomi shlomivaknin@gmail.com wrote:


No, I am recreating a fresh bulk for each request.
On Feb 6, 2013 2:44 PM, "David Pilato" david@pilato.fr wrote:


Do you wait for BulkResponses? How many BulkRequests do you send in
parallel?

Jörg

On 06.02.13 13:35, Shlomi wrote:

yes, i am using java's bulk requests. i push 5000 each time. each
entry is roughly 150 bytes long.


Hey,
I call actionGet on the bulk request and check it with hasFailures; nothing
seems to fail.
I only send one at a time.

On Wednesday, February 6, 2013 2:47:41 PM UTC+2, Jörg Prante wrote:



If you send one BulkRequest at a time but don't examine the corresponding
BulkResponse, you have no control over the bulk indexing at all. The
5000 ms timeout is reached because the cluster will sooner or later be busy
for 5 seconds, which is often the case when segment merging kicks in.

To overcome this, you can use the BulkProcessor helper class, which gets
control over the BulkResponses:

https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/bulk/BulkProcessor.java

For an example usage, look at the JDBC river

https://github.com/jprante/elasticsearch-river-jdbc/blob/master/src/main/java/org/elasticsearch/river/jdbc/strategy/simple/SimpleRiverMouth.java
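As a rough, library-free illustration of the buffer-and-flush pattern that BulkProcessor implements (the class MiniBulkProcessor and all its names are invented for this sketch; they are not the ES API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Library-free sketch of what a bulk processor does: buffer actions, flush a
// batch when a threshold is reached, and hand every completed batch to a
// listener so its response/failures can be inspected.
public class MiniBulkProcessor {
    private final int bulkActions;                   // flush threshold
    private final Consumer<List<String>> afterBulk;  // stand-in for a response listener
    private List<String> buffer = new ArrayList<>();

    MiniBulkProcessor(int bulkActions, Consumer<List<String>> afterBulk) {
        this.bulkActions = bulkActions;
        this.afterBulk = afterBulk;
    }

    void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= bulkActions) flush();
    }

    void flush() {
        if (buffer.isEmpty()) return;
        afterBulk.accept(buffer);   // here the real client would execute the bulk
        buffer = new ArrayList<>(); // fresh buffer for the next batch
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        MiniBulkProcessor p = new MiniBulkProcessor(5000, b -> batchSizes.add(b.size()));
        for (int i = 0; i < 12000; i++) p.add("doc-" + i);
        p.flush();                      // drain the final partial batch
        System.out.println(batchSizes); // [5000, 5000, 2000]
    }
}
```

The real BulkProcessor additionally bounds how many bulks are in flight, which is what gives the cluster breathing room during segment merges.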

Jörg

On 06.02.13 14:21, Shlomi wrote:

Hey,
I call actionGet on the bulk req, and checking with hasFailures.
nothing seems to fail..
I only send one at a time.

On Wednesday, February 6, 2013 2:47:41 PM UTC+2, Jörg Prante wrote:

Do you wait for BulkResponses? How much BulkRequests do you send in
parallel?

Jörg

Am 06.02.13 13:35, schrieb Shlomi:
>
> yes, i am using java's bulk requests. i push 5000 each time. each
> entry is roughly 150 bytes long.

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

OK, I'll try that, thanks!

On Wednesday, February 6, 2013 3:31:12 PM UTC+2, Jörg Prante wrote:

If you send one BulkRequest at a time, but don't examine the
corresponding BulkResponse, you have no control over the bulk indexing
at all. The 5000ms timeout is reached because the cluster sooner or
later will be busy for 5 secs which is often the case when segment
merging steps in.

To overcome this, you can use the BulkProcessor helper class, that gets
control over the BulkResponses

https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/bulk/BulkProcessor.java

For an example usage, look at the JDBC river

https://github.com/jprante/elasticsearch-river-jdbc/blob/master/src/main/java/org/elasticsearch/river/jdbc/strategy/simple/SimpleRiverMouth.java

Jörg

Am 06.02.13 14:21, schrieb Shlomi:

Hey,
I call actionGet on the bulk req, and checking with hasFailures.
nothing seems to fail..
I only send one at a time.

On Wednesday, February 6, 2013 2:47:41 PM UTC+2, Jörg Prante wrote:

Do you wait for BulkResponses? How much BulkRequests do you send in 
parallel? 

Jörg 

Am 06.02.13 13:35, schrieb Shlomi: 
> 
> yes, i am using java's bulk requests. i push 5000 each time. each 
> entry is roughly 150 bytes long. 

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearc...@googlegroups.com <javascript:>.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hey,

OK, so I tried that, and initially it worked fine, until my index reached a
certain size. At that point uploading became super slow (a couple of hundred
docs per second), until it failed with a bunch of
NoNodeAvailableExceptions.
Here are its size and doc count:
size: 166.2gb (166.2gb)
docs: 627977174 (627977174)

Even restarting the bulk uploader from scratch on that index gives me errors
after a very short period of time. Uploading to a different index seems to
work fine.

Any clues?

On Wed, Feb 6, 2013 at 3:45 PM, Shlomi shlomivaknin@gmail.com wrote:


Have you checked your logs? Have you enabled JVM monitoring? What is
your heap setting?

I think you will notice that your memory resources are exhausted. In the
worst case, you may have encountered an OutOfMemoryError, which is
critical to the health of the index.

There are some options: increase the JVM heap, increase the number of nodes,
or tune segment merges toward smaller segments (at the cost of longer
indexing times).
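A minimal sketch of the first option, assuming the stock bin/elasticsearch launcher of that era; the 4g figure is illustrative (commonly about half the machine's RAM), not a value taken from this thread:

```shell
# Illustrative only: the launcher scripts of that era read these variables
# (later 0.x releases also accept ES_HEAP_SIZE to set both at once).
export ES_MIN_MEM=4g
export ES_MAX_MEM=4g
./bin/elasticsearch
```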

Best regards,

Jörg

On 12.02.13 11:57, Shlomi Vaknin wrote:


Yep, that was it.
Thanks a lot, it all works perfectly well now.

On Tuesday, February 12, 2013 3:01:18 PM UTC+2, Jörg Prante wrote:

Have you checked your logs? Have you enabled JVM monitoring? What is
your heap setting?

I think you will notice that your memory resources are exhausted. The
worst case, it is possible you encountered an OutOfMemoryException,
which is critical to the index health.

There are some options: increase JVM heap, increase number of nodes,
tune segment merges to smaller segments (and longer indexing times).

Best regards,

Jörg

Am 12.02.13 11:57, schrieb Shlomi Vaknin:

Hey,

Ok, so i tried that, and initially it worked fine, until my index has
reached a certain size. at that time uploading became super slow
(couple of hundreds per seconds) until it failed with a bunch of
NoNodeAvailableExceptions.
here is its size and docs count:
size: 166.2gb (166.2gb)
docs: 627977174 (627977174)

even restarting the bulk-uploader from scratch on that index gave me
errors after a very short period of time. uploading to a different
index seems to work fine.

any clues?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.