Rejected execution (queue capacity 50) in bulk process


(Jose Gargallo) #1

Hello,

I'm facing a problem bulk indexing 5k documents into 24 different indices
(i18n). I'm using Elasticsearch 1.0.1 with all default settings. I've read
that one thread is used per index, which would mean I'm using 24 bulk
threads at a time. Am I right? If so, why am I getting this rejection, since
the queue capacity is 50? Is it possible that replicas consume threads as
well, or that one thread per shard is used? This is my _cluster/health:

{
  "cluster_name": "elasticsearch",
  "status": "yellow",
  "timed_out": false,
  "number_of_nodes": 1,
  "number_of_data_nodes": 1,
  "active_primary_shards": 120,
  "active_shards": 120,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 120
}

120 shards = 24 indices * 5 shards
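A rough back-of-the-envelope model may help with the thread-versus-shard question. It rests on assumptions not stated in this thread: that a bulk request is split into one sub-request per target primary shard, that each sub-request occupies either a thread of the node's bulk pool or one of the 50 queue slots, and that the pool has one thread per core (the 4-thread figure below stands for a hypothetical 4-core machine). It ignores that threads drain the queue while sub-requests arrive, so treat it as a worst case, not a prediction.

```python
def shard_tasks(indices_in_bulk: int, shards_per_index: int, concurrent_bulks: int = 1) -> int:
    """Worst-case shard-level bulk tasks arriving at one node at once.

    Simplified model: every bulk request touches every primary shard of every
    index it targets, and each (bulk request, shard) pair becomes one task.
    """
    return indices_in_bulk * shards_per_index * concurrent_bulks


def would_reject(tasks: int, pool_threads: int, queue_size: int = 50) -> bool:
    """True if more tasks arrive than busy threads plus free queue slots."""
    return tasks > pool_threads + queue_size


# Numbers from this thread; pool_threads=4 stands for a hypothetical 4-core node.
all_indices = shard_tasks(indices_in_bulk=24, shards_per_index=5)
print(all_indices)                                       # 120
print(would_reject(all_indices, pool_threads=4))         # True (120 > 4 + 50)
print(would_reject(shard_tasks(1, 5), pool_threads=4))   # False (5 <= 54)
```

Under this model, a single bulk request that mixes documents for all 24 indices can already exceed the queue on one node, regardless of how many documents it carries.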

I've tried splitting the bulk indexing into different chunk sizes, with the
same result (as far as I understand, this is not the solution).

Am I doing something wrong, or do I just have to increase the queue size?

I would appreciate any help.

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/88614f79-e1df-420a-b471-fb0eedb9baa9%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #2

Threads are pooled; they are not allocated per index.

The queue length of 50 works in almost every case. 50 is also a safe limit
that protects a node from being overwhelmed by too many documents. If it
does not work for you, think about the bulk request size, and whether your
cluster is powerful enough to process the transmitted documents. You can
add 1000-10000 requests to a single bulk request.

If you still exhaust the bulk queue, check whether your code really
examines the response of each bulk request, and whether your bulk client
limits the concurrency of bulk requests. Look at the BulkProcessor source
code to learn about concurrent bulk requests.
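A minimal Python sketch of batching documents into bulk bodies of at most 1000 actions, using the newline-delimited action/source format of the `_bulk` endpoint. The `(index, id, source)` tuples and the `doc` type name are hypothetical placeholders; Elasticsearch 1.x expects a `_type` on each index action.

```python
import json
from typing import Dict, Iterable, Iterator, List, Tuple

# (index name, document id, document source): placeholder shape for this sketch
Doc = Tuple[str, str, Dict]


def bulk_bodies(docs: Iterable[Doc], doc_type: str = "doc",
                batch_size: int = 1000) -> Iterator[str]:
    """Yield newline-delimited _bulk bodies with at most batch_size index actions each."""
    lines: List[str] = []
    count = 0
    for index, doc_id, source in docs:
        # Action/metadata line followed by the document source line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
        count += 1
        if count == batch_size:
            yield "\n".join(lines) + "\n"  # the bulk body must end with a newline
            lines, count = [], 0
    if lines:
        yield "\n".join(lines) + "\n"
```

With the default `batch_size`, 120k documents become 120 bodies. Batch size only controls how big each request is; how many of them are in flight at once is a separate concern.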

Jörg

(In reply to jgargallo jgargallo@gmail.com, Tue, Mar 4, 2014 at 4:45 PM)



(Jose Gargallo) #3

I understand, but I must be missing something.

5k documents * 24 indices = 120k requests. How am I supposed to bulk index
them? I've tried setting "index.refresh_interval" to "-1" to speed up the
process, but I still get the same result. Splitting the bulk into different
sizes didn't work either.
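One possible way to organize the 120k requests, sketched in Python: group the documents per index and cut each group into batches, so that every bulk request targets a single index (at most 5 shards here) instead of all 24. This is a suggestion that assumes the rejections grow with the number of shards a single bulk request touches; the function name and the locale names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Batch = Tuple[str, List[Tuple[str, Dict]]]  # (index, [(id, source), ...])


def batches_per_index(docs: Iterable[Tuple[str, str, Dict]],
                      batch_size: int = 1000) -> List[Batch]:
    """Group (index, id, source) tuples so that each batch targets exactly one index."""
    by_index: Dict[str, List[Tuple[str, Dict]]] = defaultdict(list)
    for index, doc_id, source in docs:
        by_index[index].append((doc_id, source))
    batches: List[Batch] = []
    for index in sorted(by_index):
        items = by_index[index]
        # Slice each index's documents into consecutive batches of batch_size.
        for start in range(0, len(items), batch_size):
            batches.append((index, items[start:start + batch_size]))
    return batches
```

For 24 locales with 5k documents each and `batch_size=1000`, this gives 24 * 5 = 120 single-index batches, meant to be sent one after another rather than all at once.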

(In reply to joergprante@gmail.com, 4 March 2014 17:55)



(Jörg Prante) #4

Without being able to look at the source code, it is difficult, if not
impossible, to find the issue.

"index.refresh_interval: -1" must be set on the respective index,
preferably using the update index settings API (the config file or the
index creation settings are also possible, but they are not a good place
for temporary settings). It does not "speed up" bulk indexing as such; it
saves some resources, because the node does not refresh as often for index
reads, which is often a heavy operation. With only 120k docs distributed
over 24 indices, the effect may not be visible at all. The default flush
buffer size for an index is 64k IIRC.
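A sketch of setting `refresh_interval` on one index through its `_settings` endpoint with Python's standard `urllib.request`. It only builds the request, so nothing is sent until it is passed to `urllib.request.urlopen`; the host `http://localhost:9200` and the index name are assumptions.

```python
import json
import urllib.request


def refresh_interval_request(host: str, index: str, interval: str) -> urllib.request.Request:
    """Build (but do not send) a PUT /<index>/_settings request for refresh_interval."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# "-1" disables periodic refresh during the load; "1s" restores the default afterwards.
disable = refresh_interval_request("http://localhost:9200", "current_contenidos_es_es", "-1")
# urllib.request.urlopen(disable)  # uncomment to actually send it to a running node
```

Restoring the default after the load is the same call with `"1s"`, issued once per index.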

Jörg

(In reply to Jose Gargallo jgargallo@gmail.com, Tue, Mar 4, 2014 at 6:12 PM)



(Jose Gargallo) #5

This is what I'm doing:

    for each of the 24 locales:
        POST /current_contenidos_[LOCALE]          # creates the index
        PUT /current_contenidos_es_es/_settings    # sets refresh_interval = -1

    "POST /_bulk HTTP/1.1" 200 2386284   (12 times, mixed requests for 24 indices)

    for each of the 24 locales:
        PUT /contenidos_es_es/_settings            # sets refresh_interval back to 1s
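The workflow above can be written out as an explicit list of calls, with one settings update per locale. This sketch assumes the settings calls should target the same `current_contenidos_[LOCALE]` index that was just created (the log shows a fixed `es_es` name and a `contenidos_` name without the `current_` prefix, which may or may not be an alias). It only produces `(method, path, body)` tuples and does not talk to a node; `indexing_plan` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

Call = Tuple[str, str, Optional[dict]]  # (HTTP method, path, JSON body or None)


def indexing_plan(locales: List[str], bulk_count: int,
                  prefix: str = "current_contenidos_") -> List[Call]:
    """List the calls of the load: create + disable refresh, bulks, re-enable refresh."""
    calls: List[Call] = []
    for loc in locales:
        index = prefix + loc
        calls.append(("POST", f"/{index}", None))  # create the index, as in the log
        calls.append(("PUT", f"/{index}/_settings", {"index": {"refresh_interval": "-1"}}))
    for _ in range(bulk_count):
        calls.append(("POST", "/_bulk", None))  # bulk bodies omitted in this sketch
    for loc in locales:
        calls.append(("PUT", f"/{prefix}{loc}/_settings", {"index": {"refresh_interval": "1s"}}))
    return calls
```

With 24 locales and 12 bulk requests this yields 24 * 2 + 12 + 24 = 84 calls, and every settings call names the index it applies to.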

(In reply to joergprante@gmail.com, 4 March 2014 18:43)



(Jörg Prante) #6

So you are using the plain HTTP API? Do you evaluate the response of each
POST /_bulk request before sending the next one?
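What evaluating the responses before the next request could look like, as a Python sketch. `send` is a hypothetical blocking callable that POSTs one body to `/_bulk` and returns the response text, so this client never has more than one bulk request outstanding. The response is assumed to carry an `items` list with one entry per action, and failed entries are assumed to contain an `error` key.

```python
import json
from typing import Callable, Iterable, List


def send_sequentially(bodies: Iterable[str], send: Callable[[str], str]) -> List[dict]:
    """Send bulk bodies one at a time and collect the items that report an error.

    `send` is a hypothetical blocking function: it POSTs one body to /_bulk and
    returns the response text, so the next body is only sent after it returns.
    """
    failures: List[dict] = []
    for body in bodies:
        response = json.loads(send(body))
        for item in response.get("items", []):
            result = next(iter(item.values()))  # {"index": {...}} -> {...}
            if "error" in result:
                failures.append(result)
    return failures
```

A non-empty result is the signal to slow down or retry, rather than something to discover later in a log.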

Jörg

(In reply to Jose Gargallo jgargallo@gmail.com, Tue, Mar 4, 2014 at 6:59 PM)



(Jose Gargallo) #7

I'm using Python; I just sent the log so you could see what I'm doing.

I'm not evaluating the bulk responses, only logging them, and I can see the
'rejected execution' error in most of them.
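Rather than only logging, the response can be matched back to the request so that just the rejected actions are resent. This Python sketch assumes every action in the body is an index action followed by exactly one source line, that response `items` come back in request order, and that a rejected item is recognizable either by status 429 or by `EsRejectedExecutionException` in its `error` text; check what your version actually returns before relying on it.

```python
import json
from typing import List


def rejected_retry_body(body: str, response_json: str) -> str:
    """Return a bulk body containing only the rejected actions, or '' if none.

    Assumes each action in `body` is an index action plus one source line,
    and that the response items are in the same order as the actions.
    """
    lines = body.rstrip("\n").split("\n")
    pairs = [lines[i:i + 2] for i in range(0, len(lines), 2)]
    items = json.loads(response_json).get("items", [])
    retry: List[str] = []
    for pair, item in zip(pairs, items):
        result = next(iter(item.values()))  # {"index": {...}} -> {...}
        error = str(result.get("error", ""))
        if result.get("status") == 429 or "EsRejectedExecutionException" in error:
            retry.extend(pair)
    return "\n".join(retry) + "\n" if retry else ""
```

Pausing briefly before resending the returned body gives the node's queue time to drain.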

(In reply to joergprante@gmail.com, 4 March 2014 19:13)



(Jörg Prante) #8

Logging is not enough. You should keep track of the number of requests you
have sent and the bulk responses that have come back. That way you can
control the number of concurrent bulk requests active at a time, and limit
it so that it never exceeds the bulk queue size of 50.
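A Python sketch of capping concurrency with the standard library: a `ThreadPoolExecutor` with `max_in_flight` workers can have at most that many `send` calls running, so a new bulk request starts only after an earlier one has returned. `send` is again a hypothetical blocking callable; this illustrates the idea behind a bounded bulk client and is not a port of the Java `BulkProcessor`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def send_with_limit(bodies: List[str], send: Callable[[str], str],
                    max_in_flight: int = 2) -> List[str]:
    """Send bulk bodies with at most max_in_flight requests outstanding at once.

    The executor has only max_in_flight worker threads, so the remaining bodies
    wait on the client side until a worker's previous send() has returned.
    Responses are returned in the same order as the bodies.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(send, bodies))
```

Keeping `max_in_flight` small, with each body's shard fan-out in mind, helps keep the node's bulk queue under its limit; the returned responses still need to be checked for per-item errors.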

Jörg

(In reply to Jose Gargallo jgargallo@gmail.com, Tue, Mar 4, 2014 at 7:19 PM)

I'm using python, I just sent the log so you could figure out what I'm
doing.

I'm not evaluating the bulk response but logging them so I can see the
'rejected execution' error in most of them.



(Jose Gargallo) #9

OK, I'm gonna play with that, but it seems too complicated just for bulk
indexing, given the low number of documents.

Thanks
On 4 March 2014 19:53, joergprante@gmail.com wrote:

> Logging is not enough. You should keep track of the number of active
> requests you have sent and the bulk responses that have come back. That
> way you can control how many bulk requests are in flight at a time and
> keep that number below the bulk queue size of 50.
>
> Jörg



(Jörg Prante) #10

Probably you should check why your cluster is yellow. Is it a single
node only?

Bulk indexing with a green cluster should work flawlessly.

Jörg

On Tue, Mar 4, 2014 at 8:02 PM, Jose Gargallo jgargallo@gmail.com wrote:

> OK, I'm gonna play with that, but it seems too complicated just for bulk
> indexing, given the low number of documents.
>
> Thanks
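On the yellow status: Elasticsearch never places two copies of the same shard (primary and replica) on the same node, so on a single node every replica stays unassigned. The arithmetic below checks that against the `_cluster/health` output from the first post (1 node, 120 active primaries, 120 unassigned shards), assuming the default of one replica per primary. The helper names are hypothetical; the dict returned by `replica_settings` is the body one would send to the index update-settings API (`PUT /<index>/_settings`) to drop replicas to 0 on a one-node cluster.

```python
def unassigned_replicas(number_of_nodes, primaries, replicas_per_primary):
    """Replica copies that cannot be placed on the cluster.

    A shard's copies (primary + replicas) must sit on distinct nodes,
    so at most `number_of_nodes` copies of each shard can be allocated.
    """
    copies_per_shard = 1 + replicas_per_primary
    allocatable = min(copies_per_shard, number_of_nodes)
    return primaries * (copies_per_shard - allocatable)


def replica_settings(number_of_replicas):
    """Body for the index update-settings API (PUT /<index>/_settings)."""
    return {"index": {"number_of_replicas": number_of_replicas}}


# Values from the _cluster/health output in the first post.
health = {"number_of_nodes": 1, "active_primary_shards": 120, "unassigned_shards": 120}
missing = unassigned_replicas(health["number_of_nodes"], health["active_primary_shards"], 1)
print(missing == health["unassigned_shards"])  # True: all unassigned shards are replicas
print(unassigned_replicas(1, 120, 0))  # 0: without replicas the cluster can go green
```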


(Jose Gargallo) #11

Going green worked with 1 shard and 0 replicas, but it didn't work with 5
shards and 0 replicas, even though the cluster was green.

Thanks

On 4 March 2014 21:33, joergprante@gmail.com wrote:

> Probably you should check why your cluster is yellow. Is it a single
> node only?
>
> Bulk indexing with a green cluster should work flawlessly.
>
> Jörg
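One plausible explanation for why 1 shard worked and 5 did not, offered as an assumption rather than something confirmed in this thread: in Elasticsearch 1.x the node receiving a bulk request splits it into one shard-level request per target shard, and each of those is queued on the bulk thread pool of the node holding that shard (a fixed pool, by default one thread per available processor, with a queue of 50). On a single node with 24 indices of 5 shards each, one mixed bulk request can fan out to 120 shard-level tasks; with 1 shard per index it fans out to at most 24. The sketch estimates that upper bound using the thread's numbers (120k documents in 12 bulk requests, so roughly 10,000 items each); `bulk_threads=8` is a hypothetical core count.

```python
def shard_level_tasks(items, indices, shards_per_index):
    """Upper bound on shard-level requests one bulk request fans out to.

    Assumes each bulk request is split into one request per target shard,
    so the count can exceed neither the items nor the shards they hit.
    """
    return min(items, indices * shards_per_index)


def may_be_rejected(concurrent_bulks, tasks_per_bulk, bulk_threads, queue_size=50):
    """True if the tasks could overflow one node's bulk pool.

    Tasks beyond the running threads wait in the queue; waiting tasks
    beyond `queue_size` are rejected ("rejected execution").
    """
    return concurrent_bulks * tasks_per_bulk > bulk_threads + queue_size


# 24 locale indices, mixed into bulk requests of about 10,000 items each.
five = shard_level_tasks(10_000, indices=24, shards_per_index=5)  # 120
one = shard_level_tasks(10_000, indices=24, shards_per_index=1)   # 24
print(may_be_rejected(1, five, bulk_threads=8))  # True: 120 > 8 + 50
print(may_be_rejected(1, one, bulk_threads=8))   # False: 24 <= 58
```

Under this assumption, several bulk requests in flight at once add their shard-level tasks together, which is another reason to limit client-side concurrency.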

On Tue, Mar 4, 2014 at 8:02 PM, Jose Gargallo jgargallo@gmail.com wrote:

Ok, I'm gonna play with that, but it seems too complicated just for bulk
indexing taking into account the low number of documents

Thanks
El 04/03/2014 19:53, "joergprante@gmail.com" joergprante@gmail.com
escribió:

Logging is not enough, you should care for the number of active requests
sent and the bulk request responses that came back. So you can control the
number of concurrent bulk requests that are active at a time, and if you do
so, you can limit this number, before exceeding the bulk queue size of 50.

Jörg

On Tue, Mar 4, 2014 at 7:19 PM, Jose Gargallo jgargallo@gmail.comwrote:

I'm using python, I just sent the log so you could figure out what I'm
doing.

I'm not evaluating the bulk response but logging them so I can see the
'rejected execution' error in most of them.

On 4 March 2014 19:13, joergprante@gmail.com joergprante@gmail.comwrote:

So you use plain HTTP API? Do you evaluate the responses from POST
/_bulk requests before sending next request?

Jörg

On Tue, Mar 4, 2014 at 6:59 PM, Jose Gargallo jgargallo@gmail.comwrote:

This is what I'm doing:

for each 24 locales:
POST /current_contenidos_[LOCALE] # creates index
PUT /current_contenidos_es_es/_settings # sets refresh_interval =
-1

POST /_bulk HTTP/1.1" 200 2386284 (12 times [mixed requests for 24
indices])

for each 24 locales:
PUT /contenidos_es_es/_settings # sets back refresh_interval = 1s

On 4 March 2014 18:43, joergprante@gmail.com joergprante@gmail.comwrote:

Without being able to look at source code, it is difficult if not
impossible to find issues.

"index.refresh_interval: -1" must be set to the respective index,
preferably using the cluster update API (conf file or index creation
settings is also possible but not a good place for temporary settings). It
does not "speed up" bulk indexing, it saves some resources, the node does
not refresh that often for index reads, which is often a heavy operation.
With only 120k docs, distributed over 24 indices, the effect may not be
visible at all. Default flush buffer size for an index is 64k IIRC.

Jörg

On Tue, Mar 4, 2014 at 6:12 PM, Jose Gargallo jgargallo@gmail.comwrote:

I understand, but I must be missing something.

5k documents * 24 indices = 120k requests, How am i supposed to
bulk index them? I've tried to set "index.refresh_interval" to "-1" to
speed up the process but still same result. splitting the bulk in different
sizes didn't work either.

On 4 March 2014 17:55, joergprante@gmail.com <joergprante@gmail.com

wrote:

Threads are pooled, they are not used per index.

The queue length of 50 works in almost any case. 50 is also safe
to protect a node before being overwhelmed by too many documents. If not,
think about the bulk request size, and if your cluster is powerful enough
for processing the transmitted documents. You can add 1000-10000 requests
in a single bulk request.

If you still exhaust the bulk queue size, you should examine your
code if you really examine the bulk responses of each bulk request, and if
your bulk client limits the concurrency of bulk requests. Look at the
BulkProcessor source code to learn about concurrent bulk requests.

Jörg

On Tue, Mar 4, 2014 at 4:45 PM, jgargallo jgargallo@gmail.comwrote:

Hello,

I'm facing a problem bulk indexing 5k documents in 24 different
indices (i18n). I'm using elasticsearch 1.0.1 with all default settings.
I've read that a thread per index is used, that would mean I'm using 24
bulk threads at one time. Am I right? if so, why I'm getting this rejection
since queue capacity is 50? It's possible that replicas consume threads as
well? or a thread per shard is used? this is my _cluster/health:

{
"cluster_name": "elasticsearch",
"status": "yellow",
"timed_out": false,
"number_of_nodes": 1,
"number_of_data_nodes": 1,
"active_primary_shards": 120,
"active_shards": 120,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 120
}

120 shards = 24 indices * 5 shard

I've tried splitting the bulk indexing in different chunk sizes
with same result (as far as i understand this is not the solution).

Am I doing something wrong or I just have to increase the queue
size?

I will appreciate any help

thanks



(system) #12