ES 0.9 on EC2 - Processor load caps at 100% of one core on a multi-core machine

Hello,

We are currently in the process of moving from an ES 0.9 cluster to an ES
1.4 cluster. Both clusters run in Amazon EC2.

Before doing so, we first need to index a large number of indices into the
ES 0.9 cluster. The nodes in this cluster are all m3.2xlarge machines
(8 cores, 30 GB of memory). In general the nodes sit at an average processor
load of about 3%, so no problems at all there. The nodes are newly created
from our image, so we can assume they are clean.

The problem arises when we run bulk requests. Whenever the threads on one
node consume about 1/8 of the total processor capacity, average latency on
the cluster jumps from 3.5 ms to hundreds of milliseconds.

When I run *top*, the threads are spread across all the processors. Added
together, the eight cores could deliver 800% of CPU, but as soon as the
summed usage across all cores reaches 100% (the equivalent of a single
core), the node immediately starts throttling, making other requests very slow.
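
For context, the bulk requests are plain _bulk indexing calls. A minimal
sketch of such a loader using the official Python client (the host, index
name and documents below are placeholders, not our real setup):

from elasticsearch import Elasticsearch, helpers

# Placeholder endpoint; in reality this points at the 0.9 cluster.
es = Elasticsearch(["es-09.example.com:9200"])

def actions():
    # Placeholder documents; the real data comes from our source systems.
    for i in range(100000):
        yield {
            "_index": "myindex",
            "_type": "mytype",
            "_source": {"id": i, "value": "example"},
        }

# chunk_size is the "bulk size" discussed further down the thread:
# the number of documents sent per _bulk request.
helpers.bulk(es, actions(), chunk_size=1000)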

Question

Does anybody have experience with this situation, and if so, is there an
easy way to fix it?

Example of what I see in top:

Cpu0  :  3.7%us,  0.3%sy,  0.0%ni, 96.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  1.0%us,  0.0%sy,  0.0%ni, 99.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  1.0%us,  0.3%sy,  0.0%ni, 98.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.7%us,  0.0%sy,  0.0%ni, 99.0%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  1.7%us,  0.0%sy,  0.0%ni, 98.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  1.0%us,  0.0%sy,  0.0%ni, 99.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  3.0%us,  0.3%sy,  0.0%ni, 96.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  1.0%us,  0.0%sy,  0.0%ni, 99.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  30764132k total, 30613104k used,   151028k free,   129224k buffers
Swap:        0k total,        0k used,        0k free, 12410696k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
24584 elastics  20   0 19.5g  15g 119m S 14.6 53.2  4351:41 java

Other cases

This problem appears in exactly the same way on 4-core and 2-core instances:
at roughly 1/4 and 1/2 of the total processor capacity respectively, latency
becomes very high.


Hi Maik,
Have you tried changing the bulk size? It may also be worth seeing if
separating the masters onto their own nodes makes a difference...
On 20/02/2015 8:22 pm, "Maik Broxterman" broxterman@gmail.com wrote:


Hi Norberto,

Thanks. Yes, I've tried that, with bulk sizes between 100 and 1000; it makes
no difference. The strange thing is that if I do exactly the same on the 1.4
cluster, which has 4-core processors, CPU usage happily climbs all the way
to ~395% (roughly 4 × 100%).
On Friday, February 20, 2015 at 1:20:16 PM UTC+1, Norberto Meijome wrote:


I'd try more than 1K, more like 2-3K, and see if that helps.
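
With the Python bulk helper sketched earlier in the thread, that simply
means raising the (placeholder) chunk_size, e.g.:

helpers.bulk(es, actions(), chunk_size=2500)  # ~2-3K documents per _bulk request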

On 21 February 2015 at 04:49, Maik Broxterman broxterman@gmail.com wrote:


Ok, I seem to have found a trick.

  • Step 1: remove the Route53 DNS record for the cluster for one node, so it
    receives no incoming client traffic
  • Step 2: remove all replicas for the index
  • Step 3: use the cluster reroute API to move the shard to the node without
    traffic (steps 2 and 3 are sketched below)
  • Step 4: run the bulk requests again
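
Steps 2 and 3 look roughly like this with the Python client (the index name,
shard number and node names are placeholders for our actual values):

# Step 2: drop all replicas for the index being bulk loaded.
es.indices.put_settings(index="myindex",
                        body={"index": {"number_of_replicas": 0}})

# Step 3: move the primary shard(s) onto the node that no longer receives
# client traffic (repeat the "move" command for each shard of the index).
es.cluster.reroute(body={
    "commands": [
        {"move": {"index": "myindex", "shard": 0,
                  "from_node": "node_with_traffic", "to_node": "quiet_node"}}
    ]
})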

Thanks both for your time and effort!

Gr, Maik

On Friday, February 20, 2015 at 9:59:30 PM UTC+1, Mark Walkom wrote:


OK, so what you have is resource contention between searches and
indexing...
On 22/02/2015 12:44 am, "Maik Broxterman" broxterman@gmail.com wrote:


BTW, are you reducing or disabling the refresh interval while bulk indexing?
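
For reference, that is just an index settings change around the bulk load
(a sketch with the Python client; the index name is a placeholder):

# Disable automatic refresh while the bulk load runs.
es.indices.put_settings(index="myindex",
                        body={"index": {"refresh_interval": "-1"}})

# ... run the bulk requests ...

# Restore the default refresh interval once indexing is done.
es.indices.put_settings(index="myindex",
                        body={"index": {"refresh_interval": "1s"}})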