10% CPU load without querying


(maho) #1

Hi,

I have 1 node with 1000 indices. Each index has 100k documents and 15
fields.
When I start Elasticsearch, I see a very high CPU load of 100-200%
(4 x 2.7 GHz) for 5 minutes.

After startup the CPU load is about 5-10%. Is that OK?


(Clinton Gormley) #2

Hi Maho

That sounds pretty normal, for a high number of indices.

Given that each index is small, I'd set each one to have only one primary
shard (number_of_shards) rather than the default 5.

This will improve startup time, performance and memory usage.
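
As a minimal sketch of the suggestion above, the following builds (but does not send) a request that creates an index with a single primary shard. The host `http://localhost:9200` and the index name `customer_0001` are assumptions for illustration, not values from this thread.

```python
import json
import urllib.request


def build_create_index_request(base_url, index_name, shards=1):
    """Build a PUT request that creates an index with an explicit
    number of primary shards. The request is only constructed here;
    pass it to urllib.request.urlopen() to actually send it."""
    body = {"settings": {"index": {"number_of_shards": shards}}}
    return urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical host and index name, used only for illustration.
req = build_create_index_request("http://localhost:9200", "customer_0001")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

The shard count has to be chosen when the index is created, so an existing five-shard index would need to be recreated and reindexed to benefit.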

Also, why do you have so many indices? Could these not be combined?

You may be able to make use of the new alias filter functionality that
will be in 0.17:

https://github.com/elasticsearch/elasticsearch/issues/971

clint
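
To illustrate the filtered-alias idea, here is a rough sketch that builds the JSON body for an `_aliases` request with one filtered alias per customer. The body shape follows the aliases API as documented in later Elasticsearch releases and may differ in 0.17; the shared index name `customers` and the field `customer_id` are hypothetical.

```python
import json


def filtered_alias_action(index, alias, field, value):
    """One 'add' action that attaches a term filter to an alias."""
    return {
        "add": {
            "index": index,
            "alias": alias,
            "filter": {"term": {field: value}},
        }
    }


def build_aliases_body(index, customer_ids, field="customer_id"):
    """Body for POST /_aliases: one filtered alias per customer,
    all pointing at the same shared index."""
    return {
        "actions": [
            filtered_alias_action(index, f"customer_{cid}", field, cid)
            for cid in customer_ids
        ]
    }


# Hypothetical index name and customer ids.
body = build_aliases_body("customers", [1, 2, 3])
print(json.dumps(body, indent=2))
```

Searching through the alias `customer_2` would then only see documents whose `customer_id` is 2, while all customers share one physical index.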


(maho) #3

Thanks for your answer.

I've already set the shards per index to 1.
Currently I'm evaluating Solr and Elasticsearch.
In Solr, startup with 1000 indices takes 1 or 2 minutes, and after
startup the CPU load is 0%.
So I'm a little confused about why Elasticsearch is so resource
intensive.

Why 1000 indices? Because every customer has its own index. Using an alias
filter could be a solution, but I think that would affect the scoring
of the documents, because the ranking would be calculated on
the basis of all documents (=> inverse document frequency)?!
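
The scoring concern can be made concrete with a small calculation. This sketch uses the classic Lucene TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)); the document counts are made-up numbers chosen only to show the effect, and a filter restricts which documents match without changing the term statistics used for scoring.

```python
import math


def classic_idf(num_docs, doc_freq):
    """Inverse document frequency as defined by Lucene's classic
    TF-IDF similarity: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical numbers: a term that is common for one customer
# (5,000 of 100,000 documents) but rare across all 1000 customers
# (6,000 of 100,000,000 documents in one combined index).
per_customer_idf = classic_idf(100_000, 5_000)
combined_idf = classic_idf(100_000_000, 6_000)

print(f"IDF in the customer's own index: {per_customer_idf:.3f}")
print(f"IDF in one combined index:       {combined_idf:.3f}")
```

In the combined index the same term looks much rarer, so it is weighted more heavily than it would be in the customer's own index, which changes the relative ranking of that customer's results.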


(Shay Banon) #4

After things have settled (cluster health is green), you should not get high CPU load. I just did a quick test on my machine, and I get 0.2% usage with 1000 indices.

The reason recovery takes time and is resource intensive is mainly that, for each shard, the transaction log is replayed (Elasticsearch does not require a commit to be issued for data to be "safe"). You can issue a flush to flush the transaction log before you shut down, and then there won't be anything to replay.
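
A minimal sketch of the two steps mentioned above: flushing before shutdown, and waiting for green health after startup. It only builds the requests; the host `http://localhost:9200` and the `60s` timeout are assumptions.

```python
import urllib.request


def build_flush_request(base_url):
    """POST /_flush asks every index to flush, so the transaction
    log is empty and nothing has to be replayed on the next start."""
    return urllib.request.Request(f"{base_url}/_flush", data=b"", method="POST")


def build_wait_for_green_request(base_url, timeout="60s"):
    """GET /_cluster/health that blocks until the cluster is green
    or the timeout expires."""
    url = f"{base_url}/_cluster/health?wait_for_status=green&timeout={timeout}"
    return urllib.request.Request(url, method="GET")


# Hypothetical host; send each request with urllib.request.urlopen().
flush_req = build_flush_request("http://localhost:9200")
health_req = build_wait_for_green_request("http://localhost:9200")
print(flush_req.get_method(), flush_req.full_url)
print(health_req.get_method(), health_req.full_url)
```

Measuring idle CPU only after the health request returns green avoids mistaking recovery work for steady-state load.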


(maho) #5

I have now deleted every index and created 1000 empty indices:

1000 open indices: 8% CPU load (8 CPU cores, so it works out to about
1.00%)
1000 closed indices: 0-1% CPU load (8 CPU cores, so it works out to about
0.06%)

The bad thing is that the queries per second (JMeter test) decreased
by 20%.

Is that OK, or is something wrong with my installation /
configuration / system?


(maho) #6

I have now installed ES on my Windows machine and the problems didn't
appear.
I also didn't get the DEBUG messages that I see on Linux (same config on
both systems):

...
[2011-06-11 19:15:06,007][DEBUG][gateway.local] [Buzz] [core0][0]: throttling allocation [[core0][0], node[null]], [P], s[UNASSIGNED]] to [[Buzz][bfMpEcLeS-qBG8tT_QN_sw][inet[/10.0.2.15:9300]]] on primary allocation

[2011-06-11 19:15:06,784][DEBUG][gateway.local] [Buzz] [core1][0]: throttling allocation [[core1][0], node[null]], [P], s[UNASSIGNED]] to [[Buzz][bfMpEcLeS-qBG8tT_QN_sw][inet[/10.0.2.15:9300]]] on primary allocation
...


(Shay Banon) #7

There isn't a difference between Windows and Linux in this case. The messages you see are Elasticsearch throttling the concurrent allocation of shards on the same node so that it won't be overloaded (and become unusable). By default, it allows 4 concurrent allocations per node.
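
The throttling described above can be pictured with a toy model. This is not Elasticsearch's allocation code, just a sketch that assumes every recovery takes one "round" and that at most 4 shards recover at a time; the shard names are hypothetical.

```python
from collections import deque


def simulate_throttled_allocation(shards, max_concurrent=4):
    """Toy model of allocation throttling: in each round at most
    max_concurrent shards start recovering; the rest are throttled
    and retried in the next round."""
    pending = deque(shards)
    rounds = []
    throttled_events = 0
    while pending:
        batch_size = min(max_concurrent, len(pending))
        batch = [pending.popleft() for _ in range(batch_size)]
        # Every shard still pending had to wait this round, which is
        # when a "throttling allocation" message would be logged.
        throttled_events += len(pending)
        rounds.append(batch)
    return rounds, throttled_events


# Hypothetical shard names modelled on the log lines in this thread.
rounds, throttled = simulate_throttled_allocation([f"core{i}" for i in range(10)])
for number, batch in enumerate(rounds, start=1):
    print(f"round {number}: {batch}")
print("throttled events:", throttled)
```

With 1000 single-shard indices, a model like this needs 250 rounds, which gives a feel for why the log fills with throttling messages during startup.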

