Merging too heavy - causing very slow response times


(Clinton Gormley) #1

Hiya

We're using 0.17.8 in production. About a week ago I changed from
storing our web sessions in our DB to storing them in ElasticSearch.

We have about 50m sessions, stored in an index with 1 primary shard and
1 replica, which amounts to about 11GB of data.

All was working well, then today we suddenly had 3 minutes of requests
to the session index timing out (>30 seconds).

The node with the primary shard had high loads and was in the process of
doing a merge which lasted about 5 minutes.

Is there anything I should tweak to make these merges lighter?

thanks

Clint


(Shay Banon) #2

Was there just a single merge running, or several? If it's a single merge, then only the merge throttling feature in the upcoming 4.0 will allow that… If there were other merges going on, we can configure things so that fewer merges run concurrently.
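
To illustrate the "fewer concurrent merges" option, here is a minimal sketch. It assumes the concurrent merge scheduler and its `index.merge.scheduler.max_thread_count` setting are available in the version in use. The index name `sessions` and the host are hypothetical. Whether this setting can be changed on a live index or has to go into the node or index configuration depends on the version, so the code only builds the request and does not send it.

```python
import json


def merge_scheduler_settings(max_threads):
    """Build a settings body that caps the number of concurrent merge threads."""
    if max_threads < 1:
        raise ValueError("max_threads must be at least 1")
    return {"index": {"merge": {"scheduler": {"max_thread_count": max_threads}}}}


def settings_request(host, index, body):
    """Return (method, url, payload) for an update-settings call. Nothing is sent."""
    return "PUT", f"{host}/{index}/_settings", json.dumps(body, sort_keys=True)


# Hypothetical host and index name, for illustration only.
method, url, payload = settings_request(
    "http://localhost:9200", "sessions", merge_scheduler_settings(1)
)
print(method, url)
print(payload)
```

Limiting the scheduler to one thread only helps when several merges compete for I/O at once. It does not make a single large merge any lighter.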

On Tuesday, March 13, 2012 at 6:27 PM, Clinton Gormley wrote:

Hiya

We're using 0.17.8 in production. About a week ago I changed from
storing our web sessions in our DB to storing them in ElasticSearch.

We have about 50m sessions, stored in an index with 1 primary shard and
1 replica, which amounts to about 11GB of data.

All was working well, then today we suddenly had 3 minutes of requests
to the session index timing out (>30 seconds).

The node with the primary shard had high loads and was in the process of
doing a merge which lasted about 5 minutes.

Is there anything I should tweak to make these merges lighter?

thanks

Clint


(Clinton Gormley) #3

On Wed, 2012-03-14 at 14:18 +0200, Shay Banon wrote:

Was there just a single merge running, or several? If it's a single
merge, then only the merge throttling feature in the upcoming 4.0 will
allow that… If there were other merges going on, we can configure
things so that fewer merges run concurrently.

There was just a single merge happening on that index. I'm not sure
about any of the other indices.

But it looked like a biggie: the index dir shrank by 6GB once it was
done.

Nothing I can control with maximum segment size or anything?

ta

clint


(Shay Banon) #4

Yes, you can definitely tweak those settings in the tiered merge policy (you can configure the maximum size of a merged segment) to reduce the chances of very large merges running. Be careful, though, not to end up with too many segments (the segments API is your friend here), and consider scheduling an explicit optimize during quiet times.
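
The three pieces mentioned above (capping merged segment size, watching the segment count, and optimizing at quiet times) can be sketched roughly as follows. This assumes the tiered merge policy is in use and that its `index.merge.policy.max_merged_segment` setting is available in the running version. The `2gb` value, the `sessions` index name, the host, and the sample response are all hypothetical, and the response shape is an assumption about what the segments API (`GET /sessions/_segments`) returns. Nothing is sent over the network.

```python
import json
from urllib.parse import urlencode


def tiered_merge_settings(max_merged_segment):
    """Settings body that caps the size of segments produced by normal merges."""
    return {"index": {"merge": {"policy": {"max_merged_segment": max_merged_segment}}}}


def count_segments(segments_response, index):
    """Map each shard id to the segment count of every copy of that shard."""
    shards = segments_response["indices"][index]["shards"]
    return {
        shard_id: [len(copy["segments"]) for copy in copies]
        for shard_id, copies in shards.items()
    }


def optimize_url(host, index, max_num_segments):
    """URL for an explicit optimize, meant to be POSTed during a quiet period."""
    query = urlencode({"max_num_segments": max_num_segments})
    return f"{host}/{index}/_optimize?{query}"


# Hypothetical, heavily trimmed segments API response: one shard with a
# primary holding two segments and a replica holding one.
sample = {
    "indices": {
        "sessions": {
            "shards": {
                "0": [
                    {"segments": {"_0": {}, "_1": {}}},
                    {"segments": {"_2": {}}},
                ]
            }
        }
    }
}

print(json.dumps(tiered_merge_settings("2gb")))
print(count_segments(sample, "sessions"))
print(optimize_url("http://localhost:9200", "sessions", 5))
```

A lower segment size cap trades fewer huge merges for more segments per shard, which is why the segment count is worth checking after changing it.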

On Wednesday, March 14, 2012 at 2:22 PM, Clinton Gormley wrote:

On Wed, 2012-03-14 at 14:18 +0200, Shay Banon wrote:

Was there just a single merge running, or several? If it's a single
merge, then only the merge throttling feature in the upcoming 4.0 will
allow that… If there were other merges going on, we can configure
things so that fewer merges run concurrently.

There was just a single merge happening on that index. I'm not sure
about any of the other indices.

But it looked like a biggie: the index dir shrank by 6GB once it was
done.

Nothing I can control with maximum segment size or anything?

ta

clint

