Shards getting bigger with updates (same number of documents)

Hi all

I have an ES 7 cluster of a couple of EC2 instances. The index has 6 shards and each shard has 2 replicas (so (1+2) x 6 = 18 shards for the index). When I create the index, each shard is around 25-30gb and we hold around 3mln records in the database. We have a fair number of updates happening every day, let's say around 1mln; an update means the record gets replaced by a new one but the ID stays the same, so we keep pretty much the same number of documents. I've noticed that after a couple of weeks the shard size grows to 50gb, so nearly double. Could someone please explain why this is happening and how I can fix it? (Or should I fix it at all?) I've noticed search performance going down once we reach 50gb shards. Any comments/help would be highly appreciated.

Thanks

marcineq

Welcome to our community! :smiley:

What is the output from the _cat/indices?v API?

Thanks Mark

    index_name                    2     p      STARTED 46750788    50gb 11.11.11.111 ip-11.11.11.111-es
    index_name                    2     r      STARTED 46750788  49.3gb 22.22.22.222 ip-22.22.22.222-es
    index_name                    2     r      STARTED 46750788  44.4gb 33.33.33.333 ip-33.33.33.333-es
    index_name                    1     p      STARTED 46532522  47.9gb 44.44.44.444 ip-44.44.44.444-es
    index_name                    1     r      STARTED 46532522  52.7gb 55.55.55.555 ip-55.55.55.555-es
    index_name                    1     r      STARTED 46532522    49gb 66.66.66.666 ip-66.66.66.666-es
    index_name                    3     r      STARTED 46677577    52gb 11.11.11.111 ip-11.11.11.111-es
    index_name                    3     p      STARTED 46677577  47.5gb 55.55.55.555 ip-55.55.55.555-es
    index_name                    3     r      STARTED 46677577  44.4gb 77.77.77.777 ip-77.77.77.777-es
    index_name                    5     p      STARTED 46736104  50.8gb 88.88.88.888 ip-88.88.88.888-es
    index_name                    5     r      STARTED 46736104  52.8gb 99.99.99.999 ip-99.99.99.999-es
    index_name                    5     r      STARTED 46736104    48gb 66.66.66.666 ip-66.66.66.666-es
    index_name                    4     p      STARTED 46660338  45.7gb 77.77.77.777 ip-77.77.77.777-es
    index_name                    4     r      STARTED 46660338  49.6gb 88.88.88.888 ip-88.88.88.888-es
    index_name                    4     r      STARTED 46660338  46.8gb 99.99.99.999 ip-99.99.99.999-es
    index_name                    0     r      STARTED 46504385    43gb 44.44.44.444 ip-44.44.44.444-es
    index_name                    0     r      STARTED 46504385  53.3gb 22.22.22.222 ip-22.22.22.222-es
    index_name                    0     p      STARTED 46504385    51gb 33.33.33.333 ip-33.33.33.333-es

That doesn't look aligned with what you are suggesting; there are no deleted documents showing.

Sorry, maybe I explained it incorrectly. If I have a document with _id x and an update comes in, es.index is performed with _id x, which replaces the doc x that already exists in the ES database. This happens to around 1mln records per day, out of 3mln records in total.
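To be concrete, each update is just a plain index request reusing the same _id, something like this (the field name here is made up for illustration):

```shell
# Re-indexing with an existing _id replaces the stored document.
# Internally, the old version is only marked as deleted; the new
# version is written to a fresh segment.
curl -X PUT "localhost:9200/index_name/_doc/x" \
  -H 'Content-Type: application/json' \
  -d '{"some_field": "new value"}'
```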

You explained it correctly, but that index is not showing any deleted documents based on the output you provided.

Is that based on the same doc count for the shards? The updates happened in the morning and everything is up to date now across primaries/replicas.

It's based on what I can see from the output from the _cat command you ran. By default, it should show the number of deleted docs directly after the number of docs. There's nothing there though?

What version are you on?

7.3.2

And you ran _cat/indices?v, exactly that?

Sorry, I was looking at the wrong thing - it's getting late now. Here you go:

health status index        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   index_name 23BMWdfBQKukF5AKjORnkA   6   2  279861774     99922075    833.8gb        268.3gb

Elasticsearch does not perform in-place updates. Instead, data is stored in immutable segments, so updating documents generates new segments that take up additional space, and the old versions of updated documents are not immediately deleted. They are only removed from disk when segments are merged in the background, which is triggered when the proportion of deleted documents in a segment exceeds a threshold. An index growing in size while being updated is therefore expected.
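You can watch this happening per segment with the _cat/segments API; the docs.deleted column there counts old document versions that have been superseded but not yet merged away (substitute your own index name, and note that output columns can vary slightly between versions):

```shell
# Show per-segment live and deleted doc counts plus on-disk size
curl "localhost:9200/_cat/segments/index_name?v&h=index,shard,segment,docs.count,docs.deleted,size"
```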

That makes sense. What should be done in this case? Should I increase the number of shards so that I don't get into a situation where a shard goes over the recommended size? Is there a way to trigger a merge?

You can use the force merge API to trigger merges; it has a parameter named only_expunge_deletes that may help.
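A minimal sketch of that call against your index (force merging is I/O-intensive, so it is best run during a quiet period):

```shell
# Ask Elasticsearch to merge away segments' deleted documents
# without fully merging the index down to one segment
curl -X POST "localhost:9200/index_name/_forcemerge?only_expunge_deletes=true"
```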