Very large segments - Do I need to worry?

We run many dashboards on top of our data. When requests became slow about a year ago, we came up with the idea of merging the segments in our indices, which gave us a performance boost of around 20%. After attending an Elastic training here in Berlin I learned about the downsides of reducing segments, so I stopped the automatic segment reduction since it can lead to unwanted effects.
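For reference, the automated reduction was essentially a force merge; a sketch of such a call (the max_num_segments value here is illustrative, not necessarily our exact setting):

POST /analytics-migrations-201812/_forcemerge?max_num_segments=1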

I started to investigate the current situation. Here is what I found:

Request

GET /_cat/segments?v&index=analytics-migrations-201812&s=size:desc

Response

index                       shard prirep ip            segment generation docs.count docs.deleted    size size.memory committed searchable version compound
analytics-migrations-201812 0     p      172.24.24.191 _np3k      1105616   31439133     10515132     5gb     2185544 true      true       6.6.1   false
analytics-migrations-201812 0     r      172.24.24.192 _x24u      1542414   28059413      8649610   4.4gb     1930234 true      true       6.6.1   false
analytics-migrations-201812 0     r      172.24.24.192 _k36p       937249   22719776      8502457   3.7gb     1648627 true      true       6.6.1   false
analytics-migrations-201812 0     p      172.24.24.191 _x7p7      1549627   20575200      8714796   3.5gb     1549248 true      true       6.6.1   false
analytics-migrations-201812 0     r      172.24.24.192 _14slg     1903300   14213376      3519637   2.1gb      969675 true      true       6.6.1   false
analytics-migrations-201812 0     p      172.24.24.191 _14f9e     1886018   12796404      3549667     2gb      895393 true      true       6.6.1   false
analytics-migrations-201812 0     r      172.24.24.192 _1bekx     2211729    7316133      5329274   1.5gb      698780 true      true       6.6.1   false
analytics-migrations-201812 0     p      172.24.24.191 _19yah     2143961    3840324      4702690     1gb      499394 true      true       6.6.1   false
analytics-migrations-201812 0     p      172.24.24.191 _1bekj     2211715    4121995      2634361 855.9mb      407861 true      true       6.6.1   true
analytics-migrations-201812 0     p      172.24.24.191 _1dsm1     2323225    4185331        39887 528.4mb      279747 true      true       6.6.1   true
analytics-migrations-201812 0     r      172.24.24.192 _1dg86     2307174    2341967       657786 374.3mb      215218 true      true       6.6.1   true
analytics-migrations-201812 0     p      172.24.24.191 _1cypu     2284482    1523517       226437 217.4mb      124784 true      true       6.6.1   true
...

The index "analytics-migrations-201812" consists of around 85 segments.

As one can see, some segments are huge and hold many deleted docs. I also wonder about the compound flag: after looking up its meaning I am still not sure what it actually tells me, but only the huge segments are tagged with compound=false.

Is there a potential problem here? Do I need to take action, and if so, what could be done?

Thank you.

It looks like you are using time-based indices, which usually means retention is managed by simply dropping the full index. How come you are deleting documents from them?
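For example, once a monthly index ages out of the retention window, it would normally just be removed in a single call:

DELETE /analytics-migrations-201812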

The "analytics-migration-YYYYMM" indices hold documents (events) which can be updated based on new events which can have a relation to each other. That's how all the deleted documents get generated. We know that's not ideal but we need to read the relations from one document to the other if there is one.

As far as I know, the root cause of the mistake was reducing segments on indices that were still being written to (read-write indices).

We cannot simply delete the index because our dashboards still query that data.
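Would a merge that only expunges the deleted documents, rather than merging down to a few large segments, be an option here? Something like:

POST /analytics-migrations-201812/_forcemerge?only_expunge_deletes=true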
