Index significantly larger after reindexing

I've had to reindex some data in order to change the shard count on the index they were in: I went from having one shard on the old index to four shards on the new index. The old index (1 shard) was 38.1GB; the new index (4 shards) is 46.7GB; both indices have the same number of documents. Is there any reason why just going from one shard to four would cause such a large increase in size (I did expect some per-shard overhead)? Both indices use the same template and field mappings.

This is on ES 2.0.

Thanks,
John Ouellette

Probably doc values, and the re-sharding causing an increase in relative cardinality, which means compression won't be as good.

Hi Mark -- I understand the words, but not when they are put together like that :slight_smile: Could you explain that a bit more? Is a ~20% increase in the size of the index something to expect in this case?

Doc values are a column-wise way of storing bits of the document for quick aggregations. They are compressed using greatest-common-divisor tricks across whole Lucene segments. Lucene segments are the immutable chunks that make up the index. All operations in Elasticsearch look like `foreach shard { foreach segment { doSomeStuff } aggregate } aggregate`.
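
To make the compression point concrete, here's a toy sketch of the greatest-common-divisor idea -- it is not Lucene's actual codec and the numbers are made up, but it shows why a column of values that share structure can be stored in very few bits per value:

```python
# Toy illustration of GCD-style doc-values compression (not Lucene's real codec).
# If every value in a segment shares a large common divisor, each value can be
# stored as a tiny offset instead of a full 64-bit number.
from functools import reduce
from math import gcd

values = [4000, 12000, 6000, 20000, 8000]   # made-up numeric field values in one segment

common = reduce(gcd, values)                 # 2000
base = min(values)                           # 4000
offsets = [(v - base) // common for v in values]   # [0, 4, 1, 8, 2]

bits_per_value = max(o.bit_length() for o in offsets)   # 4 bits instead of 64
print(common, base, offsets, bits_per_value)
```

Re-sharding shuffles which documents land in which segment, so the common divisors and value ranges per segment change too, and the encoding can come out noticeably less compact.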

Depends on how you did the reindex. Were there few segments before and many now? If so, merging the segments will probably help. You can learn about the file types here and use that knowledge to count the segments.
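
As a rough way to count segments, something like the sketch below (Python, assuming a node reachable at localhost:9200; the index names are placeholders) hits the _cat/segments API, which returns one line per segment per shard copy, so you can compare the old and new index directly:

```python
# Count Lucene segment entries per index via the _cat/segments API.
# localhost:9200 and the index names are assumptions -- adjust for your cluster.
import requests

for index in ("old_index", "new_index"):
    resp = requests.get("http://localhost:9200/_cat/segments/%s" % index)
    resp.raise_for_status()
    lines = [l for l in resp.text.splitlines() if l.strip()]
    print("%s: %d segment entries (one per segment per shard copy)" % (index, len(lines)))
```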

If there are fewer segments now, then you are probably hitting a regression caused by worse compression on the doc values. You can figure that out by looking at the sizes of the files.
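
One way to do that comparison is to total the on-disk file sizes by Lucene extension for a shard directory, roughly like this sketch (the data path is an assumption and will differ per install; on the Lucene 5.x that ES 2.0 ships, .dvd/.dvm are doc values, .fdt/.fdx stored fields, .tvd/.tvx term vectors):

```python
# Sum index file sizes by extension for one shard's Lucene directory.
# The path is an assumption -- adjust the data path, index name and shard number.
import os
from collections import defaultdict

shard_dir = "/var/lib/elasticsearch/my_cluster/nodes/0/indices/new_index/0/index"

totals = defaultdict(int)
for name in os.listdir(shard_dir):
    path = os.path.join(shard_dir, name)
    if os.path.isfile(path):
        ext = os.path.splitext(name)[1] or name
        totals[ext] += os.path.getsize(path)

for ext, size in sorted(totals.items(), key=lambda kv: -kv[1]):
    print("%-8s %10.1f MB" % (ext, size / 1024.0 / 1024.0))
```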

It's kind of hard to track down size changes, but looking at those files is the way I start. You might find something crazy like "I accidentally turned on term vectors in the new index", and that'll take up 40% more room, easy.
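
For the term-vectors kind of surprise specifically, a quick sanity check is to pull the new index's mapping and look for any term_vector settings; a rough sketch (host and index name are assumptions):

```python
# Quick check: does the new index's mapping enable term vectors anywhere?
# Host and index name are assumptions.
import json
import requests

resp = requests.get("http://localhost:9200/new_index/_mapping")
resp.raise_for_status()
print("term_vector mentioned in mapping:", "term_vector" in json.dumps(resp.json()))
```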

Thanks Nik! (Sorry I didn't reply, was travelling :))

Thanks Mark and Nik -- I'll be doing a bit more reading, but that helps.