Kazama
March 30, 2017, 9:05pm
1
The only changes I made to my indexes when migrating from ES 2.4.4 to ES 5.3.0 are the following mapping upgrades (the rest of the fields are compatible with 5.x):
{ x: string, index: not_analyzed } => { x: keyword }
{ x: string, term_vector: yes } => { x: text, term_vector: yes }
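For illustration, the two rewrites above could be expressed as a small helper. This is only a sketch: the function name `upgrade_field_mapping` is hypothetical, and it handles just the two cases from the post (a `not_analyzed` string and an analyzed string), leaving every other field untouched.

```python
def upgrade_field_mapping(field):
    """Translate one ES 2.x field mapping into its 5.x form.

    Hypothetical helper: covers only the two rewrites listed in the post.
    """
    if field.get("type") != "string":
        # Non-string fields were reported as compatible with 5.x as-is.
        return dict(field)

    # Keep every option except the ones being rewritten.
    upgraded = {k: v for k, v in field.items() if k not in ("type", "index")}
    index = field.get("index", "analyzed")  # 2.x strings are analyzed by default

    if index == "not_analyzed":
        upgraded["type"] = "keyword"
    elif index == "analyzed":
        upgraded["type"] = "text"
    else:
        # Other values (for example "no") are outside the scope of this sketch.
        raise ValueError(f"unhandled index option: {index!r}")
    return upgraded


print(upgrade_field_mapping({"type": "string", "index": "not_analyzed"}))
print(upgrade_field_mapping({"type": "string", "term_vector": "yes"}))
```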
After reindexing from scratch I got the following stats:
dataset1=560.392 docs.
ES 2.4.4 index size=99.7M
ES 5.3.0 index size=104M
dataset2=2.583.604 docs
ES 2.4.4 index size=623M
ES 5.3.0 index size=662M
Is it a general rule that a 5.x index is larger than a 2.x one for the same docs?
Maybe it matters: 2.4.4 comes from the official deb repository, while 5.3.0 comes from the official Docker image (I mean in terms of how they were configured, etc.).
dadoonet
(David Pilato)
March 31, 2017, 5:10am
2
A first guess is doc values.
They are generated for fields of the keyword type, which is actually not identical to a not_analyzed string.
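To check this guess against a mapping, one could list the keyword fields that carry doc values. The sketch below assumes a flat 5.x `properties` dictionary and only looks at `keyword` fields, where `doc_values` defaults to true unless disabled explicitly; the function name and the sample field names are made up for the example.

```python
def keyword_fields_with_doc_values(properties):
    """Return the names of keyword fields that store doc values.

    Assumes a flat `properties` mapping (no nested objects).
    """
    return [
        name
        for name, field in properties.items()
        if field.get("type") == "keyword" and field.get("doc_values", True)
    ]


# Hypothetical mapping, not taken from the thread.
sample = {
    "tag": {"type": "keyword"},
    "session_id": {"type": "keyword", "doc_values": False},
    "body": {"type": "text", "term_vector": "yes"},
}
print(keyword_fields_with_doc_values(sample))
```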
Christian_Dahlqvist
I recall more data has been migrated to doc_values in 5.x compared to 2.x, which means that the index size in 5.x, depending on your mappings, may take up a bit more space. When I tested it on a sample data set a while back I think it was in the range of 3-5%, but your mileage may vary.
Kazama
March 31, 2017, 9:49am
4
All relevant fields have doc_values explicitly disabled or enabled, no changes were made to this during migration.
Christian_Dahlqvist:
I recall more data has been migrated to doc_values in 5.x compared to 2.x, which means that the index size in 5.x, depending on your mappings, may take up a bit more space. When I tested it on a sample data set a while back I think it was in the range of 3-5%, but your mileage may vary.
Thanks, I guess that's the reason. I have some fields with doc_values enabled indeed.
Have you run a force merge on these indices to ensure they have the same number of segments?
Kazama
March 31, 2017, 10:49am
6
I've just figured out that the 2.4.4 index was built with 1 shard, while the 5.3.0 index was built with 6 shards.
After reindexing on 5.3.0 into 1 shard and running _forcemerge, here are the updated stats:
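For anyone repeating the comparison, the steps could look roughly like the sketch below. It only builds the HTTP requests and does not send them. The node address `http://localhost:9200`, the index name, and the function names are assumptions for the example; the endpoints used are index creation with `number_of_shards`, `_forcemerge` with `max_num_segments`, and `_cat/indices`.

```python
import json
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumption: a local node


def create_single_shard_index(index):
    """Build the request that creates `index` with one primary shard."""
    body = json.dumps({"settings": {"number_of_shards": 1}}).encode("utf-8")
    return Request(
        f"{ES_URL}/{index}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def force_merge(index):
    """Build the request that merges `index` down to one segment per shard."""
    return Request(f"{ES_URL}/{index}/_forcemerge?max_num_segments=1", method="POST")


def index_size(index):
    """Build the request that reports shard count, doc count and store size."""
    return Request(
        f"{ES_URL}/_cat/indices/{index}?v&h=index,pri,docs.count,store.size",
        method="GET",
    )


for request in (create_single_shard_index("dataset1"),
                force_merge("dataset1"),
                index_size("dataset1")):
    print(request.get_method(), request.full_url)
```

Each request can be sent with `urllib.request.urlopen(request)` once the documents have been reindexed. Comparing sizes only after both clusters use the same shard count and have been force merged avoids counting per-shard and per-segment overhead as a version difference.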
dataset1=560.392 docs.
ES 2.4.4 index size=99.7M
ES 5.3.0 index size=100M
dataset2=2.583.604 docs
ES 2.4.4 index size=623M
ES 5.3.0 index size=630M
5.3.0 is a bit higher still but the difference is really small.
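To put numbers on "really small", the relative overhead can be computed from the sizes reported in this thread. A minimal sketch; the helper name `overhead_percent` and the dictionary keys are made up for the example.

```python
def overhead_percent(old_size, new_size):
    """Relative growth of new_size over old_size, in percent."""
    return (new_size - old_size) / old_size * 100


# Index sizes in MB as reported in the thread.
sizes = {
    "dataset1": {"es2": 99.7, "es5_six_shards": 104, "es5_one_shard_merged": 100},
    "dataset2": {"es2": 623, "es5_six_shards": 662, "es5_one_shard_merged": 630},
}

for name, s in sizes.items():
    first = overhead_percent(s["es2"], s["es5_six_shards"])
    second = overhead_percent(s["es2"], s["es5_one_shard_merged"])
    print(f"{name}: {first:.1f}% with 6 shards, {second:.1f}% with 1 shard after _forcemerge")
```

With these figures the overhead drops from about 4.3% to 0.3% for dataset1 and from about 6.3% to 1.1% for dataset2.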
Thanks for your suggestions!