Kazama | March 30, 2017, 9:05pm | #1
            
The only changes I made to my indices when migrating from ES 2.4.4 to ES 5.3.0 were the following mapping upgrades (the rest of the fields are compatible with 5.x):
{ x: string, index: not_analyzed } => { x: keyword }
{ x: string, term_vector: yes } => { x: text, term_vector: yes }
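
For illustration, the two mapping rewrites above could be sketched as a small Python helper. This is only a sketch: `upgrade_string_field` is a hypothetical name, it works on plain dicts rather than talking to a cluster, and it covers only the two cases mentioned in this post (`not_analyzed` strings and analyzed strings).

```python
def upgrade_string_field(old):
    """Translate a 2.x `string` field mapping (a dict) into a 5.x-style one.

    Only the two cases from this post are handled:
      string + index: not_analyzed  -> keyword
      string (analyzed, the default) -> text
    """
    if old.get("type") != "string":
        # Non-string fields are returned unchanged (as a copy).
        return dict(old)

    index = old.get("index", "analyzed")
    # Carry over every other setting (e.g. term_vector, doc_values) as-is.
    new = {k: v for k, v in old.items() if k not in ("type", "index")}

    if index == "not_analyzed":
        new["type"] = "keyword"
    elif index == "analyzed":
        new["type"] = "text"
    else:
        # Other values (such as index: no) are not covered in this thread.
        raise NotImplementedError("unhandled index setting: %r" % index)
    return new


print(upgrade_string_field({"type": "string", "index": "not_analyzed"}))
print(upgrade_string_field({"type": "string", "term_vector": "yes"}))
```

Settings such as `doc_values` are copied through untouched, which matches a migration where they were set explicitly and not changed.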
 
After reindexing from scratch, I got these stats:

dataset1 = 560,392 docs
ES 2.4.4 index size = 99.7M
ES 5.3.0 index size = 104M

dataset2 = 2,583,604 docs
ES 2.4.4 index size = 623M
ES 5.3.0 index size = 662M

Is it a general rule that a 5.x index is larger than a 2.x one for the same docs?
It may matter that 2.4.4 comes from the official deb repository while 5.3.0 comes from the official Docker image (I mean in terms of how they were configured, etc.).
              
dadoonet (David Pilato) | March 31, 2017, 5:10am | #2
            
My first guess is doc values.
They are generated when you use the keyword type. It's not actually identical to not_analyzed.
            
Christian_Dahlqvist | #3

I recall more data has been migrated to doc_values in 5.x compared to 2.x, which means that a 5.x index, depending on your mappings, may take up a bit more space. When I tested this on a sample data set a while back, I think the increase was in the range of 3-5%, but your mileage may vary.
              
Kazama | March 31, 2017, 9:49am | #4
              
All relevant fields have doc_values explicitly disabled or enabled; no changes were made to this during the migration.

Christian_Dahlqvist wrote:
"I recall more data has been migrated to doc_values in 5.x compared to 2.x, which means that the index size in 5.x, depending on your mappings, may take up a bit more space. When I tested it on a sample data set a while back I think it was in the range of 3-5%, but your mileage may vary."

Thanks, I guess that's the reason. I do have some fields with doc_values enabled.
            
              Have you run a force merge on these indices to ensure they have the same number of segments?
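
A minimal sketch of how this could be checked over the REST API from Python. The node address `http://localhost:9200` and the index name `dataset1` are placeholders, not details from this thread, and the helper names are hypothetical. The `_forcemerge` and `_cat/segments` endpoints are available in both 2.4 and 5.x.

```python
import urllib.request

ES_URL = "http://localhost:9200"  # assumed node address, adjust as needed


def forcemerge_url(base, index, max_num_segments=1):
    """Build the URL that merges an index down to `max_num_segments` segments."""
    return "%s/%s/_forcemerge?max_num_segments=%d" % (
        base.rstrip("/"), index, max_num_segments)


def segments_url(base, index):
    """Build the URL that lists the segments of an index in tabular form."""
    return "%s/_cat/segments/%s?v" % (base.rstrip("/"), index)


def call(url, method="GET", timeout=600):
    """Send a request to the cluster and return the response body as text."""
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example usage against a running cluster (not executed here):
#   call(forcemerge_url(ES_URL, "dataset1"), method="POST")
#   print(call(segments_url(ES_URL, "dataset1")))
```

Running the same two calls against both clusters, and comparing sizes only afterwards, keeps segment count from skewing the comparison.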
              
Kazama | March 31, 2017, 10:49am | #6
            
I've just realized that indexing on 2.4.4 was done into 1 shard, while on 5.3.0 it was done into 6 shards.
After reindexing on 5.3.0 into 1 shard and running _forcemerge, here are the updated stats:

dataset1 = 560,392 docs
ES 2.4.4 index size = 99.7M
ES 5.3.0 index size = 100M

dataset2 = 2,583,604 docs
ES 2.4.4 index size = 623M
ES 5.3.0 index size = 630M

5.3.0 is still slightly larger, but the difference is really small.
Thanks for your suggestions!
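
To put the numbers from this thread side by side, here is a small Python sketch that computes the relative size increase of the 5.3.0 index over the 2.4.4 one, before and after matching the shard count and force merging. The sizes are copied as reported above (in "M"); `overhead_pct` is a hypothetical helper name.

```python
def overhead_pct(size_2x, size_5x):
    """Relative size increase of the 5.x index over the 2.x index, in percent."""
    return (size_5x - size_2x) / size_2x * 100


# Sizes as reported in this thread: (2.4.4, 5.3.0 with 6 shards,
# 5.3.0 with 1 shard after _forcemerge).
reported = {
    "dataset1": (99.7, 104, 100),
    "dataset2": (623, 662, 630),
}

for name, (es2, es5_before, es5_after) in reported.items():
    print("%s: %.1f%% before, %.1f%% after" % (
        name, overhead_pct(es2, es5_before), overhead_pct(es2, es5_after)))
```

With these figures the difference drops from roughly 4.3% and 6.3% to roughly 0.3% and 1.1%, so most of the initial gap came from the shard and segment layout and not from the version change itself.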
              
system (system) | Closed | April 28, 2017, 10:50am | #7
            
              This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.