As mentioned in a recent Elastic blog, Elasticsearch 2.0 is coming soon, and indices with conflicting field mappings will be incompatible. I did a quick check, and 66% of my indices will not be upgradable. It will probably take a few months to re-index them, so I'm looking/hoping for another solution.
Is there any possibility of forcing a mapping to change? What I would like to do is forcibly change the mapping to resolve the conflicting field mappings, and then re-index only the documents that contain the modified fields. Is there any technical reason why this couldn't be implemented?
You cannot re-index only the documents that contain conflicting field mappings, because the inverted index contains a global field list. If the global field list is touched (e.g. by removing conflicting fields), you have to move the complete index into a new one using a new Lucene IndexWriter, and from a runtime complexity point of view that is not much different from a complete re-index. Maybe it is possible to write special code for certain cases that iterates through the low-level Lucene indexes and does some "repair" work by removing fields, but repairing the mapping at the ES level is much simpler and cleaner, since that is where the mapping takes place.
Thanks for the reply, Jörg. Well, in my case, I won't be removing fields, just changing the mapping for some fields. For the sake of clarity, I'll use this example: type_a.user_id is an integer and type_b.user_id is a long, and I want to change type_b.user_id to be an integer.
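To make the user_id example concrete, here is a minimal sketch of how such conflicts could be spotted from a mapping that has already been fetched and parsed into a dict. The function names (`iter_fields`, `find_conflicts`) and the sample mapping are made up for illustration, and the sketch only compares the `type` of each field; other mapping settings that may also count as conflicts are not checked.

```python
from collections import defaultdict


def iter_fields(properties, prefix=""):
    """Yield (dotted_path, field_type) for every leaf field in a mapping."""
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:
            # Object field: recurse into its sub-fields.
            yield from iter_fields(spec["properties"], path + ".")
        else:
            yield path, spec.get("type", "object")


def find_conflicts(type_mappings):
    """Return {field_path: {type_name: field_type}} for fields whose
    data type differs between the document types of one index."""
    seen = defaultdict(dict)
    for type_name, mapping in type_mappings.items():
        for path, field_type in iter_fields(mapping.get("properties", {})):
            seen[path][type_name] = field_type
    return {
        path: types
        for path, types in seen.items()
        if len(set(types.values())) > 1
    }


# Hypothetical mapping for one index, shaped like the "mappings" section
# of a mapping response: one entry per document type.
mappings = {
    "type_a": {"properties": {"user_id": {"type": "integer"},
                              "name": {"type": "string"}}},
    "type_b": {"properties": {"user_id": {"type": "long"},
                              "name": {"type": "string"}}},
}

conflicts = find_conflicts(mappings)
print(conflicts)
```

Here only `user_id` is reported, since `name` has the same type in both document types.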
By re-indexing only some documents, what I mean is: exporting all type_b documents that contain a user_id field, deleting those documents from the database, forcibly changing the mapping for type_b.user_id to integer, then indexing the exported documents again. It seems like it should work from a technical perspective, but I admit my knowledge of ES and Lucene internals is minimal. I think that is what you mean by "repairing the mapping on ES level", so I guess we're in agreement.
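One thing that may be worth checking on the exported documents before narrowing a long to an integer is whether every value actually fits in a signed 32-bit integer. The following is only a sketch operating on an in-memory list of already exported documents; `split_for_reindex` and the sample documents are hypothetical, and the export, delete and re-index steps themselves are not shown.

```python
# Range of a signed 32-bit integer.
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def split_for_reindex(docs, field="user_id"):
    """Split exported documents that contain `field` into those whose value
    fits in a signed 32-bit integer and those whose value does not.
    Documents without the field are skipped, since they need no re-index."""
    fits, out_of_range = [], []
    for doc in docs:
        if field not in doc:
            continue
        if INT_MIN <= doc[field] <= INT_MAX:
            fits.append(doc)
        else:
            out_of_range.append(doc)
    return fits, out_of_range


# Hypothetical exported type_b documents.
docs = [
    {"user_id": 42},
    {"user_id": 2**31},      # one past the largest 32-bit value
    {"name": "no user_id"},  # not affected by the mapping change
    {"user_id": -2**31},
]

ok, too_big = split_for_reindex(docs)
print(len(ok), "documents fit,", len(too_big), "do not")
```

If the second list is not empty, changing type_a.user_id to long instead would be the safer direction.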
I think the dynamic mapper is what is hurting me the most. I have a template that contains a mapping with no field conflicts, but occasionally other data slips in that isn't in the template, and the dynamic mapper adds the wrong data types to the mapping. I'm beginning to think that the dynamic mapper is evil and I should turn it off! I previously thought that dynamically indexed data is better than missing data, but now I'm not so sure.
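For reference, turning the dynamic mapper off is a per-type mapping setting: `"dynamic": "strict"` rejects documents that contain fields not in the mapping, while `"dynamic": false` accepts them but leaves the unknown fields unindexed (they remain in `_source`). Below is a rough sketch of building such a template body; the helper `build_template`, the pattern `myindex-*` and the field list are assumptions for illustration, not taken from the actual template.

```python
import json


def build_template(pattern, type_properties, dynamic="strict"):
    """Build an index template body in which every listed type has
    dynamic mapping restricted.

    dynamic="strict": documents with unmapped fields are rejected.
    dynamic=False:    unmapped fields are ignored (kept in _source only).
    """
    if dynamic not in ("strict", True, False):
        raise ValueError("dynamic must be 'strict', True or False")
    mappings = {
        type_name: {"dynamic": dynamic, "properties": props}
        for type_name, props in type_properties.items()
    }
    return {"template": pattern, "mappings": mappings}


# Hypothetical index pattern and fields; both types agree on user_id.
template = build_template(
    "myindex-*",
    {
        "type_a": {"user_id": {"type": "integer"}},
        "type_b": {"user_id": {"type": "integer"}},
    },
)

body = json.dumps(template, indent=2)
print(body)
```

With `"strict"`, unexpected data shows up as indexing errors instead of silently adding new fields to the mapping, which makes the "slipped in" documents visible rather than lost.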