We have some Ruby scripts that aggregate and re-index data in ES. Whenever we change these scripts, we first run them locally to make sure we didn't break anything and that data is indexed correctly. I sometimes remove my dev indices to keep them clean and aid my testing.
I didn't think about the fact that mapping types are now deprecated, and I removed my dev indices. Now I can't run the scripts for testing because I get "Rejecting mapping update to [dev] as the final mapping would have more than 1 type: [journal, account, order]".
Is there anything I can do to replicate the mapping we still have in production?
I thought I had read somewhere that indices with multiple mapping types created prior to 6 would still work, but I forgot that we create new indices every month, which, with our prod ES upgraded to version 6, will break anyway. I'd better get reindexing then. Thanks for the help.
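For anyone following along, one option described in the Removal of Mapping Types doc is to give each former type its own index. Below is a minimal Python sketch that only builds the `_reindex` request bodies, one per type. The function name, the `<index>-<type>` naming scheme, and the `_doc` destination type are assumptions for illustration, and filtering on `type` in `source` assumes a 6.x cluster.

```python
import json


def per_type_reindex_bodies(source_index, types, dest_type="_doc"):
    """Build one _reindex body per mapping type, each targeting its own index.

    The destination index name "<source_index>-<type>" is a hypothetical
    naming scheme; adjust it to whatever convention you use.
    """
    bodies = []
    for type_name in types:
        bodies.append({
            # Only copy documents of this one type from the old index.
            "source": {"index": source_index, "type": type_name},
            # Each type lands in its own single-type index.
            "dest": {"index": f"{source_index}-{type_name}", "type": dest_type},
        })
    return bodies


# The three types from the error message above.
for body in per_type_reindex_bodies("dev", ["journal", "account", "order"]):
    print(json.dumps(body))
```

Each printed body would be sent as a separate `POST _reindex` request.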
Thanks for the info. I've come across another problem when re-indexing, and I didn't see it covered anywhere. I googled for some solutions, but they either don't quite cover my use case or I can't get them to work.
Our mapping has changed over the last few months. For example, we used to have an "order_value_USD" field but then renamed it to "value_USD", so some documents have "order_value_USD" and some have "value_USD". When trying to re-index documents I'm getting "dynamic introduction of [order_value_USD] within [order] is not allowed".
What is the best and most time efficient way to re-index old data in a way that it matches new mapping?
Also, when trying to re-index I keep getting 504s - "Gateway Time-out".
I think the above assumes that you have order_value_USD defined for all documents that the reindex runs on, so it will only work on that subset of documents. I can get more details during the work week.
There's a short section mentioning Painless scripting in the Reindex API docs which can give you an example of how it can work. The Removal of Mapping Types doc also has examples of the script. Basically you'll need to tailor it to your situation.
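As a rough idea of what such a tailored script could look like for the field rename, here is a Python sketch that builds a `_reindex` body with a Painless script. The script only touches documents that actually contain the old field, so documents that already use the new name pass through unchanged. The function name and the index names in the usage line are hypothetical, and the `source` key for the script text assumes 6.x (5.x used `inline`).

```python
import json


def build_rename_reindex_body(source_index, dest_index, old_field, new_field):
    """Build a _reindex body whose Painless script renames one field."""
    # Guard with containsKey so documents without the old field are untouched.
    script = (
        "if (ctx._source.containsKey(params.old_field)) { "
        "ctx._source.put(params.new_field, ctx._source.remove(params.old_field)); "
        "}"
    )
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "script": {
            "lang": "painless",
            "source": script,
            # Field names are passed as params so the script text stays constant.
            "params": {"old_field": old_field, "new_field": new_field},
        },
    }


# Hypothetical index names; the field names come from the question above.
body = build_rename_reindex_body("orders-old", "orders-new",
                                 "order_value_USD", "value_USD")
print(json.dumps(body, indent=2))
```

If several fields were renamed over time, the same pattern can be repeated in the script, one guarded rename per field.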
I'll need to go through the alphabet, but I can cope with that. Also, I removed "conflicts": "proceed" and "op_type": "create" as I was getting version conflicts.
For the timeouts you should set wait_for_completion=false in your reindex query and monitor the execution of your tasks through the task API: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#docs-reindex-task-api
If your reindex request takes a long time to finish, the client connection can fail because it receives no data before completion. This is why you need to perform this asynchronously: send the reindex request to ES, then monitor its completion with another query.
Regarding the null values, can you try with a single document (check the original source and the new one) to see if your script works? If it doesn't, you can paste the full recreation here (the original document + the reindex request).
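To make that concrete, here is a minimal Python sketch of the start-then-poll pattern using only the standard library. It relies on the reindex response containing a `task` id when `wait_for_completion=false` is set, and on `GET _tasks/<task_id>` reporting a `completed` flag. The helper names, the 5 second poll interval, and the base URL in the usage comment are assumptions for illustration.

```python
import json
import time
import urllib.request


def http_json(method, url, body=None):
    """Send a JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def reindex_async(base_url, body, request=http_json,
                  poll_seconds=5.0, sleep=time.sleep):
    """Start a reindex without waiting, then poll the task API until done."""
    # The request returns immediately with a task id instead of blocking.
    started = request(
        "POST", f"{base_url}/_reindex?wait_for_completion=false", body)
    task_id = started["task"]
    while True:
        status = request("GET", f"{base_url}/_tasks/{task_id}")
        if status.get("completed"):
            return status
        sleep(poll_seconds)


# Usage (assumes a cluster at this hypothetical address):
# result = reindex_async("http://localhost:9200",
#                        {"source": {"index": "a"}, "dest": {"index": "b"}})
```

The `request` and `sleep` parameters are injectable so the loop can be exercised without a live cluster.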
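One possible way to run that single-document check is to restrict the reindex source with an `ids` query and then compare the two `_source` objects field by field. The Python sketch below builds such a request body and includes a small comparison helper. All function names, index names, and the document id are hypothetical, and the script text is whatever you are currently testing.

```python
def single_doc_reindex_body(source_index, dest_index, doc_id, script_source):
    """Build a _reindex body that copies exactly one document through a script."""
    return {
        "source": {
            "index": source_index,
            # The ids query limits the reindex to the one document under test.
            "query": {"ids": {"values": [doc_id]}},
        },
        "dest": {"index": dest_index},
        "script": {"lang": "painless", "source": script_source},
    }


def changed_fields(before, after):
    """Return {field: (old_value, new_value)} for fields that differ.

    A field missing on one side shows up as None on that side.
    """
    keys = set(before) | set(after)
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(keys)
        if before.get(key) != after.get(key)
    }


# Example with made-up _source contents before and after the script ran.
original = {"id": 1, "order_value_USD": 10}
reindexed = {"id": 1, "value_USD": 10}
print(changed_fields(original, reindexed))
```

Fetching the two `_source` objects (for example with `GET <index>/<type>/<id>`) is left out here; the helper only compares them once you have both.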