How to test an application with multiple mapping types after their deprecation?

Kornelia_Watson · January 2, 2018, 8:20am

Hi,

We have some Ruby scripts that aggregate and re-index data in ES. Whenever we change these scripts, we first run them locally to make sure we didn't break anything and that the data is indexed correctly. I sometimes remove my dev indices to keep them clean and aid my testing.

I removed my dev indices without thinking about the fact that mapping types are now deprecated. Now I'm unable to run the scripts for testing purposes, as I get "Rejecting mapping update to [dev] as the final mapping would have more than 1 type: [journal, account, order]".

Is there anything I can do to replicate the mapping we still have in production?

Julien · January 2, 2018, 10:58am

Are prod and its indices on version 5? If so, please look at the snapshot API here (I have not tested this, but restoring a version 5 snapshot is supported in version 6):
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html

Note that if you use version 6, you should reindex into new indices with a single type, as any index created in version 6 cannot have multiple types. A few ways to do this are documented here:
https://www.elastic.co/guide/en/elasticsearch/reference/6.x/removal-of-types.html
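
One of the approaches described in that document is to give each former type its own index. A minimal sketch in Python of how the `_reindex` request bodies could be generated, one per type. The `reindex_bodies` helper and the `{type}_{source}` destination naming are hypothetical; only the `source.index`, `source.type` and `dest.index` fields come from the Reindex API.

```python
import json


def reindex_bodies(source_index, types, dest_pattern="{type}_{source}"):
    """Build one _reindex request body per mapping type (hypothetical helper)."""
    bodies = {}
    for doc_type in types:
        bodies[doc_type] = {
            # Restrict the source to a single type of the multi-type index.
            "source": {"index": source_index, "type": doc_type},
            # Each type gets its own destination index (naming is an assumption).
            "dest": {"index": dest_pattern.format(type=doc_type, source=source_index)},
        }
    return bodies


# Example with the index and type names from the error message above.
for doc_type, body in reindex_bodies("dev", ["journal", "account", "order"]).items():
    print("POST _reindex")
    print(json.dumps(body, indent=2))
```

Each printed body can be pasted into the Kibana console or sent by whatever HTTP client your scripts already use.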

Kornelia_Watson · January 2, 2018, 11:16am

I thought I had read somewhere that indices with multiple mapping types created prior to 6 would still work, but I forgot that we create new indices every month, and those will break anyway once our prod ES is upgraded to version 6. I'd better get reindexing then. Thanks for the help.

a5a · January 3, 2018, 3:25am

@Kornelia_Watson
https://www.elastic.co/guide/en/elasticsearch/reference/6.x/removal-of-types.html#removal-of-types If you follow this link, you'll see the sentence that confirms what you said: indices with multiple mapping types that were created before 6 still work.

Yeah, that makes sense, your newly created indices with your template will fail after you upgrade to 6. I'm working on a tool that will automate reindexing indices with multiple types into single-type indices, but until then, you can watch this webinar where I describe two strategies for solving the problem: https://www.elastic.co/webinars/upgrading-your-elastic-stack, and this article which explains a third strategy: https://www.elastic.co/blog/kibana-6-removal-of-mapping-types

Kornelia_Watson · January 6, 2018, 11:35am

Thanks for the info. I've come across another problem when re-indexing that I didn't see covered anywhere. I googled for solutions, but they either don't quite cover my use case or I can't get them to work.

Our mapping has changed over the last few months. For example, we used to have an "order_value_USD" field but then renamed it to "value_USD", so some documents have "order_value_USD" and some have "value_USD". When trying to re-index documents I'm getting "dynamic introduction of [order_value_USD] within [order] is not allowed".

What is the best and most time-efficient way to re-index old data so that it matches the new mapping?

Also, when trying to re-index I keep getting 504s - "Gateway Time-out".

a5a · January 7, 2018, 9:34pm

You'll have to reindex with a script that transforms the old order_value_USD values into value_USD values.

ctx._source.value_USD = ctx._source.order_value_USD

I think the above assumes that you have order_value_USD defined for all documents that the reindex runs on, so it will only work on that subset of documents. I can get more details during the work week.

There's a short section on Painless scripting in the Reindex API docs that gives an example of how it can work. The Removal of Mapping Types doc also has examples of such scripts. Basically, you'll need to tailor the script to your situation.
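
To avoid depending on order_value_USD being present in every document, each rename can be guarded with containsKey. The guard matters because `remove` on a map returns null when the key is absent, so an unguarded assignment would write null into value_USD for documents that never had order_value_USD. A minimal sketch in Python that generates such a script; the `guarded_rename_script` helper and the `old_index` / `new_index` names are hypothetical placeholders.

```python
import json


def guarded_rename_script(renames):
    """Return Painless source that renames each old field only when it is present."""
    statements = []
    for old, new in renames.items():
        statements.append(
            "if (ctx._source.containsKey('%s')) "
            "{ ctx._source['%s'] = ctx._source.remove('%s'); }" % (old, new, old)
        )
    return " ".join(statements)


# Placeholder index names; replace with your own source and destination.
body = {
    "source": {"index": "old_index"},
    "dest": {"index": "new_index"},
    "script": {
        "lang": "painless",
        "source": guarded_rename_script({"order_value_USD": "value_USD"}),
    },
}
print(json.dumps(body, indent=2))
```

Documents that already use value_USD pass through unchanged, so the same request can run over the whole index rather than a pre-filtered subset.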

Kornelia_Watson · January 9, 2018, 10:55am

Hi, so I tried a reindex request with your suggested script, and also with the "remove" bit added, i.e.

"ctx._source.value_USD = ctx._source.remove('order_value_USD'); "

as suggested in the Reindex API docs (Elasticsearch Guide [master]).

So my overall query was:

POST _reindex
{
  "conflicts": "proceed",
  "source": {
    "index": "insights_prod-2017.07",
    "type": "order"
  },
  "dest": {
    "index": "orders_prod-2017.07",
    "op_type": "create"
  },
  "script": {
    "lang": "painless",
    "source":
      "ctx._source.value_USD = ctx._source.remove('order_value_USD');
      ctx._source.value = ctx._source.remove('order_value');
      ctx._source.net_value_excl_shipping_USD = ctx._source.remove('order_net_value_excl_shipping_USD');
      ctx._source.net_value_excl_shipping = ctx._source.remove('order_net_value_excl_shipping');
      ctx._source.invoiced_timestamp = ctx._source.remove('order_invoiced_timestamp');
      ctx._source.fully_shipped_timestamp = ctx._source.remove('order_fully_shipped_timestamp');
      ctx._source.remove('account_is_subscriber');
      ctx._source.remove('item_count_stock_tracked');
      ctx._source.id = ctx._source.remove('order_id');
      ctx._source.contact_id = ctx._source.remove('contact_email');"
  }
}

So, all the missing documents got created, but all the values mentioned above are null even though they exist in the original docs.

Any ideas what I'm doing wrong?

Also, any ideas what the deal is with

{
  "statusCode": 504,
  "error": "Gateway Time-out",
  "message": "Client request timeout"
}

I tried adding

?wait_for_completion=true

but it didn't change anything.

Kornelia_Watson · January 10, 2018, 8:05am

I managed to avoid a timeout by filtering which accounts to reindex.

"query": {
  "bool": {
    "must": [
      {
        "query_string": {
          "query": "account_id: a*",
          "analyze_wildcard": true
        }
      }
    ]
  }
}

I'll need to go through the alphabet, but I can cope with that. Also, I removed "conflicts": "proceed" and "op_type": "create" as I was getting version conflicts.

Still no sign of the missing values, though.
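
Going through the alphabet by hand could be scripted. A sketch in Python that generates one query per prefix, in the same shape as the filter above. It assumes account IDs start with a lowercase letter or a digit, which is only a guess; IDs starting with any other character would be skipped, so adjust `prefixes` to the data. The `prefix_queries` helper is hypothetical.

```python
import json
import string


def prefix_queries(field="account_id",
                   prefixes=string.ascii_lowercase + string.digits):
    """Yield one query_string filter per leading character (assumed ID alphabet)."""
    for prefix in prefixes:
        yield {
            "bool": {
                "must": [
                    {
                        "query_string": {
                            "query": "%s: %s*" % (field, prefix),
                            "analyze_wildcard": True,
                        }
                    }
                ]
            }
        }


# Print the first two generated filters as a quick check.
for query in list(prefix_queries())[:2]:
    print(json.dumps({"query": query}, indent=2))
```

Each generated dict goes under "query" inside the "source" section of a separate reindex request.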

jimczi · January 11, 2018, 2:50pm

For the timeouts, you should set wait_for_completion=false in your reindex request and monitor the execution of the task through the Task API:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#docs-reindex-task-api
If your reindex requests take some time to finish, the client connection can fail because it doesn't receive any data before completion. This is why you need to do this asynchronously: send the reindex request to Elasticsearch, then monitor its completion with a separate request.
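
A minimal sketch of that asynchronous flow in Python. With wait_for_completion=false the reindex response contains a "task" ID, and GET _tasks/<task_id> reports "completed" once the task has finished. The `send(method, path, body)` callable is a hypothetical transport that performs the HTTP request and returns the decoded JSON response; plug in whatever client you already use.

```python
import time


def reindex_async(send, body, poll_seconds=5.0):
    """Start a reindex as a background task and poll the Task API until it is done.

    `send` is a hypothetical callable: send(method, path, body) -> dict.
    """
    # Returns immediately with {"task": "<node_id>:<task_number>"}.
    started = send("POST", "/_reindex?wait_for_completion=false", body)
    task_id = started["task"]
    while True:
        status = send("GET", "/_tasks/" + task_id, None)
        if status.get("completed"):
            # The finished task document also carries the reindex response.
            return status
        time.sleep(poll_seconds)
```

Because every individual request returns quickly, a proxy or client timeout in front of Elasticsearch no longer cuts the operation short.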
Regarding the null values, can you try with a single document (check the original source and the new one) to see whether your script works? If it doesn't, you can paste the full reproduction here (the original document plus the reindex request).
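
One way to set up such a single-document check, sketched in Python. The reindex body restricts the source to one document with an `ids` query, and a small comparison reports renamed fields whose value did not carry over. Both helper names are hypothetical, and fetching the two `_source` dicts (for example with a GET on each document) is left to the client you use.

```python
def single_doc_reindex_body(source_index, dest_index, doc_id, script_source):
    """Build a _reindex body that copies exactly one document through the script."""
    return {
        "source": {"index": source_index, "query": {"ids": {"values": [doc_id]}}},
        "dest": {"index": dest_index},
        "script": {"lang": "painless", "source": script_source},
    }


def check_renames(original, reindexed, renames):
    """Return a list of problems: renamed fields whose value did not carry over."""
    problems = []
    for old, new in renames.items():
        # Only fields present in the original are expected in the new document.
        if old in original and reindexed.get(new) != original[old]:
            problems.append(
                "%s -> %s: expected %r, got %r"
                % (old, new, original[old], reindexed.get(new))
            )
    return problems
```

An empty list from `check_renames` means every renamed field present in the original arrived with the same value; any entry points at the field to include in the reproduction.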
