Do I need to manually copy over mappings before a reindex operation?

ishanjain · April 1, 2020, 5:57pm

Hi!

I am trying to reindex multiple indexes into one.
Right now, my code does the following:

1. Create the destination index with settings.number_of_replicas = 0.
2. Iterate over the source indexes one by one and reindex each of them into the destination index.

All source indexes are guaranteed to have the same mapping. Right now, I am not copying the mapping from one of the source indexes (since they are all the same) to the destination index, and this seems to be working fine.

However, I am wondering whether I should copy it anyway, because I think it may help with handling errors. I am not 100% sure.

Here is what the reindex operation looks like:

curl -XPOST "http://elasticsearch:9200/_reindex?requests_per_second=115&wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "analytics-prod-2019.12.30",
    "size": 1000
  },
  "dest": {
    "index": "analytics-prod-2019.12"
  },
  "script": {
    "lang": "painless",
    "source": "ctx._source.index = ctx._index; def eventData = ctx._source[\"event.data\"]; if (eventData != null) { eventData.remove(\"realmDb.size\"); eventData.remove(\"realmDb.format\"); eventData.remove(\"realmDb.contents\"); }"
  }
}'
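The one-by-one loop described above could be sketched in Python roughly as follows. This is only a sketch under assumptions: it uses the standard library's urllib instead of an Elasticsearch client, the helper names `build_reindex_body` and `reindex` are made up for illustration, and the host and query parameters are simply the ones from the curl command.

```python
import json
import urllib.request

ES_URL = "http://elasticsearch:9200"  # host taken from the curl command


def build_reindex_body(source_index, dest_index, script_source=None, batch_size=1000):
    """Build the JSON body for one _reindex call (hypothetical helper)."""
    body = {
        "source": {"index": source_index, "size": batch_size},
        "dest": {"index": dest_index},
    }
    if script_source is not None:
        body["script"] = {"lang": "painless", "source": script_source}
    return body


def reindex(source_index, dest_index, script_source=None, requests_per_second=115):
    """POST one reindex request and raise if the response reports failures."""
    url = (
        f"{ES_URL}/_reindex"
        f"?requests_per_second={requests_per_second}&wait_for_completion=true"
    )
    payload = json.dumps(
        build_reindex_body(source_index, dest_index, script_source)
    ).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # The reindex response carries rejected documents in a "failures" list.
    if result.get("failures"):
        raise RuntimeError(f"reindex of {source_index} failed: {result['failures']}")
    return result
```

Calling `reindex(name, "analytics-prod-2019.12")` for each source index name in turn would reproduce the loop, and checking `failures` after every call is one way to stop early when a document is rejected, for example because of a mapping conflict.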

magnusbaeck · April 2, 2020, 6:14am

Reindexing operations don't copy any mappings, only the documents, so you need to either copy the mappings manually or use an index template to apply the desired mappings when the destination index is created.
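As a rough illustration of the template route, the sketch below builds a request body that could be sent with PUT to the legacy `_template/<name>` endpoint. The helper name `build_template_body`, the `analytics-prod-*` pattern, and the two example fields are assumptions chosen for illustration, not the real mapping. Newer Elasticsearch versions also offer composable templates under `_index_template`, which use a slightly different body.

```python
import json


def build_template_body(index_pattern, properties, number_of_replicas=0):
    """Build a legacy index template body (hypothetical helper)."""
    return {
        "index_patterns": [index_pattern],
        "settings": {"number_of_replicas": number_of_replicas},
        "mappings": {"properties": properties},
    }


# Illustrative fields only; the real mapping would come from a source index.
example_properties = {
    "trackId": {"type": "long"},
    "cancelled": {"type": "boolean"},
}

template_body = build_template_body("analytics-prod-*", example_properties)
print(json.dumps(template_body, indent=2))
```

Once such a template exists, any index created afterwards whose name matches the pattern picks up these settings and mappings, including a destination index that the reindex request creates implicitly.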

ishanjain · April 2, 2020, 5:54pm

Okay, Thanks for responding!

My main objective here is to understand whether I really should be setting up the mapping before the reindex operation, or using index templates.

Right now, I am not doing either of these, and I don't really notice any errors except for one issue.

I had exported an index in ndjson format. I divided the exported file into two chunks. The first one contains 4000 docs, the second contains 8000 docs.

Then, I saved both of these chunks back into Elasticsearch using the bulk API (as two new, different indexes) and fetched their mappings.

And this is the result:

Diff result

--- 4000	2020-04-02 23:09:36.447457810 +0530
+++ 8000	2020-04-02 23:09:22.454024286 +0530
@@ -1,5 +1,5 @@
 {
-    "analytics-2019.12.25": {
+    "analytics-2019.12.26": {
         "mappings": {
             "properties": {
                 "app": {
@@ -190,6 +190,9 @@
                                         }
                                     }
                                 },
+                                "cancelled": {
+                                    "type": "boolean"
+                                },
                                 "card": {
                                     "properties": {
                                         "index": {
@@ -382,6 +385,19 @@
                                 },
                                 "intent": {
                                     "properties": {
+                                        "playlist": {
+                                            "properties": {
+                                                "id": {
+                                                    "type": "text",
+                                                    "fields": {
+                                                        "keyword": {
+                                                            "type": "keyword",
+                                                            "ignore_above": 256
+                                                        }
+                                                    }
+                                                }
+                                            }
+                                        },
                                         "searchText": {
                                             "type": "text",
                                             "fields": {
@@ -708,6 +724,15 @@
                                 "trackId": {
                                     "type": "long"
                                 },
+                                "type": {
+                                    "type": "text",
+                                    "fields": {
+                                        "keyword": {
+                                            "type": "keyword",
+                                            "ignore_above": 256
+                                        }
+                                    }
+                                },
                                 "url": {
                                     "type": "text",
                                     "fields": {

So, if I understand correctly, Elasticsearch encountered some fields in the larger sample that didn't exist in the smaller sample, but the fields that share a name in both datasets have the same type, so there weren't any errors.
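The same comparison can be done programmatically instead of by reading a diff. The sketch below is a minimal example, assuming the `properties` section of each mapping has already been loaded as a Python dict. The function names are hypothetical, and multi-fields (the `fields` entries) are ignored for brevity.

```python
def flatten_properties(properties, prefix=""):
    """Return {dotted.field.path: type} for a mapping's 'properties' dict."""
    fields = {}
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if "properties" in spec:
            # Object field: descend into its sub-fields.
            fields.update(flatten_properties(spec["properties"], prefix=path + "."))
        else:
            # Leaf field; an object without sub-fields has no explicit leaf type.
            fields[path] = spec.get("type", "object")
    return fields


def compare_mappings(props_a, props_b):
    """Return (fields only in a, fields only in b, type conflicts)."""
    a = flatten_properties(props_a)
    b = flatten_properties(props_b)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    conflicts = {p: (a[p], b[p]) for p in set(a) & set(b) if a[p] != b[p]}
    return only_a, only_b, conflicts


# Cut-down stand-ins for the two mappings, using field names from the diff.
small = {
    "trackId": {"type": "long"},
    "intent": {"properties": {"searchText": {"type": "text"}}},
}
large = {
    "trackId": {"type": "long"},
    "cancelled": {"type": "boolean"},
    "intent": {
        "properties": {
            "searchText": {"type": "text"},
            "playlist": {"properties": {"id": {"type": "text"}}},
        }
    },
}

only_small, only_large, conflicts = compare_mappings(small, large)
print(only_small, only_large, conflicts)
```

An empty conflicts dict corresponds to the situation described here: the larger sample only adds fields, and no shared field changes type.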

If I choose to copy over the mapping before performing the reindex operation, the question becomes: which index should I copy the mapping from? The indexes share a common set of fields, but apparently some docs in those indexes may also have extra fields. So, if I picked an index that had some extraneous field that doesn't exist in the other indexes, would it return an error? I am still reading up on all of this, so maybe this will come up somewhere.

warkolm · April 2, 2020, 9:03pm

No, it just won't return values for those fields if asked for them. So I would pick the one that has "all" of the fields.
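If no single index happens to contain every field, one option is to build the union of several mappings and use that for the destination index. The following is a minimal sketch, assuming each mapping's `properties` section is available as a Python dict. The names `field_type` and `merge_properties` are hypothetical, and the sketch only compares `type`, not other mapping parameters.

```python
import copy


def field_type(spec):
    """Elasticsearch omits "type" for plain object fields, so default to object."""
    return spec.get("type", "object")


def merge_properties(base, extra, path=""):
    """Return the union of two 'properties' dicts, raising on a type conflict."""
    merged = copy.deepcopy(base)
    for name, spec in extra.items():
        field_path = f"{path}{name}"
        if name not in merged:
            # Field only exists in 'extra': take it as-is.
            merged[name] = copy.deepcopy(spec)
            continue
        current = merged[name]
        if field_type(current) != field_type(spec):
            raise ValueError(
                f"type conflict on {field_path}: "
                f"{field_type(current)} vs {field_type(spec)}"
            )
        if "properties" in spec:
            # Both sides are objects: merge their sub-fields recursively.
            current["properties"] = merge_properties(
                current.get("properties", {}), spec["properties"], field_path + "."
            )
    return merged
```

Folding `merge_properties` over the mappings of all source indexes yields one mapping with "all" of the fields, and the `ValueError` flags the one case that would actually cause trouble during a reindex: the same field mapped with two different types.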

Also, I'd suggest looking at using ILM and index templates here, rather than individual mappings.

vorapoap · April 18, 2020, 3:17pm

I suggest you set up the _template ... so you will stop worrying about mapping.

system · May 16, 2020, 3:17pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
