Do I need to manually copy over mappings before a reindex operation?

Hi!

I am trying to reindex multiple indexes into one.
Right now, my process looks like this:

  1. Create the destination index with settings.number_of_replicas = 0 (see the example just after this list).
  2. Iterate over the source indexes one by one and reindex each into the destination index.
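
For step 1, the index creation call is roughly this (simplified here to just the replica setting; the index name is the same destination index used below):

curl -XPUT "http://elasticsearch:9200/analytics-prod-2019.12" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_replicas": 0
  }
}'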

All source indexes are guaranteed to have the same mapping. Right now, I am not copying the mapping from one of the source indexes over to the destination index (since they are all the same anyway), and this seems to be working fine.

However, I am wondering if this is something I should do, because I think it may help with handling errors better? I am not really 100% sure. :confused:

Here is what the reindex operation looks like:

curl -XPOST "http://elasticsearch:9200/_reindex?requests_per_second=115&wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "analytics-prod-2019.12.30",
    "size": 1000
  },
  "dest": {
    "index": "analytics-prod-2019.12"
  },
  "script": {
    "lang": "painless",
    "source": "ctx._source.index = ctx._index; def eventData = ctx._source[\"event.data\"]; if (eventData != null) { eventData.remove(\"realmDb.size\"); eventData.remove(\"realmDb.format\"); eventData.remove(\"realmDb.contents\"); }"
  }
}'

Reindexing operations don't copy any mappings, only the documents, so you need to either copy the mappings manually or use an index template to apply the desired mappings when the destination index is created.
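
If you go the manual route, it is basically a GET of the source index's mapping followed by a PUT when you create the destination index. A rough sketch using your index names (this assumes a typeless 7.x mapping and that jq is available to reshape the response; adjust as needed):

# Read the mapping from one source index and create the destination index with it,
# together with the number_of_replicas=0 setting you already use.
curl -s "http://elasticsearch:9200/analytics-prod-2019.12.30/_mapping" \
  | jq '{settings: {number_of_replicas: 0}, mappings: .["analytics-prod-2019.12.30"].mappings}' \
  | curl -XPUT "http://elasticsearch:9200/analytics-prod-2019.12" \
      -H 'Content-Type: application/json' --data-binary @-

The important part is that the destination index exists with the desired mappings before the first _reindex call; otherwise dynamic mapping takes over.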

Okay, thanks for responding!

My main objective here is to understand whether I really should set up the mapping before the reindex operation, or use index templates.

Right now, I am not doing either of these, and I haven't really noticed any errors except for one issue.

I had exported an index in ndjson format and divided the exported file into two chunks: the first one contains 4000 docs, the second contains 8000 docs.

Then, I saved both of these chunks back into Elasticsearch using the bulk API (as two new, separate indexes) and fetched their mappings.
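
For reference, each chunk was pushed back with a request along these lines (the file name is just an example; the file is in bulk/ndjson format, i.e. an action line before every document):

curl -XPOST "http://elasticsearch:9200/analytics-2019.12.25/_bulk" -H 'Content-Type: application/x-ndjson' --data-binary @chunk-4000.ndjson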

And this is the result of diffing the two mappings:

Diff result
--- 4000	2020-04-02 23:09:36.447457810 +0530
+++ 8000	2020-04-02 23:09:22.454024286 +0530
@@ -1,5 +1,5 @@
 {
-    "analytics-2019.12.25": {
+    "analytics-2019.12.26": {
         "mappings": {
             "properties": {
                 "app": {
@@ -190,6 +190,9 @@
                                         }
                                     }
                                 },
+                                "cancelled": {
+                                    "type": "boolean"
+                                },
                                 "card": {
                                     "properties": {
                                         "index": {
@@ -382,6 +385,19 @@
                                 },
                                 "intent": {
                                     "properties": {
+                                        "playlist": {
+                                            "properties": {
+                                                "id": {
+                                                    "type": "text",
+                                                    "fields": {
+                                                        "keyword": {
+                                                            "type": "keyword",
+                                                            "ignore_above": 256
+                                                        }
+                                                    }
+                                                }
+                                            }
+                                        },
                                         "searchText": {
                                             "type": "text",
                                             "fields": {
@@ -708,6 +724,15 @@
                                 "trackId": {
                                     "type": "long"
                                 },
+                                "type": {
+                                    "type": "text",
+                                    "fields": {
+                                        "keyword": {
+                                            "type": "keyword",
+                                            "ignore_above": 256
+                                        }
+                                    }
+                                },
                                 "url": {
                                     "type": "text",
                                     "fields": {

So, if I understand correctly, Elasticsearch encountered some fields in the larger sample that didn't exist in the smaller sample, but the fields that both datasets share were mapped to the same type, so there weren't any errors.

If I choose to copy over the mapping before performing the reindex operation, the question becomes: which index do I copy the mapping from? The indexes share a common set of fields, but apparently some docs in those indexes may also have some extra fields. So, if I picked an index which has some extraneous field that doesn't exist in the other indexes, will it return an error? I am still reading up on all of this, so maybe this'll come up somewhere.

No, it just won't return values for those fields if asked for them. So I would pick the one that has "all" of the fields.

Also I'd suggest looking at using ILM and index templates here, rather than individual mappings :slight_smile:

I suggest you set up the _template ... so you will stop worrying about mappings.
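
Something along these lines, using the legacy template API (the template name, the pattern, and the two mapped fields are only placeholders; you would put your full shared mapping in there):

curl -XPUT "http://elasticsearch:9200/_template/analytics" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["analytics-prod-*"],
  "settings": {
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "trackId": { "type": "long" },
      "cancelled": { "type": "boolean" }
    }
  }
}'

Any new index whose name matches the pattern (including the destination index you create before the reindex) then picks up these settings and mappings automatically.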

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.