Problems with mapping update altering all historical data

(Sean Funk) #1

Hi there,

We're using Elasticsearch/Kibana 5.2.2 in our production cluster, and I attempted a production change to the data, starting with a new index. The change was from treating a free-form text field as an opaque object to parsing the valid objects as JSON bodies, into sub-fields.
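For context, a rough sketch of the kind of mapping difference involved (the index, type, and field names here are illustrative, not our actual mapping):

```
# Before: the field is stored but not indexed or analyzed
# (hypothetical names; 5.x syntax with a mapping type)
PUT indexname-2017-03-15
{
  "mappings": {
    "logs": {
      "properties": {
        "json_body": { "type": "object", "enabled": false }
      }
    }
  }
}
```

With the change, valid JSON bodies would instead be parsed into sub-fields, which Elasticsearch then maps and indexes individually.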

The logic I was using was faulty, and for whatever reason, we started losing the field we wanted to alter. I went ahead and backed the change out, deleted the index that the code change created, and rolled back to the previous piece of code that did not have the processing or mapping change.

However, it appears that all 30 of the daily dated indices have been altered, at least from Kibana's perspective. Strangely enough, the change I expected did not take effect until after I rolled back to the code without the change. When the data started flowing again, I saw that the particular JSON body field was now being analyzed by Elasticsearch instead of just being treated as an unindexed object, as it had been for over a year prior (including in all of the existing daily dated indices that are still in the cluster).

I'm sort of at wits' end with this, as I have no idea what happened, and there appears to be no way to recover from this state. The raw JSON document shows that the field exists, but somehow Kibana is still breaking the field into sub-fields, even though it doesn't show up in the Management tab at all. I tried standing up another Kibana instance, fresh, and it exhibits the same behavior - so I'm guessing, somehow, all of the indices have been affected by this change. That's the part I don't understand.

I'm sure this sounds rambling, but I've been looking at this problem for several hours and I don't know what else to do, so I'm trying to get some help.

To be clear, my process was this:

  1. Stop our ETL process immediately after a new, old-style index was created and mapped.
  2. Delete the new index.
  3. Roll out the code change (this includes new logic to move bad JSON bodies into a non-analyzed field, keeping the old field contents the same if they were real JSON objects, and changing the mapping for the existing field to 'text' - this appeared to work in the testing I performed with the reindex feature).
  4. Start the data ETL process.
  5. Watch the data fill.
  6. Validate in Kibana.
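For step 3, the mapping change and the reindex testing looked roughly like this (index, type, and field names are illustrative, not our real ones):

```
# New index with the changed mapping: the existing field becomes 'text',
# bad JSON bodies go into a separate non-analyzed keyword field
# (5.x syntax; all names hypothetical)
PUT indexname-new
{
  "mappings": {
    "logs": {
      "properties": {
        "json_body":     { "type": "text" },
        "raw_json_body": { "type": "keyword", "index": false }
      }
    }
  }
}

# Copy existing data into the new index to test the change
POST _reindex
{
  "source": { "index": "indexname-old" },
  "dest":   { "index": "indexname-new" }
}
```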

I encountered the problem after step 4. Unfortunately, the problem now extends to all of the indices that the code change didn't touch, and that's the part I don't understand. I'm not sure how that change has persisted in Elasticsearch, even after I deleted the index that introduced the mapping change. To be clear, they're all part of the same index pattern (indexname-YYYYMM-DD).

To roll back, I performed the following:

  1. Stopped ETL.
  2. Deleted the new index with the new mapping.
  3. Rolled back the code change to a previous version, which verifiably does not have the mapping change.
  4. Via Restlet Client, verified that the new index is mapped as expected.
  5. Verified that the old data is now broken in Kibana.
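The verification in step 4 was just a mapping check against the cluster, along the lines of (index name illustrative):

```
# Confirm the field mapping on the newly created daily index
GET indexname-2017-03-15/_mapping
```

The mapping returned there looked correct, which makes the behavior Kibana shows all the more confusing.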

Any ideas on what I did wrong to break all of the historical data?

(system) #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.