Data Correction

I have an index that is composed of two separate data dumps. Upon further analysis, I discovered that one of them seems to have a 1hr shift. This has been partially rectified by using offsets in the visualization tools, but I'd like to correct it on the backend.

I can modify the CSV and re-ingest, but I'm wondering what it looks like to shift that data using Elasticsearch. I suppose I write a GET query to match the documents. But after that, is the update performed with a POST command or something similar?

You can do an in-place reindex. The concept is the same as in this guide, just ignore the upgrade parts: https://www.elastic.co/guide/en/elasticsearch/reference/7.10/reindex-upgrade-inplace.html

Sorry for the delay, I'm finally getting back around to this. I checked that out but I don't seem to be able to put it together. I don't think I explained my issue very well.

I have an index where the data is accurate from July 2019 through November 3rd 2019. But from November 3rd 2019 through August 2020, the data is shifted by +1hr. Is there a way for me to run an Elasticsearch command to shift all the @timestamps for documents in that index for that time range by -1hr?
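To make the requested transformation concrete, here is a minimal Python sketch of the arithmetic: move a timestamp back one hour only if it falls inside the affected window. The exact window boundaries, the function name `correct_timestamp`, and the choice to treat offset-less timestamps as UTC are all assumptions for illustration; the thread only gives "November 3rd 2019 through August 2020".

```python
from datetime import datetime, timedelta, timezone

# Assumed boundaries of the affected window (UTC). The thread only says
# "November 3rd 2019 through August 2020", so these exact instants are guesses.
SHIFT_START = datetime(2019, 11, 3, tzinfo=timezone.utc)
SHIFT_END = datetime(2020, 9, 1, tzinfo=timezone.utc)  # exclusive upper bound


def correct_timestamp(ts: str) -> str:
    """Return ts moved back one hour if it falls inside the affected window."""
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
    parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    if SHIFT_START <= parsed < SHIFT_END:
        parsed -= timedelta(hours=1)
    return parsed.isoformat()


if __name__ == "__main__":
    print(correct_timestamp("2019-10-01T12:00:00Z"))  # before the window, unchanged
    print(correct_timestamp("2020-01-15T12:00:00Z"))  # inside the window, one hour earlier
```

The same function could be applied row by row if the fix were made in the CSV before re-ingesting instead of inside Elasticsearch.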

Yes, you need to reindex and change that timestamp in the reindex process.

I think I'm missing something and it's probably due to the fact that I'm still pretty noob with elasticsearch. I think what I need help with is the following excerpt from the document you linked:

You can use a script to perform any necessary modifications to the document data and metadata during reindexing.

From what I gather, I can use something like the following to reindex:

POST _reindex
{
  "source": {
    "index": "source",
    "query": {
      "match": {
        "field_name": "text"
      }
    }
  },
  "dest": {
    "index": "destination"
  }
}

But I need to put something more in the destination to transform the timestamp. Can I simply use offset=-1h with the destination to perform that transformation? For example:

   "dest":{
      "index":"destination",
      "offset=-1h"
   }

Not sure where you got that offset from, but I don't believe it's valid.

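You will probably want to add a script to the reindex request, alongside source and dest, to do some date math like this (assuming @timestamp is stored in _source as an ISO 8601 string):

{
  "script": {
    "lang": "painless",
    "source": "ctx._source['@timestamp'] = ZonedDateTime.parse(ctx._source['@timestamp']).minusHours(1).toString()"
  }
}

Where minusHours(1) subtracts one hour from your existing @timestamp (or whatever your date field is called). Note that script_fields only changes what a search returns, not the stored documents, so for a reindex the script has to write to ctx._source.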
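Putting the pieces of this thread together, the sketch below assembles a complete `_reindex` request body in Python: a range query that selects the affected documents plus a Painless script that moves `@timestamp` back one hour. It is a sketch under assumptions, not a tested recipe: the index names and date bounds are placeholders, `build_reindex_body` is a hypothetical helper, and the script assumes `@timestamp` is stored in `_source` as an ISO 8601 string. It has not been run against a cluster.

```python
import json

# Painless script run once per copied document. It assumes @timestamp is kept
# in _source as an ISO 8601 string such as "2020-01-15T12:00:00Z".
SHIFT_SCRIPT = (
    "ctx._source['@timestamp'] = "
    "ZonedDateTime.parse(ctx._source['@timestamp']).minusHours(1).toString()"
)


def build_reindex_body(source_index, dest_index, start, end):
    """Return a _reindex request body that copies documents whose @timestamp
    is in [start, end) and moves that timestamp back one hour on the way."""
    return {
        "source": {
            "index": source_index,
            "query": {"range": {"@timestamp": {"gte": start, "lt": end}}},
        },
        "dest": {"index": dest_index},
        "script": {"lang": "painless", "source": SHIFT_SCRIPT},
    }


if __name__ == "__main__":
    # Placeholder index names and window bounds; adjust to the real data.
    body = build_reindex_body(
        "source", "destination", "2019-11-03T00:00:00Z", "2020-09-01T00:00:00Z"
    )
    print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after `POST _reindex` in the Kibana Dev Tools console. Because the query restricts the source, only the shifted range lands in the destination index; the unaffected documents would need a second reindex, without the script, to be copied across.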

I got the offset because I have no clue what I'm doing with this (or rather I'm still learning) and since I've been using Timelion, I figured I'd demonstrate that I was trying rather than just seeming like I was leeching.

Thank you, I think that's exactly what I need. Further investigation has revealed that I think the problem is with some devices in this dataset not accounting for DST...so this will start to get tricky, but I think with the info you provided, I should be able to figure it out.
