Moving data from AWS elasticsearch to elastic.co


(Vadiraj Bidarahalli) #1

Currently we are running Elasticsearch 5.5 on AWS Elasticsearch. We want to move to elastic.co.

The size of the data was around 28 GB. We deleted around 6600 K docs using delete by query,
but the size stayed the same at 28 GB. So there is no point in taking a backup and restoring it into a new index on elastic.co, since the backup would bring the junk documents along with it.

So we are planning to do a reindex.

Can I run _reindex separately, type by type, like below?

POST _reindex
{
  "source": {
    "index": "oldindex",
    "type": "oldtype1"
  },
  "dest": {
    "index": "new_index"
  }
}
 
 
and then
 
 
POST _reindex
{
  "source": {
    "index": "oldindex",
    "type": "oldtype2"
  },
  "dest": {
    "index": "new_index"
  }
}

Or do I have to specify the type in dest as well?

Regards
Vadiraj


(Mark Walkom) #2

Did you run a force merge to flush them out?
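
For reference, a force merge that targets deleted documents might look roughly like the sketch below. The index name oldindex is taken from the question above; whether the call is permitted on a given AWS Elasticsearch domain is an assumption to verify first.

```
# Sketch only: merges just the segments that contain deleted documents.
# "oldindex" is the index name used in the question.
POST oldindex/_forcemerge?only_expunge_deletes=true
```

The merge can be I/O heavy and slow on a large index, so it is best run when the index is not taking writes.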


(Vadiraj Bidarahalli) #3

I read that force merge is not recommended on an active node. In fact, what I read somewhere is that we should let Elasticsearch remove the deleted documents naturally. So I thought of reindexing.

Does my reindexing method work, type by type?


(Mark Walkom) #4

Force merge is not recommended on an active index, yes.

And yes, reindexing type by type works.
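
Since the source and destination are separate clusters here, the reindex would normally be run on the destination cluster and pull from the source with reindex-from-remote. A minimal sketch, assuming the AWS domain is reachable from elastic.co and its address is listed in reindex.remote.whitelist on the destination; the host below is a placeholder, not a real endpoint.

```
# Sketch only: run on the destination (elastic.co) cluster.
# The host value is a hypothetical placeholder for the AWS domain endpoint.
# The source address must be listed in reindex.remote.whitelist on this cluster.
POST _reindex
{
  "source": {
    "remote": {
      "host": "https://my-aws-domain.example.com:443"
    },
    "index": "oldindex",
    "type": "oldtype1"
  },
  "dest": {
    "index": "new_index"
  }
}
```

The same request would then be repeated with "type": "oldtype2". When dest has no type, documents keep the type they had in the source.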


(swarmee.net) #5

Depends what you mean by active node. If you create new indices each month or day, and the old indices are no longer updated or added to, then force merging them is fine. Note that it can take a long time.

Alternatively, if you have active indices, deleted documents should be removed automagically by Elasticsearch as new documents are indexed over time (force merging is not recommended for these indices).
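
One way to watch whether deleted documents are actually being cleaned up is the cat indices API, which reports a per-index count of deleted documents. A small sketch, using the index name oldindex from the question:

```
# Sketch only: shows live docs, deleted docs still held in segments, and size on disk.
GET _cat/indices/oldindex?v&h=index,docs.count,docs.deleted,store.size
```

If docs.deleted stays high and store.size does not shrink over time, the segments holding those deletes have not been merged yet.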


(Vadiraj Bidarahalli) #6

thank you


(Vadiraj Bidarahalli) #7

One more question. Can my application keep running, with the source Elasticsearch domain in use, while reindexing is going on? Will requests that come in during reindexing affect the reindexing process or the source data?


(swarmee.net) #8

Yes, you can continue using the existing index while reindexing it into another index. The reindex documentation says it takes a snapshot of the existing index when you kick off the reindexing. That means changes made after you start the reindexing will not be brought across.

What you can do is pull all of the current documents across first. Then, when you are ready to migrate, run another quick reindex with a query that targets only the changes since the first run.
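
A minimal sketch of that second pass, assuming every document carries a last-modified timestamp. The field name updated_at and the timestamp value are hypothetical; neither appears in this thread, so substitute whatever your mapping actually has.

```
# Sketch only: "updated_at" and the date are hypothetical placeholders.
# Use the time the first reindex was started as the lower bound.
POST _reindex
{
  "source": {
    "index": "oldindex",
    "query": {
      "range": {
        "updated_at": {
          "gte": "2017-09-01T00:00:00Z"
        }
      }
    }
  },
  "dest": {
    "index": "new_index"
  }
}
```

Documents copied in the first run and changed since then are overwritten in new_index because they keep the same ID. Documents deleted from the source after the first run are not removed by this pass and would need separate handling.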

