Replication with filtering

Hi,

I have a use case I am not sure how to handle.
We have an Elasticsearch cluster that ingests a high volume of log data every second.
We have a requirement to archive part of that data.
I am hoping we can archive it to a second Elasticsearch cluster.

I know Elasticsearch does replication. My question is: can it replicate with filtering, so that only the data we want is replicated?

Replication is not designed for this. Replicas are full copies of each shard inside the same cluster, so they cannot be filtered.

But you have other choices:

  • Let's say you want to archive by date. Use time-based indices and change the index allocation settings so that old indices are allocated to archive nodes.
  • Use reindex from remote: reindex with a query from one cluster to the other, then use delete by query to remove the old data from the source.
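
To make the two options more concrete, here is a rough sketch of the request bodies involved. It only builds the JSON and does not talk to a cluster. The node attribute `box_type`, the index names, the host `http://live-cluster:9200`, the `@timestamp` field and the cutoff date are all assumptions for illustration, not something from your setup. For reindex from remote, the destination cluster also has to allow the remote host (the `reindex.remote.whitelist` setting in elasticsearch.yml).

```python
import json


def allocation_settings(attr_name="box_type", attr_value="archive"):
    """Body for PUT /<old-index>/_settings.

    Assumes the archive nodes were started with a matching custom
    attribute, e.g. node.attr.box_type: archive (hypothetical name).
    """
    return {f"index.routing.allocation.require.{attr_name}": attr_value}


def reindex_from_remote_body(remote_host, source_index, dest_index, query):
    """Body for POST /_reindex, sent to the archive (destination) cluster."""
    return {
        "source": {
            "remote": {"host": remote_host},
            "index": source_index,
            "query": query,
        },
        "dest": {"index": dest_index},
    }


def delete_by_query_body(query):
    """Body for POST /<source-index>/_delete_by_query on the live cluster."""
    return {"query": query}


if __name__ == "__main__":
    # A fixed cutoff (hypothetical) so that the copy and the delete
    # select exactly the same documents.
    old_docs = {"range": {"@timestamp": {"lt": "2017-01-01T00:00:00Z"}}}

    print(json.dumps(allocation_settings(), indent=2))
    print(json.dumps(
        reindex_from_remote_body(
            "http://live-cluster:9200", "logs", "logs-archive", old_docs
        ),
        indent=2,
    ))
    print(json.dumps(delete_by_query_body(old_docs), indent=2))
```

Using the same fixed cutoff for both the reindex and the delete by query avoids deleting documents that were never copied, which could happen with a relative value such as `now-30d` evaluated at two different times.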

There may be other ways, but it depends on the kind of data you want to archive (basically, what type of query you want to run to select it).

We want to back up the content of specific indices. When writing a query against Elasticsearch, I believe there is no longer a general creation timestamp on each document that we could use to grab only the incremental changes?

I am looking at
https://www.npmjs.com/package/elasticdump
as a possible solution at the moment.
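
On the missing creation timestamp: one possible workaround is to stamp documents yourself at ingest time and then select each increment with a range query on that field. The sketch below only builds the request bodies. The field name `ingested_at` and the window boundaries are hypothetical, and it assumes a version with ingest pipelines, where the `set` processor can copy `{{_ingest.timestamp}}` into a field.

```python
import json


def ingest_timestamp_pipeline(field="ingested_at"):
    """Body for PUT /_ingest/pipeline/<name>.

    Adds the ingest time to every document indexed through the pipeline.
    The field name is an assumption, pick whatever fits your mapping.
    """
    return {
        "description": "Stamp each document with the time it was ingested",
        "processors": [
            {"set": {"field": field, "value": "{{_ingest.timestamp}}"}}
        ],
    }


def incremental_query(field, since, until):
    """Range query selecting documents stamped in [since, until)."""
    return {"range": {field: {"gte": since, "lt": until}}}


if __name__ == "__main__":
    print(json.dumps(ingest_timestamp_pipeline(), indent=2))
    # Hypothetical one-day window for a nightly incremental export.
    print(json.dumps(
        incremental_query(
            "ingested_at", "2017-01-01T00:00:00Z", "2017-01-02T00:00:00Z"
        ),
        indent=2,
    ))
```

The half-open window (`gte` on the start, `lt` on the end) means consecutive runs neither overlap nor skip documents, as long as each run starts where the previous one ended.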
