Storage usage when maintaining data over 2 separate indexes


I am looking into a solution for producing a deduplicated version of an existing index,
similar to the approach described in the Elastic Blog post "Little Logstash Lessons: Handling Duplicates".

I have an index containing all the data, and I would like to create a second index containing only the latest data.
At some point there will be two documents, one in each index, containing the same data. Will they take twice the storage space?

Any help would be appreciated.

