How to safely clean old documents (by date)


(AALISHE) #1

Hi,

I have ES 0.20.3 with a single index replicated on 2 servers, with
5 shards (size: 57.2gb / docs: 36060297).

I have web pages (the docs) indexed since 2013, so I want to delete
everything older and keep 1 year's worth of documents.

How can I do this safely on a production setup?

I am thinking of the following:

1- make a copy of the current index and put it next to it (with a different
name, of course) (how do I accomplish this?)
2- delete documents before May 2014 from the copied index
3- rename the old index and leave it, or delete it

OR

1- make an empty index
2- pull documents after May 2014 from the current index to the new one
(how do I accomplish this?)
3- rename the old index and leave it, or delete it

I appreciate your help, guys.
Thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f420f325-9f60-49d9-a9f6-d56528f32a99%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(David Pilato) #2

Definitely the second option.
Use scan and scroll (search for reindex on the website).

Instead of renaming, I would use aliases and switch the alias from old to new index.

Then close or remove the old index.
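A minimal sketch of the alias switch described above, assuming the request body is sent with `POST /_aliases`. The alias name `webpages` and the index names `webpages_v1` and `webpages_v2` are hypothetical, not taken from this thread. Both actions go in one request so the alias moves from the old index to the new one in a single step.

```python
import json


def alias_switch_body(alias, old_index, new_index):
    """Build the body for POST /_aliases that moves `alias` to `new_index`."""
    return {
        "actions": [
            # Detach the alias from the old index...
            {"remove": {"index": old_index, "alias": alias}},
            # ...and attach it to the new one in the same request.
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names, for illustration only.
body = alias_switch_body("webpages", "webpages_v1", "webpages_v2")
print(json.dumps(body, indent=2))
```

Note that an alias cannot share its name with an existing index, so applications that query the old index by name would first need to be pointed at the alias name.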

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


(AALISHE) #3

Thanks, David!

Do you know how I perform step (2), pulling documents after May 2014 from the
current index to the new one?


(David Pilato) #4

Searching for reindex in docs would have directed you to http://www.elastic.co/guide/en/elasticsearch/guide/current/reindex.html
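As a rough sketch of step (2), the two pieces a scan and scroll reindex needs are a search body that selects only the recent documents and a bulk payload built from each page of hits. The date field name `indexed_at`, the cutoff `2014-05-01`, and the target index `webpages_v2` are assumptions for illustration. The `filtered` query with a `range` filter is written from memory of the 0.x query DSL, so check it against the documentation for 0.20.3 before relying on it. The scan search itself would be opened with `search_type=scan` and a `scroll` timeout, and each following page fetched from the scroll endpoint.

```python
import json


def build_scan_query(date_field, since):
    """Search body matching documents whose `date_field` is on or after `since`."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"range": {date_field: {"gte": since}}},
            }
        }
    }


def hits_to_bulk(hits, target_index):
    """Turn one page of scroll hits into a newline-delimited bulk payload."""
    lines = []
    for hit in hits:
        # Keep the original type and id so the copy mirrors the source document.
        action = {"index": {"_index": target_index,
                            "_type": hit["_type"],
                            "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(hit["_source"]))
    # The bulk API expects every line, including the last, to end with a newline.
    return ("\n".join(lines) + "\n") if lines else ""


# Hypothetical field name, cutoff date and sample hit, for illustration only.
query = build_scan_query("indexed_at", "2014-05-01")
page = [{"_type": "page", "_id": "1",
         "_source": {"url": "http://example.com", "indexed_at": "2014-06-01"}}]
print(json.dumps(query))
print(hits_to_bulk(page, "webpages_v2"), end="")
```

The loop around these helpers would post each payload to `/_bulk` and stop when a scroll request returns no hits.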

David


(Mark Walkom) #5

Just a side note: if you are using time-based data, then it makes a lot of
sense to use time-based indices, i.e. daily, weekly, or monthly.
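To illustrate why time-based indices make retention easier, here is a small sketch assuming monthly indices named with a hypothetical `webpages-YYYY.MM` pattern (the prefix and format are not from this thread). With such a layout, dropping old data becomes deleting whole indices rather than deleting documents.

```python
from datetime import date


def monthly_index_name(prefix, day):
    """Return the monthly index name for a document dated `day`."""
    return "%s%04d.%02d" % (prefix, day.year, day.month)


def expired_indices(names, prefix, today, keep_months=12):
    """Return monthly indices that lie entirely before the retention window."""
    # Express months as a single integer so they can be compared directly.
    cutoff = today.year * 12 + (today.month - 1) - keep_months
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue
        try:
            year, month = name[len(prefix):].split(".")
            index_month = int(year) * 12 + (int(month) - 1)
        except ValueError:
            continue  # Not one of our monthly indices; leave it alone.
        if index_month < cutoff:
            expired.append(name)
    return sorted(expired)


# Hypothetical index names, for illustration only.
existing = ["webpages-2013.12", "webpages-2014.04", "webpages-2014.05"]
print(monthly_index_name("webpages-", date(2015, 5, 4)))
print(expired_indices(existing, "webpages-", date(2015, 5, 4)))
```

Each name returned by `expired_indices` could then be removed with a single delete-index call, which is far cheaper than deleting millions of documents one by one.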

