How to clear data out of an index


(danpolites) #1

How do you clear the data out of an index without deleting that index?
We are using elasticsearch in a Grails application. During development,
when we end up with rogue objects in the index, we can't remove them
because they no longer exist in the database. We added an admin
function that allows us to delete and recreate the index, but we don't
need to delete the index entirely. We only need to clear it.


(Clinton Gormley) #2

(Replying to danpolites, Thu, 2011-07-28 at 12:51 -0700.)

Use the delete or delete_by_query API.
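Clinton's suggestion, applied to "clear everything", can be sketched as a delete-by-query with a match-all query. This is a minimal sketch, assuming Elasticsearch 5.x or later, where delete-by-query is the `_delete_by_query` endpoint (the 2011-era API linked in this thread used a different URL), and a hypothetical unauthenticated node at `http://localhost:9200`. The index name `customer_42` is hypothetical. It only builds the request with Python's standard library; it does not send it.

```python
import json
import urllib.request

# Assumption: a local, unauthenticated cluster (hypothetical address).
ES_URL = "http://localhost:9200"


def delete_all_docs_request(index: str) -> urllib.request.Request:
    """Build a request that deletes every document in `index` but keeps the index.

    Assumes the modern (5.x+) `_delete_by_query` endpoint.
    """
    body = {"query": {"match_all": {}}}  # match_all selects every document
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_delete_by_query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = delete_all_docs_request("customer_42")  # hypothetical index name
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it against a running cluster:
    # urllib.request.urlopen(req)
```

The index's mapping and settings survive, but the matched documents are only marked as deleted, which is the trade-off discussed further down the thread.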

clint


(Vladimir Shkurin) #3

Have you seen these? :)

http://www.elasticsearch.org/guide/reference/api/delete-by-query.html

http://www.elasticsearch.org/guide/reference/api/delete.html


(danpolites) #4

Yes, I have seen those many times ;). Maybe the way I am doing it is
the best way to do it. I was looking for a more convenient API call
that would just clear all of the documents in the index without
deleting the index so that I wouldn't have to set up the index again. I
suppose a 'delete by query' would work for that, but I'm hesitant to
use that with the warning that is provided on the page:

"Also, it is not recommended to delete “large chunks of the data in an
index”, many times, its better to simply reindex into a new index."

Why is this not recommended? We can't reindex to a new index because
we are creating a new index for every one of our customers and the
index names are unique to the customer. All of our objects have a
customer ID field that our search service uses to choose the correct
index. Again, this idea of deleting and recreating an index to clear
it out is mainly a convenience for development. Hopefully we
don't have rogue documents in production, but if we do, we need to be
able to quickly clean the indices because we use elasticsearch results
for all of our list views in the site and the show pages are MongoDB
calls. Our current solution of deleting the index and then creating
that index works OK, but it's really slow at reindexing with large
amounts of data. Maybe this is more of a best practice sort of
question. How should this be handled?

(Replying to Vladimir Shkurin, Jul 28, 5:02 pm.)


(David Pilato) #5

I think that you should use one index per customer and create a main alias on top of them.

When you need to clean a customer index, drop it, create it again, and add it back to the alias.

That takes only a few milliseconds. Deleting documents one by one (even with a query) will cost you much more, and as you said, you want to remove a customer's data quickly.
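David's approach (one index per customer, plus an alias spanning them) could look like the sequence below. A minimal sketch, assuming Elasticsearch 7.x or later (typeless mappings), a hypothetical node at `http://localhost:9200`, and hypothetical names `customer_42` and `all_customers`. It builds the three requests in order: drop the customer index, recreate it with its mapping, and add it back to the alias. Nothing is sent.

```python
import json
import urllib.request

# Assumption: a local, unauthenticated cluster (hypothetical address).
ES_URL = "http://localhost:9200"


def _json_request(method, path, body=None):
    """Build a request; attach a JSON body and header only when a body is given."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {} if body is None else {"Content-Type": "application/json"}
    return urllib.request.Request(f"{ES_URL}{path}", data=data,
                                  headers=headers, method=method)


def reset_customer_index(index, alias, mappings):
    """Return the requests that drop `index`, recreate it, and re-add it to `alias`."""
    return [
        # 1. Drop the whole index (fast: no per-document deletes).
        _json_request("DELETE", f"/{index}"),
        # 2. Recreate it empty, with the same mapping as before.
        _json_request("PUT", f"/{index}", {"mappings": mappings}),
        # 3. Put it back behind the shared alias.
        _json_request("POST", "/_aliases",
                      {"actions": [{"add": {"index": index, "alias": alias}}]}),
    ]
```

Send them in order with `urllib.request.urlopen`. Deleting an index also removes it from its aliases, which is why the third step is needed; the DELETE fails with a 404 if the index does not exist yet.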

Hope this helps
David

(Replying to danpolites, 29 Jul 2011 at 03:16.)


(Shay Banon) #6

The reason for the warning is that when you delete a large part of
your index, those documents aren't actually removed in practice; they are
only marked as deleted in the Lucene index. They eventually get merged out of
the index (to reclaim space and optimize the index).

So, if you end up deleting a large portion of the index, it sometimes makes
sense to reindex the data instead, as that creates a more optimized
index. It really depends on the use case, of course.
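To make the point about deletes concrete, here is a toy model in plain Python. It is not Lucene's actual implementation (Lucene records deletions per segment in a bitset); it only illustrates why deleting most of an index leaves the data stored until a merge rewrites the surviving documents. The `Segment` class and `merge` function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy stand-in for an immutable segment plus a set of deletion marks."""
    docs: list
    deleted: set = field(default_factory=set)

    def delete(self, doc_id):
        # Deleting only marks the document; its data stays in the segment.
        if doc_id in self.docs:
            self.deleted.add(doc_id)

    @property
    def live(self):
        return [d for d in self.docs if d not in self.deleted]


def merge(segments):
    """A merge copies only live documents into a new segment, dropping the marks."""
    return Segment(docs=[d for s in segments for d in s.live])


segments = [Segment(docs=list(range(0, 5))), Segment(docs=list(range(5, 10)))]
for doc_id in range(8):  # "delete a large chunk" of the index
    for s in segments:
        s.delete(doc_id)

stored = sum(len(s.docs) for s in segments)
live = sum(len(s.live) for s in segments)
print(f"before merge: {stored} stored, {live} live")  # before merge: 10 stored, 2 live

merged = merge(segments)
print(f"after merge: {len(merged.docs)} stored, {len(merged.live)} live")  # after merge: 2 stored, 2 live
```

In current Elasticsearch versions, `POST /<index>/_forcemerge?only_expunge_deletes=true` asks for this cleanup explicitly.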

(Replying to danpolites, Fri, Jul 29, 2011 at 4:16 AM.)


(Michael Sokolov) #7

This is an old thread, but I have a variation on the same question: I've
seen various recommendations to drop and recreate indexes rather than
deleting all documents. I just want to know if there is anything in
elasticsearch that maps to IndexWriter.deleteAll, since that is an
efficient way to empty an index without having to recreate it. It would be
convenient if, say, deleteByQuery("*:*") were to cause that: does it?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e55f492c-b027-4a05-9546-dc2b7f7d468d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Itamar Syn-Hershko) #8

Due to the distributed nature of ES, the equivalent of this would be to
delete the index and create it again using the same mapping.
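Itamar's answer (delete the index and recreate it with the same mapping) can be sketched by saving the mapping first. A minimal sketch, assuming Elasticsearch 7.x or later, where `GET /<index>/_mapping` returns `{"<index>": {"mappings": {...}}}`, and a hypothetical node at `http://localhost:9200`. Only mappings are carried over; settings and aliases would have to be copied separately. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: a local, unauthenticated cluster (hypothetical address).
ES_URL = "http://localhost:9200"


def recreate_body_from_mapping_response(get_mapping_response: dict) -> dict:
    """Turn a GET /<index>/_mapping response into a body for PUT /<index>.

    Assumes the typeless 7.x+ response shape {"<index>": {"mappings": {...}}}.
    """
    if len(get_mapping_response) != 1:
        raise ValueError("expected the mapping of exactly one index")
    (index_info,) = get_mapping_response.values()
    return {"mappings": index_info["mappings"]}


def recreate_index_requests(index: str, get_mapping_response: dict):
    """DELETE the index, then PUT it back empty with the saved mapping."""
    body = recreate_body_from_mapping_response(get_mapping_response)
    return [
        urllib.request.Request(f"{ES_URL}/{index}", method="DELETE"),
        urllib.request.Request(
            f"{ES_URL}/{index}",
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="PUT",
        ),
    ]
```

Fetch the mapping before deleting, e.g. `json.load(urllib.request.urlopen(f"{ES_URL}/customer_42/_mapping"))`, then send the returned requests in order.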

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/

(Replying to Michael Sokolov, Sun, Apr 6, 2014 at 4:00 AM.)

