More complete documentation and examples about the Delete By Query API?


(Aldian) #1

Hi

Yesterday we hit the Hibernate bug
https://hibernate.atlassian.net/browse/HHH-3006, which produced a huge load
of useless logstash traces (there were already 400,000 by the time we
detected the problem and enforced a stricter log level). So I tried to wipe
all these useless records from Elasticsearch. I referred to the documentation

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-get.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-delete-by-query.html

but could not find an easy way to make a GET, check that the results are
what I want to delete, then make a DELETE. If such functionality exists,
please add it to the docs. I ended up running this, which is the query made
by logstash to filter the results I want:
curl -XGET 'http://myserver:9200/_all/_search?pretty' -d '{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "should": [ { "query_string": { "query": "*" } } ]
        }
      },
      "filter": {
        "bool": {
          "must": [ {
            "fquery": {
              "query": {
                "query_string": {
                  "query": "idsession:(\"A7C571A26A606B210563EDBAF1AC7A37\")"
                }
              },
              "_cache": true
            }
          } ]
        }
      }
    }
  }
}'

Then I tried to use the same query to DELETE the data, but got several
errors, so I followed the docs to arrive at a valid URL. I ended up with
this:

curl -XDELETE 'http://myserver:9200/logstash-2014.04.02?pretty' -d '{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "should": [ { "query_string": { "query": "*" } } ]
        }
      },
      "filter": {
        "bool": {
          "must": [ {
            "fquery": {
              "query": {
                "query_string": {
                  "query": "idsession:(\"A7C571A26A606B210563EDBAF1AC7A37\")"
                }
              },
              "_cache": true
            }
          } ]
        }
      }
    }
  }
}'

{
  "acknowledged" : true
}

But the result was not what I expected: Elasticsearch had simply ignored
the filter and deleted all the data in that index. You can imagine my
frustration when I realized that, rather than spending hours experimenting
and trying to apply the docs, I could have just run rm -rf somewhere and got
the same disappointing result in no time.

So now that the damage is done, I would like to know what I should have
done. There must be a way to test a query before actually sending the
delete, right?

Thanks for reading

Aldian

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8920d534-d09b-4867-b097-6938c17040ac%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(David Pilato) #2

Look at the doc: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-delete-by-query.html#docs-delete-by-query

Missing a _query at the end:

$ curl -XDELETE 'http://localhost:9200/twitter/tweet/_query?q=user:kimchy'

$ curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
"query" : {
"term" : { "user" : "kimchy" }
}
}
'
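
One way to get the check-then-delete workflow asked about above, sketched under the assumption of the 2014-era API shown in this thread: send the query body to _search first, inspect the hits, then send the identical body to _query on the same index. ES_HOST and the function names are hypothetical helpers, not part of Elasticsearch. Later Elasticsearch releases changed this endpoint, so check the docs for your version.

```shell
#!/bin/sh
# Sketch only: ES_HOST and the function names below are hypothetical,
# not part of Elasticsearch or of the original posts.
ES_HOST="${ES_HOST:-http://localhost:9200}"

# Print the URL for one endpoint (_search or _query) of one index.
#   $1 = index name, $2 = endpoint name
es_url() {
    printf '%s/%s/%s' "$ES_HOST" "$1" "$2"
}

# Step 1: run the query body ($2) against index $1 as a plain search.
# Nothing is modified; check hits.total and the returned documents.
preview_delete() {
    curl -s -XGET "$(es_url "$1" _search)?pretty" -d "$2"
}

# Step 2: send the same body to the _query endpoint of the same index.
# Without the trailing /_query, DELETE targets the index itself.
delete_by_query() {
    curl -s -XDELETE "$(es_url "$1" _query)?pretty" -d "$2"
}

# Example usage (commented out so that sourcing this file sends nothing):
#   BODY='{ "query": { "term": { "user": "kimchy" } } }'
#   preview_delete  twitter "$BODY"
#   delete_by_query twitter "$BODY"
```

Keeping the body in one variable and the index in one argument means the preview and the delete cannot drift apart, which is what happened when the search ran against _all and the delete against a single index path.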

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr



(Aldian) #3

That example is unclear because it does not state that "twitter" is the
name of an index, or what the "tweet" part of the URL stands for.

An example in the following form would have been much clearer, IMHO:

curl -XDELETE 'http://localhost:9200/my_index_name/_query'

because when reading the example you cannot tell which parts are mandatory
and which you have to adapt.
The same goes for the examples with the URL parameter ?q=user:kimchy. I
found no further explanation of what to expect, but I tried
?q=idsession:A7C571A26A606B210563EDBAF1AC7A and did not get what I expected.
I think the docs are a useful memo for those already used to ES, but for an
outsider there are a lot of things that are not as obvious as they seem.
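
To make the URL anatomy explicit: in the 1.x-era API discussed in this thread, the path is the index name, then an optional mapping type, then _query. In the docs' example, "twitter" is the index and "tweet" is the type. A minimal sketch follows; the function name delete_by_query_url is hypothetical and exists only to label the parts.

```shell
#!/bin/sh
# Hypothetical helper: print a delete-by-query URL for the API in this thread.
#   $1 = host, e.g. http://localhost:9200
#   $2 = index name (mandatory), e.g. twitter
#   $3 = mapping type (optional), e.g. tweet
delete_by_query_url() {
    if [ -n "${3:-}" ]; then
        # Restrict the delete to one type inside the index.
        printf '%s/%s/%s/_query\n' "$1" "$2" "$3"
    else
        # Apply the delete to every type in the index.
        printf '%s/%s/_query\n' "$1" "$2"
    fi
}

delete_by_query_url http://localhost:9200 twitter tweet
# prints http://localhost:9200/twitter/tweet/_query
delete_by_query_url http://localhost:9200 logstash-2014.04.02
# prints http://localhost:9200/logstash-2014.04.02/_query
```

Only the index name and the trailing _query are mandatory here; the type segment can be dropped.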

Anyway I see your point. Thank you for the reply, I will keep that in mind
next time we have the problem.

Aldian

PS: BTW how do you insert code in google groups?



(David Pilato) #4

You can propose pull requests to improve the documentation. We are really open to that!
I have to say that you are the second person to make that mistake this week. So I guess it does need improving.

PS: BTW how do you insert code in google groups?

Copy and paste in my email client! :)

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr



(system) #5