Delete by date range fails


(thealy) #1

Hello, I'm relatively new to ES and struggling to delete old records from an
index ("iron"), all of which are the same type ("email"). The idea is to
purge old data that is no longer of interest. In the examples below, I'm
trying to delete all the records from September 2013. I use ES
primarily via the Java API and have little experience using curl from the
command line.

The date/time field is defined like
this: "tstamp":{"type":"date","format":"dateOptionalTime"}

I have been trying to work from assorted 'delete by query' docs and forum
suggestions, but so far the result is a puzzling set of failures. I'm sure
that I'm messing something up on a very basic level since this must be a
common operation.

This works:

curl -XGET 'http://192.168.4.73:9200/iron/email/_search?pretty=true' -d \
'{"query":{"range":{"tstamp":{"from":"2013-09-01T00:00:00","to":"2013-09-30T23:59:59"}}}}'

This FAILS:

curl -XGET 'http://192.168.4.73:9200/iron/email/_query' -d \
'{"query":{"range":{"tstamp":{"from":"2013-09-01T00:00:00","to":"2013-09-02T23:59:59"}}}}'

Returning: {"_index":"iron","_type":"email","_id":"_query","exists":false}

So of course the -XDELETE based on this query also fails.

Any suggestions appreciated.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/087de0dd-4e88-40f9-8606-91c6b8aaada3%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Adrien Grand) #2

Hi,

If you manage time-based data, the most efficient way to handle deletes
would be to have time-based indices. For example if your retention policy
is to keep data for the last 6 months, you could use one index per month
and delete the index that is 7 months old every month. Since Elasticsearch
supports searching across indices with no overhead (searching over n
indices which have m shards on average is exactly the same as searching
over m indices that have n shards), you would just have to specify the
index names for the last 6 months (or use an alias:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-aliases.html).
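
As a rough sketch of the monthly-index idea, the helper below builds the comma-separated list of index names for a retention window. The "iron-YYYY.MM" naming scheme and the function name monthly_indices are assumptions made for illustration; nothing in this thread prescribes them. The curl lines are left as comments because they need a live cluster.

```shell
# Print the names of the last N monthly indices, newest first, comma-separated.
# Usage: monthly_indices PREFIX YEAR MONTH COUNT  (MONTH as a plain integer, e.g. 2)
# The "PREFIX-YYYY.MM" naming scheme is hypothetical.
monthly_indices() {
  prefix=$1; year=$2; month=$3; count=$4
  names=""
  while [ "$count" -gt 0 ]; do
    name=$(printf '%s-%04d.%02d' "$prefix" "$year" "$month")
    names="${names:+$names,}$name"
    # Step back one month, rolling over to December of the previous year.
    month=$((month - 1))
    if [ "$month" -eq 0 ]; then
      month=12
      year=$((year - 1))
    fi
    count=$((count - 1))
  done
  printf '%s\n' "$names"
}

indices=$(monthly_indices iron 2014 2 6)
echo "$indices"

# Search the whole retention window in one request:
# curl -XGET "http://192.168.4.73:9200/$indices/_search?pretty=true" -d '...'
# Drop the month that has just left the window by deleting its index:
# curl -XDELETE 'http://192.168.4.73:9200/iron-2013.08'
```

Deleting a whole index this way avoids deleting individual documents by query, which is the point of the suggestion above.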

Regarding your issue, "GET [...] /_query" is not supported and the error
message that you got is due to the fact that Elasticsearch interpreted it
as a call to the GET API (it assumes _query is the document ID and ignores
the body).

However, -XDELETE should work. I just tried to reproduce the issue locally
but could not. Are you using Elasticsearch 1.0? I'm assuming so
because in Elasticsearch 0.90 you should NOT wrap the query in a
top-level "query" JSON object, so this might be the cause of your problem
if you are using Elasticsearch 0.90.
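
To make the difference concrete, here is a small sketch that builds the request body for either version, following the description above: the bare range query for 0.90, and the same query wrapped in a top-level "query" object otherwise. The helper name delete_body is hypothetical, and the curl line is commented out because it would actually delete data.

```shell
# Build a delete-by-query body for a date range on the "tstamp" field.
# Usage: delete_body VERSION FROM TO
# VERSION "0.90" yields the bare query; anything else yields the wrapped form.
delete_body() {
  range=$(printf '{"range":{"tstamp":{"from":"%s","to":"%s"}}}' "$2" "$3")
  case $1 in
    0.90) printf '%s\n' "$range" ;;
    *)    printf '{"query":%s}\n' "$range" ;;
  esac
}

body=$(delete_body 0.90 2013-09-01T00:00:00 2013-09-30T23:59:59)
echo "$body"

# Run the matching _search first to confirm what would be removed, then:
# curl -XDELETE 'http://192.168.4.73:9200/iron/email/_query' -d "$body"
```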


--
Adrien Grand



(thealy) #3

Yes, I am still running 0.90. Sorry, but I find the interaction via curl
with complex JSON to be awkward; I lost a lot of time experimenting
with possible syntax by cutting and pasting trials, only to find that
linefeeds and tabs in my formatted text were causing it to fail.

I will upgrade to 1.0 ASAP and attempt to recreate my indices with ttl
set. I have to investigate how to create the time-based indices, since I
used Kibana to create what I have, and the Java API for everything else.
Unfortunately I will lose several months' data, but I can live with this.

Just for the sake of my sanity, can you tell me what is the correct
-XDELETE command format for my example under 0.90?

I tried to remove the query wrapper with:

curl -XDELETE 'http://192.168.4.73:9200/iron/email/_query' -d \
'{"range":{"tstamp":{"from":"2013-09-01T00:00:00","to":"2013-09-02T23:59:59"}}}'

Result:
{"ok":true,"_indices":{"iron":{"_shards":{"total":5,"successful":5,"failed":0}}}}

But nothing appears to have been removed.

Thanks for your help and patience. I will try to upgrade everything to
1.0 today and start over.



(Tony Su) #4

Hi Terry,
Just a comment on your "linefeeds and tabs" problem...

My guess is that you're working on Windows. Plenty of text editors on
Windows insert invisible characters that can screw things up.

In my experience the opposite is true in the *NIX world: few text
editors do that sort of thing.

So the general recommendation would probably be to find a text editor that
doesn't do that, or to switch to an OS where your choices are more
likely to work.

I've also found that formatted JSON statements can be executed in some
console environments, like *NIX bash (maybe not the Windows console, I haven't
tried), so strictly speaking linefeeds and tabs aren't always an issue.
But if you need to remove those linefeeds and whitespace, there are a ton
of freeware and online tools that do it (I use them to minify HTML
pages and JavaScript), so you can have the best of both worlds: compose
formatted JSON, then remove the whitespace for execution.
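
One way to do this without a separate tool is sketched below: keep the body formatted in the script and collapse it to one line just before sending it. The helper name compact_json is hypothetical. It only deletes newlines, carriage returns and tabs, so it would also alter a string value that contains a literal tab; spaces are left alone and remain valid JSON whitespace.

```shell
# Collapse a formatted JSON body onto one line by deleting newlines,
# carriage returns and tabs. Spaces are deliberately kept.
compact_json() {
  tr -d '\n\r\t'
}

body=$(compact_json <<'EOF'
{
  "range": {
    "tstamp": {
      "from": "2013-09-01T00:00:00",
      "to": "2013-09-02T23:59:59"
    }
  }
}
EOF
)
echo "$body"

# The one-line body can then be passed to curl, for example:
# curl -XPOST 'http://192.168.4.73:9200/iron/email/_search' -d "{\"query\":$body}"
```

Another option is to save the body in a file and pass it with -d @filename, since curl strips carriage returns and newlines when -d reads from a file.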

Re: indexing based on timestamp, you might take a look at the Logstash
Apache log parser. It does this automatically: by default, an index is
created for each day.

HTH,
Tony



(Binh Ly) #5

I'd also check that you actually have data matching your delete criteria.
For example:

curl -XPOST 'http://192.168.4.73:9200/iron/email/_search' -d \
'{"query":{"range":{"tstamp":{"from":"2013-09-01T00:00:00","to":"2013-09-02T23:59:59"}}}}'

Should return something.



(thealy) #6

The -XDELETE that I last sent was in fact working, after removing the
"query" wrapper that version 1.0 requires. I was just expecting an immediate
size reduction in Kibana; it showed up several minutes later when I
deleted a full month's data.

Thanks.



(thealy) #7

Thanks Tony.

I was using gedit on a Linux box. Since I got my delete to work, I'm
hoping to upgrade to Version 1.0 and then avoid using curl once my
indices are created. I have to keep some proof of concept apps running
so I'm trying to only break one thing at a time....

I just started feeding from Logstash, which is very nice, but I'm doing
intermediate processing on the records before inserting them into
Elasticsearch using the Java API.

Thanks for your help.


