Query from a date (> than date)


(electic) #1

Hi,

I am a bit new to this and I could not find anything in the documentation, but I would like to use the URL GET API to query documents from a certain date onward, for example everything after 2011-04-22 12:00 AM. I have a created_at field that holds the date and time. What would be the best way to do this? Here is my document for reference:

{
  "_index": "documents",
  "_type": "document",
  "_id": "MRc8EJE5SRK0TXKk-nnpWg",
  "_score": 1.581601,
  "_source": {
    "created_at": "2011-04-19T16:22:13",
    "platform": "twitter",
    "record_id": "Test:1244",
    "tags": "Princess Cruises ships to resume calls in Egypt . http://usat.ly/if6AaO hottraveldeals2 Hot Travel Deals",
    "utimestamp": "1303202398000"
  }
}

Lastly, I had a question on the IDs. They seem to be alphanumeric, is
there a way to make them longs?

Regards,

-R


(David Pilato) #2

I think you will find the right way to do it on the Elasticsearch home page. Look at the search part.

You will find a simple way to use the range filter.

Hope this helps...

Sent from my iPhone 4 :wink:



(Lukáš Vlček) #3

Hi,

you can use the Range Query support from the Lucene syntax:
http://lucene.apache.org/java/3_1_0/queryparsersyntax.html (see the "Range Searches" section)

That means something like:

curl -XGET 'host:port/_search?q=*+created_at:[2010-10-01+TO+3000-01-01]'

Note the arbitrary upper-bound date value; I do not think the Lucene range query allows omitting the lower or upper bound of the interval.

But if you can use POST requests or your client supports body content for
GET requests then I would recommend you to take a look at
http://www.elasticsearch.org/guide/reference/query-dsl/range-filter.html or
http://www.elasticsearch.org/guide/reference/query-dsl/range-query.html
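For reference, a request body along those lines might look like this (a sketch only; the index name and created_at field are taken from the original question, and host/port are placeholders):

```shell
# Filtered query: match everything, then keep only documents whose
# created_at is strictly after the cutoff (include_lower=false makes
# the "from" bound exclusive, i.e. "> date" as asked).
curl -XPOST 'http://localhost:9200/documents/_search' -d '{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "range": {
          "created_at": {
            "from": "2011-04-22T00:00:00",
            "include_lower": false
          }
        }
      }
    }
  }
}'
```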

Regards,
Lukas



(Shay Banon) #4

Heya,

Using the q parameter, you can place a * value to denote an unbounded lower/upper bound.

You can also provide the whole supported request body in a source query string parameter.
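A sketch of both options with the field from this thread (spaces URL-encoded as +; host/port are placeholders):

```shell
# Unbounded upper end: everything with created_at on or after 2011-04-22.
curl -XGET 'http://localhost:9200/documents/_search?q=created_at:[2011-04-22+TO+*]'

# The whole query DSL body can also go in the source parameter, URL-encoded;
# this is {"query":{"match_all":{}}} after encoding.
curl -XGET 'http://localhost:9200/documents/_search?source=%7B%22query%22%3A%7B%22match_all%22%3A%7B%7D%7D%7D'
```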

-shay.banon


(electic) #5

Thanks guys! I think I am just going to go ahead and use the POST part of the API instead of trying to pass the whole thing via the URL; that made life easier. On an unrelated note: we are seriously looking at adding 10 more servers to the cluster to see how easy that is. Currently, each server is a quad-core Xeon with 16GB of RAM and a 600GB SAS disk. At this time, we have 5 shards, and my understanding is that 5 shards = 5 servers. Now let's say I add those 10 nodes and we have 15 servers. How do we best migrate or modify the index to take advantage of those 10 nodes?



(Clinton Gormley) #6


You have primary shards and replicas. By default, you'd have 5 primary
shards and 1 replica (i.e. one replica for each primary shard).

So if you start 10 nodes, then you would have one shard on each (either
a primary or a replica).

You can dynamically increase your number of replicas, so if you set
replicas to 2, then you'd have enough shards to fill 15 nodes with one
shard each.

If you want to (and need to) increase the number of primary shards, then
you will need to reindex to a new index that has been created with a
higher number of primary shards.
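As a sketch, the replica change is a live settings update (index name taken from earlier in the thread; host is a placeholder):

```shell
# Bump replicas from 1 to 2: 5 primaries + 10 replicas = 15 shards,
# enough for one shard on each node of a 15-node cluster.
curl -XPUT 'http://localhost:9200/documents/_settings' -d '{
  "index": { "number_of_replicas": 2 }
}'
```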

clint


(electic) #7

Sounds great. One last question. I noticed the IDs are alphanumeric:

"_id": "MRc8EJE5SRK0TXKk-nnpWg",

is there a way to make them long integers?



(Shay Banon) #8

This is the auto-generated ID that Elasticsearch creates. It's a UUID that has been base64-encoded. A long value is much more problematic to generate in a distributed system, but you can provide your own ID if you want.
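A sketch of supplying your own (e.g. numeric) ID at index time instead of letting Elasticsearch generate one (index/type names from the sample document; host is a placeholder):

```shell
# PUT with an explicit ID in the URL path: _id will be "1244"
# rather than an auto-generated base64 UUID.
curl -XPUT 'http://localhost:9200/documents/document/1244' -d '{
  "record_id": "Test:1244",
  "platform": "twitter",
  "created_at": "2011-04-19T16:22:13"
}'
```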


(electic) #9

Shay, you are going to kill me, but I have one more question. It has to do with uniqueness. Our documents look something like this:

{
  "_index": "documents",
  "_type": "document",
  "_id": "MRc8EJE5SRK0TXKk-nnpWg",
  "_score": 1.581601,
  "_source": {
    "created_at": "2011-04-19T16:22:13",
    "platform": "twitter",
    "record_id": "Test:1244",
    "tags": "Princess Cruises ships to resume calls in Egypt . http://usat.ly/if6AaO hottraveldeals2 Hot Travel Deals",
    "utimestamp": "1303202398000"
  }
}

And what if I wanted to make sure the record_id is unique to prevent
dupes in there? Is there a way to do that?

-R



(Shay Banon) #10

There isn't a way to enforce uniqueness on a field, but maybe you should use it as the document ID when indexing?
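For example (a sketch, using the record_id value from the sample document as the _id):

```shell
# With _id = record_id, indexing the "same" record twice overwrites the
# first copy (bumping _version) instead of creating a duplicate document.
curl -XPUT 'http://localhost:9200/documents/document/Test:1244' -d '{
  "record_id": "Test:1244",
  "platform": "twitter",
  "created_at": "2011-04-19T16:22:13"
}'
```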


(electic) #11

Just to clarify, you are saying that I should make it the _id and ES will make sure there are no two documents with the same ID? So ES will reject the second dupe?


(electic) #12

Okay, I tested it. It seems to work as you said. What about versioning, though? Let's say you are inserting a dupe now and you want ES to just ignore it. I looked at the docs and it seems it just updates the version number and stores the new doc. How would you get it to ignore the insert altogether? Is there a curl example of that?

All our calls are inserts, so I wouldn't have the IDs available. I just want the insert to be ignored if the _id is already in the system.



(David Williams) #13

It's covered here: http://www.elasticsearch.org/blog/2011/02/08/versioning.html
(look at "put if absent")
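The put-if-absent form from that post, sketched with this thread's index (the second call with the same _id should come back with a conflict error instead of bumping the version):

```shell
# op_type=create: index only if the _id does not already exist.
curl -XPUT 'http://localhost:9200/documents/document/Test:1244?op_type=create' -d '{"record_id":"Test:1244"}'

# Equivalent shorthand endpoint.
curl -XPUT 'http://localhost:9200/documents/document/Test:1244/_create' -d '{"record_id":"Test:1244"}'
```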

-david



(electic) #14

Ah okay, and for bulk indexing how would this work?

{"index":{"_id":"MB:7beb57e3-fa37-3067-a845-a29578c68b4z","_index":"documents","_type":"document"}}

This is my action line before the next line, which is the document. How would I tell it not to do versioning here? Something like this:

{"index":{"_id":"MB:7beb57e3-fa37-3067-a845-a29578c68b4z","_index":"documents","_type":"document","op_type":"create"}} ?



(Clinton Gormley) #15


Or simpler:

{"create":{"_id":"MB:7beb57e3-fa37-3067-a845-a29578c68b4z","_index":"documents","_type":"document"}}

This will fail if a version of the doc with the same _id already exists
in ES.
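Put together, a complete bulk request using the create action might look like this (a sketch; note that the bulk body is newline-delimited JSON and must end with a newline, hence --data-binary):

```shell
curl -XPOST 'http://localhost:9200/_bulk' --data-binary '
{"create":{"_index":"documents","_type":"document","_id":"MB:7beb57e3-fa37-3067-a845-a29578c68b4z"}}
{"record_id":"Test:1244","platform":"twitter","created_at":"2011-04-19T16:22:13"}
'
```

Items whose _id already exists come back as per-item errors in the bulk response, while the rest of the batch still goes through.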

clint


(electic) #16

Thank you so much clint, you have been a great help!


