I am a bit new to this and I could not find anything in the
documentation, but I would like to use the URL GET API to query
documents from a certain date onward. For example, everything after
2011-04-22 12:00 AM. I have a created_at field that holds the date
and time. What would be the best way to do this? Here is my document
for reference:
Using the q parameter, you can place a * value to denote an unbounded
lower / upper bound in the search.
You can also provide the whole search request body in a source query
string parameter.
-shay.banon
On Thursday, April 21, 2011 at 10:55 AM, Lukáš Vlček wrote:
Hi,
you can use the Range Query support from the Lucene syntax:
http://lucene.apache.org/java/3_1_0/queryparsersyntax.html#RangeSearches
That means something like curl -XGET 'host:port/_search?q=*+created_at:[2010-10-01+TO+3000-01-01]'
Note the arbitrary upper bound date value; I do not think the Lucene range query allows omitting the lower or upper bound of the interval.
But if you can use POST requests, or your client supports body content for GET requests, then I would recommend taking a look at http://www.elasticsearch.org/guide/reference/query-dsl/range-filter.html or http://www.elasticsearch.org/guide/reference/query-dsl/range-query.html
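To make the two options concrete, here is a sketch; the host, port, and
far-future upper bound are placeholders, and the DSL body uses the
from/to range form from the docs linked above:

```python
import json
from urllib.parse import quote

# Query-string form: everything from 2011-04-22 onward. The upper
# bound is an arbitrary far-future date, as in the curl example above;
# localhost:9200 is a placeholder host.
lucene_query = "created_at:[2011-04-22 TO 3000-01-01]"
url = "http://localhost:9200/_search?q=" + quote(lucene_query)

# Equivalent Query DSL range query (sent as a POST body, or via the
# source query-string parameter). The from/to bounds are inclusive
# by default.
body = {
    "query": {
        "range": {
            "created_at": {
                "from": "2011-04-22",
                "to": "3000-01-01",
            }
        }
    }
}
print(url)
print(json.dumps(body, indent=2))
```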
Thanks guys! I think I am just going to go ahead and use the POST part
of the API instead of trying to pass the whole thing via the URL. Made
life easier. On an unrelated note: we are seriously looking at adding
10 more servers to the cluster to see how easy that is. Currently,
each server is a quad-core Xeon with 16GB of RAM and a 600GB SAS disk.
At this time, we have 5 shards. My understanding is that 5 shards = 5
servers. Now let's say we add those 10 nodes and have 15 servers. How
do we best migrate or modify the index to take advantage of those 10
new nodes?
You have primary shards and replicas. By default, you'd have 5 primary
shards and 1 replica (i.e. 1 replica for each primary shard).
So if you start 10 nodes, then you would have one shard on each (either
a primary or a replica).
You can dynamically increase your number of replicas, so if you set
replicas to 2, then you'd have enough shards to fill 15 nodes with one
shard each.
If you want to (and need to) increase the number of primary shards, then
you will need to reindex into a new index that has been created with a
higher number of primary shards.
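The shard arithmetic described above can be sketched as a toy
calculation (not an API call):

```python
# Total shard copies = primaries * (1 + replicas); with one shard per
# node, that is also how many nodes the index can spread across.
def total_shards(primaries: int, replicas: int) -> int:
    return primaries * (1 + replicas)

# Default index: 5 primaries, 1 replica -> 10 shards, enough for
# 10 nodes with one shard each.
assert total_shards(5, 1) == 10

# Bump replicas to 2 (a dynamic setting, no reindex needed) ->
# 15 shards, one per node on 15 nodes.
assert total_shards(5, 2) == 15
```

Going beyond that (more primaries) is what requires reindexing into a
new index.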
This is the auto-generated id that elasticsearch creates. It's a UUID that has been base64 encoded. A long value is much more problematic to generate in a distributed system, but you can provide your own id if you want.
On Friday, April 22, 2011 at 12:41 AM, electic wrote:
Sounds great. One last question. I noticed the IDs are alphanumeric:
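As an illustrative sketch of why the ids look alphanumeric (this mimics
the scheme described, a random UUID encoded with base64; it is not
necessarily Elasticsearch's exact implementation):

```python
import base64
import uuid

# 16 random UUID bytes, base64-encoded (URL-safe alphabet, padding
# stripped), yielding a short alphanumeric-looking id.
def generate_id() -> str:
    raw = uuid.uuid4().bytes  # 16 random bytes
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

doc_id = generate_id()
assert len(doc_id) == 22  # 16 bytes -> 22 base64 characters
print(doc_id)
```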
There isn't a way to do it, but maybe you should use it as the value of the document id when indexing?
On Friday, April 22, 2011 at 4:05 AM, electic wrote:
Shay, you are going to kill me but I have one more question. It has to
do with uniqueness. Our documents kind of look like this:
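One way to follow the suggestion above when no id is kept around is to
derive the _id deterministically from the fields that define
uniqueness. This is a sketch with made-up field names, not code from
the thread:

```python
import hashlib
import json

# Derive a stable document id by hashing whatever fields define
# uniqueness; the key names here are purely illustrative.
def derive_id(doc: dict, keys=("user", "created_at")) -> str:
    material = json.dumps({k: doc[k] for k in keys}, sort_keys=True)
    return hashlib.sha1(material.encode("utf-8")).hexdigest()

a = {"user": "electic", "created_at": "2011-04-22 00:00", "body": "hi"}
b = dict(a)  # a duplicate insert of the same logical document
# Same uniqueness fields -> same _id -> same document in the index.
assert derive_id(a) == derive_id(b)
```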
Just to clarify, you are saying that I should make it the _id and ES
will make sure there are no two documents with the same id? So ES will
reject the second dupe?
Okay, I tested it. It seems to work as you said. What about versioning,
though? Let's say you are inserting a dupe now and you want ES to just
ignore it. I looked at the docs and it seems it just updates the
version number and stores the new doc. How would you get it to ignore
the insert altogether? Is there a curl example of that?
All our calls are inserts, so I wouldn't have the IDs available. I just
want the insert to be ignored if the _id is already in the system.
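For reference, the index API has a create op type (op_type=create, or
the _create endpoint) that returns a conflict instead of bumping the
version when the id already exists; whether your client exposes it is
worth checking. A toy model of that behavior:

```python
# Toy model of create-if-absent semantics: a second insert with the
# same id is rejected rather than stored as a new version.
class TinyIndex:
    def __init__(self):
        self.docs = {}  # id -> (version, document)

    def create(self, doc_id, doc):
        if doc_id in self.docs:
            # Conflict: the duplicate insert is ignored.
            return {"ok": False, "status": 409}
        self.docs[doc_id] = (1, doc)
        return {"ok": True, "status": 201, "_version": 1}

idx = TinyIndex()
assert idx.create("abc", {"body": "first"})["ok"] is True
assert idx.create("abc", {"body": "dupe"})["status"] == 409
assert idx.docs["abc"][1]["body"] == "first"  # original untouched
```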