Efficiency of search vs get


(Steff) #1

curl -X PUT "localhost:9200/mytest/abc/_mapping" -d '{
  "abc" : {
    "_routing" : {
      "required" : true
    },
    "properties" : {
      "idx" : {"type" : "string", "index" : "not_analyzed"},
      "a" : {"type" : "string"},
      "b" : {"type" : "string"},
      "c" : {"type" : "integer"},
      "txt" : {"type" : "string", "null_value" : "na"}
    }
  }
}'

Lots of abc documents are indexed into the mytest index, among others this one:
curl -XPUT "localhost:9200/mytest/abc/1234_5678_90123?routing=1234_5678_90123" -d '{
  "sms" : {
    "a" : "1234",
    "b" : "5678",
    "c" : 90123,
    "txt" : "Hello World"
  }
}'

I expect this "get" to be very efficient:
curl -XGET "http://localhost:9200/mytest/abc/1234_5678_90123?routing=1234_5678_90123"

I have cheated a little in the code above by implying that I can build
an id from the values of a, b and c. That is only almost true:
occasionally (but very, very seldom) there will be several documents with
the same values for a, b and c. Therefore I cannot make ids like this (I
will have to use a_b_c_X ids or just GUIDs instead), and therefore I
cannot find the document(s) using the "get" above.

Question: If I know that there will never be more than a few documents
with a given combination of values for a, b and c, can I create a
"search" that finds those documents and is just (or almost) as efficient
(with respect to search time and resources used) as the "get" above? Note
that I am using routing, so I should at least be able to hit the right
shard in such a search.

In an RDBMS I would create a combined index on a, b and c and use the
query "select * from abc where a='1234' and b='5678' and c=90123" (the
"search") instead of "select * from abc where id='1234_5678_90123'" (the
"get"), and that would be just as efficient (provided the RDBMS uses the
combined index; otherwise I would force it with a hint).

Thanks!

Regards, Per Steffensen


(Shay Banon) #2

Get is as fast as you can go to retrieve a single document. A search against
a single field (a term query) that uses routing to direct the search request
to a single shard will be almost as fast, but not the same. I don't have
actual numbers to say how much slower it will be.

Regarding a combined index, there is no option to do that in elasticsearch.
You can do a boolean query with several must clauses, each a term query
against one of a, b, and c. This will be slower (since you are now searching
on 3 fields instead of a single one).
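
As a rough illustration of the boolean query described above, here is a minimal sketch that only builds the JSON search body as a string, without any Elasticsearch client dependency. The class and method names are hypothetical, and it assumes the fields are indexed under the plain names a, b and c and that the values need no JSON escaping.

```java
// Hypothetical helper: builds the JSON body of a bool query with three
// "must" term clauses, one per field (a, b and c).
class BoolTermQuerySketch {

    // Builds one {"term": {field: value}} clause; jsonValue is already JSON-encoded.
    static String termClause(String field, String jsonValue) {
        return "{\"term\":{\"" + field + "\":" + jsonValue + "}}";
    }

    // Wraps a string in JSON double quotes (assumes nothing needs escaping).
    static String quoted(String s) {
        return "\"" + s + "\"";
    }

    // a and b are string fields, c is an integer field, as in the mapping.
    static String boolMustBody(String a, String b, int c) {
        return "{\"query\":{\"bool\":{\"must\":["
                + termClause("a", quoted(a)) + ","
                + termClause("b", quoted(b)) + ","
                + termClause("c", Integer.toString(c))
                + "]}}}";
    }

    public static void main(String[] args) {
        // Print the body that would be sent to the _search endpoint.
        System.out.println(boolMustBody("1234", "5678", 90123));
    }
}
```

The resulting string could be passed as the -d payload of a curl request against the index's _search endpoint.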

On the other hand, the _routing field is automatically indexed (not
analyzed). So, based on your sample, you can simply do a term query
against the _routing field with the routing value.

Of course, you might get several documents with the search request, but I
think you factored that in (a_b_c_1, and a_b_c_2).
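
The routed term query suggested here can be sketched at the REST level as follows. This is only an illustration under assumptions: the class and method names are hypothetical, it builds the request path and body as strings rather than calling a client, and it assumes the routing value needs no JSON escaping.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the path and body of a search that is routed
// to one shard and matches only documents whose _routing equals the value.
class RoutedTermSearchSketch {

    // The routing query parameter directs the search to a single shard.
    static String searchPath(String index, String type, String routing)
            throws UnsupportedEncodingException {
        return "/" + index + "/" + type + "/_search?routing="
                + URLEncoder.encode(routing, "UTF-8");
    }

    // The term query on _routing selects the documents indexed with that value.
    static String searchBody(String routing) {
        return "{\"query\":{\"term\":{\"_routing\":\"" + routing + "\"}}}";
    }

    public static void main(String[] args) throws Exception {
        String routing = "1234_5678_90123";
        System.out.println(searchPath("mytest", "abc", routing));
        System.out.println(searchBody(routing));
    }
}
```

With ids of the form a_b_c_X and the routing value a_b_c, such a search would return every document sharing that routing value.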



(Steff) #3

Thanks. Will the following code do the trick?

client.prepareSearch(indexName).setRouting(routingStr).setFilter(new
TermFilterBuilder("_routing", routingStr)).execute().actionGet();


(Shay Banon) #4

Replace setFilter with setQuery(QueryBuilders.termQuery("_routing",
routingStr)), as the filter is mainly used to filter the results of the query
you execute (mainly useful with faceting).



(thale jacobs) #5

From the example:
client.prepareSearch(indexName).setRouting(routingStr).setQuery(
QueryBuilders.termQuery("_routing", routingStr)).execute().actionGet();

For clarification, can someone verify whether the routing needs to be
specified both via setRouting(routingStr) and via the term query,
setQuery(QueryBuilders.termQuery("_routing", routingStr))? I am
having a difficult time finding documentation on the Java client API as it
pertains to routing. Thanks for the help.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/7012fe03-4b32-4a86-8ca2-0cfdeb635760%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Alexander Reelsen) #6

Hey,

It is sufficient to set the routing via setRouting in the Java API. In
case of doubt, you can always check the RestActions in the source and see
how they do it.

--Alex
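
To illustrate what setting the routing does on its own, here is a minimal sketch of routing-based shard selection. It is not Elasticsearch's actual implementation: the class name is hypothetical, a 5-shard index is assumed, and String.hashCode stands in for the internal hash function. The point it shows is that a routing value always maps to the same shard, while several routing values can share a shard, so routing narrows the search to a shard and the query decides which documents on that shard match.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of routing-based shard selection.
class ShardRoutingSketch {

    // Maps a routing value to a shard number in [0, numberOfShards).
    // String.hashCode is a stand-in for the real hash function.
    static int shardFor(String routing, int numberOfShards) {
        return Math.floorMod(routing.hashCode(), numberOfShards);
    }

    // Groups routing values by the shard they map to.
    static Map<Integer, List<String>> groupByShard(List<String> routings, int numberOfShards) {
        Map<Integer, List<String>> byShard = new HashMap<>();
        for (String r : routings) {
            byShard.computeIfAbsent(shardFor(r, numberOfShards), k -> new ArrayList<>()).add(r);
        }
        return byShard;
    }

    public static void main(String[] args) {
        // Six routing values over five shards: at least two must share a shard.
        List<String> routings = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            routings.add("1234_5678_" + (90123 + i));
        }
        System.out.println(groupByShard(routings, 5));
    }
}
```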



(system) #7