Boolean query vs filters and more


(Andrei) #1

Hi,

I've just started using ElasticSearch and so far it's been fairly easy
to get going with. However, I have a couple of questions that came up
during the course of development:

  1. I need to find docs where a string field contains a keyword, but
    restricted by an integer field matching a given value. Is it better
    to use the "bool" query feature (with sub-queries for string and
    integer portions), or do the string query with a filter?

  2. Is it possible to specify multiple fields to search for a given
    keyword? So far I've been using _all, but sometimes I need to change
    which fields I'm matching on dynamically, and _all is predetermined by
    the mapping.

Thank you,

-Andrei


(Lukáš Vlček) #2

Hi,

On Mon, Aug 16, 2010 at 10:44 PM, Andrei andrei@zmievski.org wrote:

  1. I need to find docs where a string field contains a keyword, but
    restricted by an integer field matching a given value. Is it better
    to use the "bool" query feature (with sub-queries for string and
    integer portions), or do the string query with a filter?

Filters perform better because they do not compute scores. So I think
filtering is the way to go in your case.

  2. Is it possible to specify multiple fields to search for a given
    keyword? So far I've been using _all, but sometimes I need to change
    which fields I'm matching in dynamically and _all is predetermined by
    the mapping.

I think you can use a bool query
(http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/bool_query/)
and combine several query_string queries
(http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/query_string_query/)
with default_field specified accordingly, or you can combine field queries
(http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/field_query/).

Regards,
Lukas


(Shay Banon) #3

On Tue, Aug 17, 2010 at 12:00 AM, Lukáš Vlček lukas.vlcek@gmail.com wrote:

On Mon, Aug 16, 2010 at 10:44 PM, Andrei andrei@zmievski.org wrote:

  1. I need to find docs where a string field contains a keyword, but
    restricted by an integer field matching a given value. Is it better
    to use the "bool" query feature (with sub-queries for string and
    integer portions), or do the string query with a filter?

Filters perform better because they do not compute scores. So I think
filtering is the way to go in your case.

Yep, filters are the best way to go here. Much faster, and easily cacheable.
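
For illustration, here is a minimal sketch of the "string query plus filter" approach recommended above. It only builds the request body and assumes the `filtered` query form from releases of that era (newer releases express the same idea with a `filter` clause inside `bool`). The field names `title` and `category_id` and the helper name are hypothetical.

```python
import json


def filtered_search(text_field, keyword, int_field, value):
    """Build a search body: a scored text query restricted by an unscored filter.

    Assumes the `filtered` query form; all field names are supplied by the caller.
    """
    return {
        "query": {
            "filtered": {
                # Scored part: keyword match on the string field.
                "query": {
                    "query_string": {"default_field": text_field, "query": keyword}
                },
                # Unscored part: exact match on the integer field.
                "filter": {"term": {int_field: value}},
            }
        }
    }


# Hypothetical field names, for illustration only.
body = filtered_search("title", "elastic", "category_id", 42)
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the body of a search request; only the text part contributes to scoring.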

  2. Is it possible to specify multiple fields to search for a given
    keyword? So far I've been using _all, but sometimes I need to change
    which fields I'm matching in dynamically and _all is predetermined by
    the mapping.

I think you can use a bool query
(http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/bool_query/)
and combine several query_string queries
(http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/query_string_query/)
with default_field specified accordingly, or you can combine field queries
(http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/field_query/).

Actually, if you use query_string, then it allows you to define several
fields for it to execute on. Check out the bottom part of the docs here:
http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/query_string_query/
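
As a rough sketch of the multi-field form mentioned above, the snippet below builds a query_string body whose `fields` list is chosen at request time instead of relying on _all. The field names and the helper name are hypothetical; check the linked docs for the options supported by your version.

```python
import json


def multi_field_query(keyword, fields):
    """Build a query_string search body that runs `keyword` against several fields."""
    if not fields:
        raise ValueError("at least one field is required")
    return {"query": {"query_string": {"fields": list(fields), "query": keyword}}}


# Hypothetical field names; the list can change from one request to the next.
body = multi_field_query("elastic", ["title", "summary", "tags"])
print(json.dumps(body, indent=2))
```

Because the field list is just part of the request, it can be varied per search without touching the mapping.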



(Andrei) #4

Great, that lets me eliminate the _all field then.

Also, does ES support indexing non-English text?

-Andrei

On Aug 16, 11:54 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Actually, if you use query_string, then it allows you to define several
fields for it to execute on. Check out the bottom part of the docs here:
http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/qu...


(Shay Banon) #5

It depends on the _all field. For the price of storing more data in the
index, you can get much better results in terms of query performance
(comparing a query on _all to a query across X fields).

-shay.banon

On Tue, Aug 17, 2010 at 8:56 PM, Andrei andrei@zmievski.org wrote:

Great, that lets me eliminate the _all field then.

Also, does ES support indexing non-English text?

-Andrei



(Andrei) #6

I currently only search against 3 fields, so that's what my _all would
contain. The total length of 3 fields is likely to be 300 to 1000
characters.

-Andrei

On Aug 17, 11:25 am, Shay Banon shay.ba...@elasticsearch.com wrote:

It depends on the _all field. For the price of storing more data in the
index, you can get much better results in terms of query performance
(comparing a query on _all to a query across X fields).

-shay.banon


(Shay Banon) #7

Then that might make sense; it really depends on the latency you require
from your searches. The _all field causes a larger index and slower indexing,
but usually results in faster queries when matching on several fields. In
your case, it sounds like the _all field is not needed, but I would run a
simple test to see the difference with your expected queries.

-shay.banon

On Tue, Aug 17, 2010 at 10:40 PM, Andrei andrei@zmievski.org wrote:

I currently only search against 3 fields, so that's what my _all would
contain. The total length of 3 fields is likely to be 300 to 1000
characters.

-Andrei



(Andrei) #8

Will do. Is there a good way to time ES queries aside from just timing
the curl request?

-Andrei

On Aug 17, 12:45 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Then that might make sense; it really depends on the latency you require
from your searches. The _all field causes a larger index and slower indexing,
but usually results in faster queries when matching on several fields. In
your case, it sounds like the _all field is not needed, but I would run a
simple test to see the difference with your expected queries.

-shay.banon


(Shay Banon) #9

That's pretty much it, though I think you should time it based on how you
plan to actually invoke it in production (from the language you plan to use,
and so on).

-shay.banon
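
One possible way to follow the advice above is a small client-side timing harness. The sketch below times any callable end to end, so the same function can wrap whatever client code issues the search in production. The `fake_query` stub is a stand-in and does not contact a server; all names here are hypothetical.

```python
import statistics
import time


def time_calls(run_query, repeats=20, warmup=3):
    """Call `run_query` repeatedly and summarize wall-clock latency in milliseconds."""
    for _ in range(warmup):
        run_query()  # Warm-up calls are not measured (caches, connections).
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


def fake_query():
    """Stand-in for a real search call; replace with your own client code."""
    time.sleep(0.001)


stats = time_calls(fake_query, repeats=5, warmup=1)
print(stats)
```

Running it once with an _all query and once with a multi-field query gives the simple comparison suggested earlier in the thread.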

On Wed, Aug 18, 2010 at 12:19 AM, Andrei andrei@zmievski.org wrote:

Will do. Is there a good way to time ES queries aside from just timing
the curl request?

-Andrei


