Performance issues when flagging a document in Elasticsearch

Dror_Atariah · December 10, 2014, 3:22pm

Assume that I want to flag documents in an index according to their
attributes, for example isFoo and isBar [1]. As far as I understand, there
are two approaches:

(1) Use dedicated fields for the flags: if the document is a Foo, add a
field named isFoo. Similarly for isBar.
(2) Use a single "flags" field that holds an array of strings: if the
document is a Foo, "flags" will contain the string "isFoo".
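
A minimal sketch of the two document shapes, written as plain Python dicts. The "title" field and the helper to_flags are hypothetical and only there for illustration; the field names isFoo, isBar and flags come from the question.

```python
# Hypothetical example documents; "title" is a made-up field.
doc_dedicated = {"title": "example", "isFoo": True}   # approach (1): a Foo, not a Bar
doc_flags = {"title": "example", "flags": ["isFoo"]}  # approach (2): the same document


def to_flags(doc, flag_names=("isFoo", "isBar")):
    """Convert an approach-(1) document into the approach-(2) shape.

    A flag counts as set when its field is present, which mirrors how an
    "exists" filter would treat it.
    """
    out = {k: v for k, v in doc.items() if k not in flag_names}
    flags = [name for name in flag_names if name in doc]
    if flags:
        out["flags"] = flags
    return out


print(to_flags(doc_dedicated))
```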

What are the pros and cons in terms of space and runtime complexities?

Bear in mind the following example queries. Consider the case where one
wants to check the attributes of the documents in the index. In particular,
to find the documents that are either Foo or Bar, I can:
(a) In case (1): Use a Boolean "should" filter that wraps two "exists"
filters, checking whether either isFoo or isBar exists.
(b) In case (2): Use a single "exists" filter that checks for the existence
of the field "flags".
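
The "either Foo or Bar" filters could be sketched as below, as Python dicts in the filter DSL. This shows only the filter clause; how it is wrapped depends on the version (a "filtered" query in Elasticsearch 1.x, the "filter" clause of a "bool" query in later versions). One caveat: an "exists" filter on "flags" matches any document that carries at least one flag, so it equals "Foo or Bar" only while those are the only two flags. The "terms" variant, which names them explicitly, is an addition that is not part of the question.

```python
import json

# Case (1): one "exists" filter per dedicated field, OR-ed through "should".
either_dedicated = {
    "bool": {
        "should": [
            {"exists": {"field": "isFoo"}},
            {"exists": {"field": "isBar"}},
        ]
    }
}

# Case (2), as described in the question: any document that has a "flags" field.
any_flag = {"exists": {"field": "flags"}}

# Case (2), naming the flags explicitly (safer once more flags are added).
either_flags = {"terms": {"flags": ["isFoo", "isBar"]}}

print(json.dumps(either_dedicated, indent=2))
print(json.dumps(either_flags, indent=2))
```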

A different case is finding the documents that are both Foo and Bar:
(a) In case (1): As before, but replace the "should" with a "must".
(b) In case (2): Wrap two "term" filters in a Boolean "must" filter.
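
A sketch of the "both Foo and Bar" filters for the two cases, again as plain Python dicts holding only the filter clause. It assumes the "flags" field is indexed so that "term" matches the exact strings (for example not_analyzed in Elasticsearch 1.x).

```python
# Case (1): both dedicated fields must exist.
both_dedicated = {
    "bool": {
        "must": [
            {"exists": {"field": "isFoo"}},
            {"exists": {"field": "isBar"}},
        ]
    }
}

# Case (2): the "flags" array must contain both strings.
both_flags = {
    "bool": {
        "must": [
            {"term": {"flags": "isFoo"}},
            {"term": {"flags": "isBar"}},
        ]
    }
}
```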

Lastly, there is the case of finding the documents that are Foo but not Bar.
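
The question does not spell out the filters for this last case. One possible sketch combines "must" with "must_not"; in Elasticsearch 1.x a "missing" filter on isBar would be an alternative to "must_not" plus "exists" in case (1).

```python
# Case (1): isFoo exists, isBar does not.
foo_not_bar_dedicated = {
    "bool": {
        "must": [{"exists": {"field": "isFoo"}}],
        "must_not": [{"exists": {"field": "isBar"}}],
    }
}

# Case (2): "flags" contains "isFoo" but not "isBar".
foo_not_bar_flags = {
    "bool": {
        "must": [{"term": {"flags": "isFoo"}}],
        "must_not": [{"term": {"flags": "isBar"}}],
    }
}
```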

The bottom line: in case (1), all queries boil down to a mixture of
Boolean, exists and missing filters. In case (2), one has to process the
strings in the array named "flags". My intuition is that method (1) is
faster. In terms of space complexity, I believe there is no difference.

I'm looking forward to your insights!
Dror

[1]: Obviously, there could be way more flags...

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ef637057-4303-4c75-9bbf-ed72e0d4806b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Itamar_Syn_Hershko · December 10, 2014, 3:26pm

For Lucene / Elasticsearch the difference is pretty much insignificant as
long as you use filters. If you will have more than a few such flags, you
should prefer a not_analyzed field with string values to represent them
over dedicated boolean fields.
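
A minimal sketch of such a mapping in the Elasticsearch 1.x syntax, as a Python dict. The type name "doc" is a hypothetical placeholder. In Elasticsearch 5.x and later the equivalent would be a field of type "keyword".

```python
# Index-creation body with a single not_analyzed string field for all flags.
# "doc" is a made-up mapping type name used only for illustration.
flags_mapping = {
    "mappings": {
        "doc": {
            "properties": {
                "flags": {"type": "string", "index": "not_analyzed"}
            }
        }
    }
}
```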

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/

jprante · December 10, 2014, 3:29pm

In space complexity, there is a difference. The more fields you use in a
search, the more heavy lifting Lucene must do, and the bigger the filter
caches you need.

Solution (2), with one field, is more compact and therefore faster.

Jörg

Dror_Atariah · December 10, 2014, 3:33pm

Can you please elaborate on the matter? Why/how is the number of fields
relevant here?

Dror_Atariah · December 10, 2014, 3:35pm

@Itamar: Can you please elaborate on the matter? Why/how is the number of
fields relevant here?

Itamar_Syn_Hershko · December 10, 2014, 3:57pm

Basically, you will have to maintain more filters. Also, Lucene supports
only up to a certain number of fields; it wasn't designed to handle an
unlimited number of them.

Dror_Atariah · December 10, 2014, 4:03pm

Is there any difference, or are there any implications, if aggregations
are also needed?

Itamar_Syn_Hershko · December 10, 2014, 4:05pm

I imagine the types of graphs you could come up with will differ
significantly, to start with.
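
To illustrate how the aggregations might differ, here is a rough sketch under the same assumptions as the question (field names isFoo, isBar, flags). The aggregation names by_flag, foo_count and bar_count are made up. With a single "flags" field, one "terms" aggregation yields a count per flag; with dedicated fields, each flag needs its own "filter" aggregation. The Counter at the end only imitates locally, on hypothetical documents, what the "terms" aggregation would count.

```python
from collections import Counter

# Case (2): one terms aggregation covers every flag, including future ones.
aggs_flags = {"aggs": {"by_flag": {"terms": {"field": "flags"}}}}

# Case (1): one filter aggregation per dedicated field.
aggs_dedicated = {
    "aggs": {
        "foo_count": {"filter": {"exists": {"field": "isFoo"}}},
        "bar_count": {"filter": {"exists": {"field": "isBar"}}},
    }
}

# Local imitation of the per-flag document counts on made-up documents.
docs = [{"flags": ["isFoo"]}, {"flags": ["isFoo", "isBar"]}, {}]
counts = Counter(flag for doc in docs for flag in set(doc.get("flags", [])))
print(counts["isFoo"], counts["isBar"])  # prints: 2 1
```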
