Possible to create a single facet covering multiple fields?


(datadev) #1

Suppose I have the following index comprised of 3 documents:

foo:A, bar:X
foo:B, bar:X
foo:C, bar:Y

The use case requires that I retrieve a facet on foo with the
following counts:
A-X: 1
B-X: 1
C-Y: 1

Basically, I need to know the value of bar for each foo within the
facet for foo. The analogy in SQL terms would be GROUP BY based on an
expression.
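
For reference, the SQL analogy can be made concrete with a small sketch. It uses Python's built-in sqlite3 module and a hypothetical in-memory table named docs holding the three example documents; nothing in it is Elasticsearch-specific.

```python
import sqlite3

# Hypothetical in-memory table mirroring the three example documents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (foo TEXT, bar TEXT)")
conn.executemany(
    "INSERT INTO docs (foo, bar) VALUES (?, ?)",
    [("A", "X"), ("B", "X"), ("C", "Y")],
)

# GROUP BY on an expression: one bucket per distinct foo-bar combination.
rows = conn.execute(
    "SELECT foo || '-' || bar AS foo_bar, COUNT(*) AS n "
    "FROM docs GROUP BY foo || '-' || bar ORDER BY foo_bar"
).fetchall()

for key, count in rows:
    print(f"{key}: {count}")
```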

I also need to be able to sort based on foo or bar, so that's why they
are indexed as separate fields.

Is this possible in Elasticsearch? One hack I've thought about is to
store the value of bar both in the 'bar' field and as a suffix in the
foo field (e.g. foo:A-X, foo:B-X, etc.), so that a single facet does
not have to cover 2 fields, but that seems inefficient because of the
duplicated data.


(Shay Banon) #2

Another option is to use scripting in the terms facet, combining the two
field values in the script.
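
A rough sketch of what such a request body might look like, built as a Python dict so it can be inspected without a cluster. It assumes the `script_field` option of the terms facet in the facets API of that era (facets were later removed in favor of aggregations). The facet name `foo_bar` is hypothetical, and the exact script syntax depends on the Elasticsearch version and scripting language in use.

```python
import json

# Assumption: legacy facets API with the terms facet's `script_field` option.
# The script concatenates the two field values for each document.
request_body = {
    "query": {"match_all": {}},
    "facets": {
        "foo_bar": {  # hypothetical facet name
            "terms": {
                "script_field": "doc['foo'].value + '-' + doc['bar'].value",
                "size": 10,
            }
        }
    },
}

print(json.dumps(request_body, indent=2))
```

Because the script runs for every matching document at search time, this trades indexing cost for query-time cost.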


(datadev) #3

From a performance standpoint, which option is better?
A. Store duplicate info for bar within the foo field, which avoids
running a script facet at runtime.
B. Do not store duplicate info for bar, and run a script facet at
runtime (http://www.elasticsearch.org/guide/reference/api/search/facets/terms-facet.html).


(Shay Banon) #4

It will be faster to index the duplicate data.


(datadev) #5

Hi Shay,

Thanks for the feedback. One final question regarding the relative
performance:

While faceting on indexed duplicate data is faster than running a
script, as you mentioned, option A (the duplicate data approach) would
mean that queries on the foo field would have to use 'A-X' instead of
just 'A'. However, in many situations only A is known and X is not.
The duplicate data approach would therefore force us to query the foo
field using 'A*' (a prefix query) as opposed to foo = 'A'. Assuming
that foo is a non-analyzed field, that 50% of queries will require
faceting, and that 50% of queries will query foo = '...', which
approach (index duplicate data or not) is faster? The other
consideration is that the duplicate data approach also increases
storage size and indexing overhead.
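
To make the querying concern concrete, here is a small sketch of the two query shapes as plain Python dicts. `term` and `prefix` are standard query types; the suffixed values and the local `matches` helper are illustrative only and do not reflect how Elasticsearch evaluates queries internally.

```python
# Exact match when foo holds only its own value (e.g. "A").
term_query = {"term": {"foo": "A"}}

# Prefix match needed once foo holds suffixed values (e.g. "A-X").
# Including the "-" separator avoids also matching values such as "AB-X".
prefix_query = {"prefix": {"foo": "A-"}}


def matches(query, value):
    """Illustrative local check of a term or prefix query against one value."""
    (kind, clause), = query.items()
    (_, wanted), = clause.items()
    if kind == "term":
        return value == wanted
    if kind == "prefix":
        return value.startswith(wanted)
    raise ValueError(f"unsupported query type: {kind}")


# "AB-X" is a made-up extra value to show why the separator matters.
suffixed_values = ["A-X", "B-X", "C-Y", "AB-X"]
prefix_hits = [v for v in suffixed_values if matches(prefix_query, v)]
term_hits = [v for v in suffixed_values if matches(term_query, v)]
print(prefix_hits)
print(term_hits)
```

With suffixed values, the exact term query for 'A' no longer finds anything, which is the cost being asked about.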


(Shay Banon) #6

Sorry, what I meant by duplicate data is to have a new field called foo_bar
and index the combination of the foo and bar fields there. This will mean more
indexing and more memory usage, but faceting on it will be faster.
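
A minimal sketch of that idea: add the combined value as its own field before indexing, leaving foo and bar untouched so they can still be queried and sorted on individually. The helper name `with_combined_field` and the "-" separator are assumptions for illustration, and the Counter only simulates locally what a terms facet on foo_bar would be expected to count.

```python
from collections import Counter


def with_combined_field(doc, separator="-"):
    """Return a copy of doc with an extra foo_bar field (hypothetical helper)."""
    enriched = dict(doc)
    enriched["foo_bar"] = f"{doc['foo']}{separator}{doc['bar']}"
    return enriched


docs = [
    {"foo": "A", "bar": "X"},
    {"foo": "B", "bar": "X"},
    {"foo": "C", "bar": "Y"},
]
indexed_docs = [with_combined_field(d) for d in docs]

# Local stand-in for a terms facet on foo_bar.
facet_counts = Counter(d["foo_bar"] for d in indexed_docs)
print(dict(facet_counts))

# Assumed request shape for a plain terms facet on the new field
# (legacy facets API).
facet_request = {
    "query": {"match_all": {}},
    "facets": {"foo_bar": {"terms": {"field": "foo_bar", "size": 10}}},
}
```

The foo_bar field would presumably need to be mapped as not analyzed, since a terms facet on an analyzed field counts individual tokens rather than the whole 'A-X' value.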

