Possible to create a single facet covering multiple fields?

datadev · October 12, 2011, 11:19am

Suppose I have the following index comprised of 3 documents:

foo:A, bar:X
foo:B, bar:X
foo:C, bar:Y

The use case requires that I retrieve a facet on foo with the
following counts:
A-X: 1
B-X: 1
C-Y: 1

Basically, I need to know the value of bar for each foo within the
facet for foo. The analogy in SQL terms would be GROUP BY based on an
expression.
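
To make the target concrete, the grouping described above can be sketched in plain Python, outside Elasticsearch entirely. This is only an illustration of the desired counts; the helper name `combined_counts` and the `-` separator are assumptions, not anything Elasticsearch provides.

```python
from collections import Counter

# The three example documents from the post.
docs = [
    {"foo": "A", "bar": "X"},
    {"foo": "B", "bar": "X"},
    {"foo": "C", "bar": "Y"},
]

def combined_counts(documents, sep="-"):
    """Count documents per (foo, bar) pair, keyed as 'foo<sep>bar'."""
    return Counter(f"{d['foo']}{sep}{d['bar']}" for d in documents)

# Print one line per combined key, in the same shape as the facet above.
for key, count in sorted(combined_counts(docs).items()):
    print(f"{key}: {count}")
```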

I also need to be able to sort based on foo or bar, so that's why they
are indexed as separate fields.

Is this possible in Elasticsearch? One hack I've thought about is to
store the value of bar both in the 'bar' field and as a suffix
in the foo field (e.g. foo:A-X, foo:B-X, etc.), so that a single
facet does not have to cover 2 fields, but that seems inefficient
because of the duplicate data.

kimchy · October 12, 2011, 9:17pm

Another option is to use scripting in the terms facet and combine the two
field values in the script.
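
A rough sketch of what such a request body might look like, built as a Python dict so it can be inspected before being sent anywhere. The `script_field` parameter and the `doc['...'].value` script syntax are recalled from the terms facet documentation of that era and should be checked against the version in use; the helper name `script_terms_facet`, the facet name, and the `-` separator are illustrative assumptions.

```python
import json

def script_terms_facet(first_field, second_field, sep="-", size=10):
    """Build a search body whose terms facet values come from a script.

    Assumption: the terms facet accepts a `script_field` parameter and the
    script can read field values via doc['name'].value.
    """
    script = f"doc['{first_field}'].value + '{sep}' + doc['{second_field}'].value"
    return {
        "query": {"match_all": {}},
        "facets": {
            f"{first_field}_{second_field}": {
                "terms": {"script_field": script, "size": size}
            }
        },
    }

body = script_terms_facet("foo", "bar")
print(json.dumps(body, indent=2))
```

The script runs for every matching document at query time, which is the cost being weighed in the rest of this thread.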

datadev · October 12, 2011, 11:27pm

From a performance standpoint, which option is better?
A. Store duplicate info for bar within the foo field, and avoid
running a script facet at runtime.
B. Do not store duplicate info for bar, and run a script facet at
runtime (see the terms facet documentation, facets/terms-facet.html).

kimchy · October 14, 2011, 11:59am

It will be faster to index the duplicate data.

datadev · October 14, 2011, 2:04pm

Hi Shay,

Thanks for the feedback. One final question regarding the relative
performance:

While it is faster to index the duplicate data instead of having to
run a script execution as you mentioned, option A (the duplicate data
approach) would mean that queries on the foo field would now have to
be 'A-X' instead of just 'A'. However, in many situations, only A is
known and X is not known. Therefore, the duplicate data approach would
force us to query the foo field using 'A*' (or a prefix query) as
opposed to foo = 'A'. Assuming that foo is a non-analyzed field and
that 50% of queries will require faceting and 50% of queries will
query foo = '...', which approach (index duplicate data or not) is
faster? The other consideration is that the duplicate data approach
also increases storage size and indexing overhead.
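
The difference between the two query styles can be illustrated with a small plain-Python emulation of prefix matching on non-analyzed values. This does not call Elasticsearch; `prefix_match` is a hypothetical helper, and the value "AB-X" is a made-up extra entry added only to show a pitfall of matching on 'A*' when the suffix is stored in foo.

```python
# Values of foo under option A, where bar is appended as a suffix.
# "AB-X" is a hypothetical extra value, not part of the original example.
indexed_foo = ["A-X", "B-X", "C-Y", "AB-X"]

def prefix_match(values, prefix):
    """Emulate a prefix query against a non-analyzed field."""
    return [v for v in values if v.startswith(prefix)]

# Matching on 'A' alone also picks up values of foo that merely start with A.
print(prefix_match(indexed_foo, "A"))

# Including the separator in the prefix restricts the match to foo = 'A'.
print(prefix_match(indexed_foo, "A-"))
```

So if the suffix approach were used, the prefix would probably need to include the separator, and the separator would need to be a character that never appears in foo itself.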

kimchy · October 14, 2011, 2:07pm

Sorry, by duplicate data I meant adding a new field called foo_bar and
indexing the combination of the foo and bar fields there. This will mean more
indexing and more memory usage, but faceting on it will be faster.
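
A minimal sketch of that approach, assuming the combined value is built on the client before indexing. The helper name `add_combined_field` and the `-` separator are illustrative assumptions, and foo_bar is assumed to be mapped as non-analyzed so that "A-X" stays a single term. Because foo and bar are left untouched, exact queries and sorting on them keep working as before.

```python
def add_combined_field(doc, sep="-"):
    """Return a copy of doc with an extra foo_bar field; foo and bar are unchanged."""
    enriched = dict(doc)
    enriched["foo_bar"] = f"{doc['foo']}{sep}{doc['bar']}"
    return enriched

docs = [
    {"foo": "A", "bar": "X"},
    {"foo": "B", "bar": "X"},
    {"foo": "C", "bar": "Y"},
]

# Documents as they would be sent for indexing.
to_index = [add_combined_field(d) for d in docs]

# Request body sketch: exact term query on foo, plain terms facet on foo_bar.
facet_request = {
    "query": {"term": {"foo": "A"}},
    "facets": {"foo_bar": {"terms": {"field": "foo_bar"}}},
}

for d in to_index:
    print(d)
```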
