Pivot facets

Hi all

Does ES currently support, or plan to support pivot faceting like
Solr? I am trying to create a hierarchical facet structure, for
example:

State

  • California (23)
    • Los Angeles (9)
    • San Francisco (14)
  • Florida (5)
    • Miami (5)

Many thanks
Greg

Yes, there is plan to support it, but note that the complexity of the solution (while still maintaining good search performance). Also note that hierarchical and pivot talbe are two completely different things.

Note, that some facet do allow for "two" level aggregation, such as terms stats, and histograms (can have custom value field).

The simplest is an hierarchical terms facet. But, of course, what happens when you want histo -> terms - > histo?

And last, there is a the pivot table, which is very different from hierarchical facets. Many people confuse between teh two.
On Tuesday, May 24, 2011 at 11:34 PM, Greg wrote:
Hi all

Does ES currently support, or plan to support pivot faceting like
Solr? I am trying to create a hierarchical facet structure, for
example:

State

  • California (23)
  • Los Angeles (9)
  • San Francisco (14)
  • Florida (5)
  • Miami (5)

Many thanks
Greg

Hi Shay

Could you briefly clarify the difference between a hierarchical facet
and pivot facet? I had been reading the following article, which led
me to believe they are the same thing:
http://solr.pl/en/2010/10/25/hierarchical-faceting-pivot-facets-in-trunk/

Also, you mentioned that it is possible to do two-level aggregation of
terms facets. I have checked the docs, but cannot find how to do this.
What is the syntax for this?

Many thanks
Greg

On Wednesday, May 25, 2011 at 1:55 AM, Greg wrote:
Hi Shay

Could you briefly clarify the difference between a hierarchical facet
and pivot facet? I had been reading the following article, which led
me to believe they are the same thing:
Hierarchical faceting – Pivot facets in trunk – Solr.pl
Pivot table - Wikipedia, thats completely different from hierarchy one, or even what some call pivot.

Also, you mentioned that it is possible to do two-level aggregation of
terms facets. I have checked the docs, but cannot find how to do this.
What is the syntax for this?
Terms Stats facet, but thats only support where the value is numeric ( Elasticsearch Platform — Find real-time answers at scale | Elastic).

Many thanks
Greg

Terms Stats facet, but thats only support where the value is numeric (
Elasticsearch Platform — Find real-time answers at scale | Elastic
).

I was wondering whether the term_stats facet could support string value
fields? It currently doesn't (as stated above), so I'm wondering if this is
because of the distributed nature of ES or simply a TODO that hasn't been
TODOed.

Thanks,
Philippe

What kinds of statistical facets could you return besides count (which you can get with a simple terms facet)? The other statistical facets (total, average, min, max, etc.) don't really make sense for string values. Do your string values contain numbers within them (as substrings)? You could write a script that pulls them out and converts the substrings to integers.

I am working on a native Java script that can apply regex patterns to search results, and can be used in facet calculations. Is this something that would be interesting to anyone else? Is there an Issue for this?

--Mark

On Jan 17, 2012, at 12:20 PM, Philippe Laflamme wrote:

Terms Stats facet, but thats only support where the value is numeric ( Elasticsearch Platform — Find real-time answers at scale | Elastic).

I was wondering whether the term_stats facet could support string value fields? It currently doesn't (as stated above), so I'm wondering if this is because of the distributed nature of ES or simply a TODO that hasn't been TODOed.

Thanks,
Philippe

When the "value_field" is a string, it would return frequencies for each
distinct value. So you'd obtain the frequency of the terms for each of the
values of "key_field".

key_field_a :
value_field_a : freq
value_field_b: freq
key_field_b:
value_field_a : freq
value_field_b: freq

It's a cross-tabulation : Contingency table - Wikipedia

It's been referred to as "hierarchical facet" and "pivot facet".

Philippe

On Tue, Jan 17, 2012 at 16:09, Mark Huang mark.l.huang@gmail.com wrote:

What kinds of statistical facets could you return besides count (which you
can get with a simple terms facet)? The other statistical facets (total,
average, min, max, etc.) don't really make sense for string values. Do your
string values contain numbers within them (as substrings)? You could write
a script that pulls them out and converts the substrings to integers.

I am working on a native Java script that can apply regex patterns to
search results, and can be used in facet calculations. Is this something
that would be interesting to anyone else? Is there an Issue for this?

--Mark

On Jan 17, 2012, at 12:20 PM, Philippe Laflamme wrote:

Terms Stats facet, but thats only support where the value is numeric (
Elasticsearch Platform — Find real-time answers at scale | Elastic
).

I was wondering whether the term_stats facet could support string value
fields? It currently doesn't (as stated above), so I'm wondering if this is
because of the distributed nature of ES or simply a TODO that hasn't been
TODOed.

Thanks,
Philippe

You can derive frequency from terms facet count. If the value string of interest is the entire field, just specify the field name (must be not_analyzed) to facet on. If it is an analyzed field, or you need to pull out a substring, you need to use script_field or script to perform the necessary textual transformation, which can be very slow, but it will basically work:

As of now, you can only get top N terms and their counts with a terms facet; unfortunately "all terms" doesn't exist yet:

http://elasticsearch-users.115913.n3.nabble.com/Terms-facet-all-terms-does-not-work-td3568708.html

--Mark

On Jan 17, 2012, at 1:55 PM, Philippe Laflamme wrote:

When the "value_field" is a string, it would return frequencies for each distinct value. So you'd obtain the frequency of the terms for each of the values of "key_field".

key_field_a :
value_field_a : freq
value_field_b: freq
key_field_b:
value_field_a : freq
value_field_b: freq

It's a cross-tabulation : Contingency table - Wikipedia

It's been referred to as "hierarchical facet" and "pivot facet".

Philippe

On Tue, Jan 17, 2012 at 16:09, Mark Huang mark.l.huang@gmail.com wrote:
What kinds of statistical facets could you return besides count (which you can get with a simple terms facet)? The other statistical facets (total, average, min, max, etc.) don't really make sense for string values. Do your string values contain numbers within them (as substrings)? You could write a script that pulls them out and converts the substrings to integers.

I am working on a native Java script that can apply regex patterns to search results, and can be used in facet calculations. Is this something that would be interesting to anyone else? Is there an Issue for this?

--Mark

On Jan 17, 2012, at 12:20 PM, Philippe Laflamme wrote:

Terms Stats facet, but thats only support where the value is numeric ( Elasticsearch Platform — Find real-time answers at scale | Elastic).

I was wondering whether the term_stats facet could support string value fields? It currently doesn't (as stated above), so I'm wondering if this is because of the distributed nature of ES or simply a TODO that hasn't been TODOed.

Thanks,
Philippe

I'm not sure I understand what you're saying. How do you suggest making a
cross-tabulation using terms_facet.

I don't think you can derive the frequencies of a cross-tabulation without
making N queries (where N is the number of columns, or rows in your table).
Or maybe a single query with N filtered terms_facet (which probably results
in N queries anyway).

Are you suggesting otherwise?

Thanks
Philippe

On Tue, Jan 17, 2012 at 17:38, Mark Huang mark.l.huang@gmail.com wrote:

You can derive frequency from terms facet count. If the value string of
interest is the entire field, just specify the field name (must be
not_analyzed) to facet on. If it is an analyzed field, or you need to pull
out a substring, you need to use script_field or script to perform the
necessary textual transformation, which can be very slow, but it will
basically work:

Elasticsearch Platform — Find real-time answers at scale | Elastic

As of now, you can only get top N terms and their counts with a terms
facet; unfortunately "all terms" doesn't exist yet:

all_terms capability of terms facet · Issue #1530 · elastic/elasticsearch · GitHub

http://elasticsearch-users.115913.n3.nabble.com/Terms-facet-all-terms-does-not-work-td3568708.html

--Mark

On Jan 17, 2012, at 1:55 PM, Philippe Laflamme wrote:

When the "value_field" is a string, it would return frequencies for each
distinct value. So you'd obtain the frequency of the terms for each of the
values of "key_field".

key_field_a :
value_field_a : freq
value_field_b: freq
key_field_b:
value_field_a : freq
value_field_b: freq

It's a cross-tabulation : Contingency table - Wikipedia

It's been referred to as "hierarchical facet" and "pivot facet".

Philippe

On Tue, Jan 17, 2012 at 16:09, Mark Huang mark.l.huang@gmail.com wrote:

What kinds of statistical facets could you return besides count (which
you can get with a simple terms facet)? The other statistical facets
(total, average, min, max, etc.) don't really make sense for string values.
Do your string values contain numbers within them (as substrings)? You
could write a script that pulls them out and converts the substrings to
integers.

I am working on a native Java script that can apply regex patterns to
search results, and can be used in facet calculations. Is this something
that would be interesting to anyone else? Is there an Issue for this?

--Mark

On Jan 17, 2012, at 12:20 PM, Philippe Laflamme wrote:

Terms Stats facet, but thats only support where the value is numeric (
Elasticsearch Platform — Find real-time answers at scale | Elastic
).

I was wondering whether the term_stats facet could support string value
fields? It currently doesn't (as stated above), so I'm wondering if this is
because of the distributed nature of ES or simply a TODO that hasn't been
TODOed.

Thanks,
Philippe

Sorry, I'm just reading the thread from last year now. If it existed, I think a hierarchical terms facet is what you are looking for. The only two-level aggregate functions available now are terms stats and histogram, and the second level aggregate function for both only calculates numerical statistics, not term frequencies.

--Mark

On Jan 17, 2012, at 5:00 PM, Philippe Laflamme wrote:

I'm not sure I understand what you're saying. How do you suggest making a cross-tabulation using terms_facet.

I don't think you can derive the frequencies of a cross-tabulation without making N queries (where N is the number of columns, or rows in your table). Or maybe a single query with N filtered terms_facet (which probably results in N queries anyway).

Are you suggesting otherwise?

Thanks
Philippe

Yes indeed. And I'm wondering why the term_stats facet only supports
numerical fields as the value_field. Is this simply because it hasn't been
done or it's because of the nature of ES which makes it impossible
(distributed distinct count).

Philippe

On Wed, Jan 18, 2012 at 03:11, Mark Huang mark.l.huang@gmail.com wrote:

Sorry, I'm just reading the thread from last year now. If it existed, I
think a hierarchical terms facet is what you are looking for. The only
two-level aggregate functions available now are terms stats and histogram,
and the second level aggregate function for both only calculates numerical
statistics, not term frequencies.

--Mark

On Jan 17, 2012, at 5:00 PM, Philippe Laflamme wrote:

I'm not sure I understand what you're saying. How do you suggest making
a cross-tabulation using terms_facet.

I don't think you can derive the frequencies of a cross-tabulation
without making N queries (where N is the number of columns, or rows in your
table). Or maybe a single query with N filtered terms_facet (which probably
results in N queries anyway).

Are you suggesting otherwise?

Thanks
Philippe

Shay can correct me if I'm wrong, but I doubt it's because of the nature of ES. As you said, there's no fundamental reason it couldn't be supported; ES could make the same queries you could to create your attribute hierarchy; but the practical issues, such as how to optimize the queries, and how to limit their memory consumption (both in a generalized fashion), are quite difficult to think about.

If you are trying to create a pivot table in the MS Excel sense of the word, however, I'm not sure you need hierarchical facets, you just need the documents with your AxB different attributes, and can create the table yourself in your application. In the Wikipedia example, you could synthesize a script field called Gender_Handedness that is defined as "doc.gender + '_' + doc.handedness", then get counts for each combination of "Male_Left-handed", "Female_Left-handed", etc., and create the pivot table in your app.

--Mark

On Jan 18, 2012, at 5:42 AM, Philippe Laflamme wrote:

Yes indeed. And I'm wondering why the term_stats facet only supports numerical fields as the value_field. Is this simply because it hasn't been done or it's because of the nature of ES which makes it impossible (distributed distinct count).

Philippe