Facet manipulation/merging


(MJ Suhonos) #1

Hi all,

I suppose this is a sort of feature request, but I'm more interested
in whether it is possible: a) now; b) in a future version; or c)
through a workaround. There are two things I'd like to do -- they are
distinct but related:

Example document for both:

{
  "doc": {
    "title": "title 1",
    "desc": "some words here"
  },
  "doc": {
    "title": "title 2",
    "desc": "some more words"
  }
}

  1. Return the entire value of a field as a facet value instead of a
    list of terms, i.e. for the above, a facet on doc.desc that would
    return "some words here", "some more words" instead of "some",
    "words", "here", "more" (current terms facet functionality).

I realize it's possible to achieve this by removing keyword
separators, e.g. replacing " " with "_" ("some words here" ->
"some_words_here") or similar. But I'm wondering about a better
approach.
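
To make the distinction concrete, here is a minimal sketch in plain Python (not Elasticsearch code). It assumes a simple whitespace split as a stand-in for analysis; a real analyzer may also lowercase, strip punctuation, and so on. The function names are hypothetical.

```python
from collections import Counter

# The two "desc" values from the example document above.
descs = ["some words here", "some more words"]


def facet_analyzed(values):
    """Count whitespace-separated tokens, roughly what a terms facet
    reports for an analyzed field (assumption: whitespace tokenizing)."""
    return Counter(token for value in values for token in value.split())


def facet_whole(values):
    """Count each complete field value as one term, which is the
    behaviour being asked for."""
    return Counter(values)


analyzed = facet_analyzed(descs)
whole = facet_whole(descs)

print(analyzed.most_common())  # token-level counts
print(whole.most_common())     # whole-value counts
```

The underscore workaround simply makes each value a single token, so the first function would then behave like the second.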

  2. Combine values of facets together into a single return value (as
    is possible with ES scripting), i.e. for the above, a terms facet
    would return "title", "some", "words", "1", "here", "2", "more".

I realize it's possible to achieve this by combining the results of
two (or more) separate term facets together in the application layer,
but again wondering whether there might be a better way with ES.

I suppose the short version of these questions is: what possibilities
(or intentions) are there for scripting/manipulation of facets beyond
the current statistical and histogram facets?

Of course, let me reiterate how incredible ES is and how grateful I am
for all the hard work and vision that Shay has put into it. You have
provided a new tool that has quite literally transformed the
possibilities in my field, and made things that were previously
considered impossible for decades almost trivial. :slight_smile:

Cheers,
MJ


(Shay Banon) #2

Hi, answers below:

On Mon, Oct 4, 2010 at 2:22 PM, MJ Suhonos suhonos@gmail.com wrote:

> Hi all,
>
> I suppose this is a sort of feature request, but I'm more interested
> in whether it is possible: a) now; b) in a future version; or c)
> through a workaround. There are two things I'd like to do -- they are
> distinct but related:
>
> Example document for both:
>
> {
>   "doc": {
>     "title": "title 1",
>     "desc": "some words here"
>   },
>   "doc": {
>     "title": "title 2",
>     "desc": "some more words"
>   }
> }
>
>   1. Return the entire value of a field as a facet value instead of a
>     list of terms. ie. for the above, a facet on doc.desc that would
>     return "some words here", "some more words" instead of "some",
>     "words", "here", "more" (current terms facet functionality).
>
> I realize it's possible to achieve this by removing keyword
> separators, eg. replacing " " with "_" ("some words here" ->
> "some_words_here") or similar. But I'm wondering about a better
> approach.

You can also specify the "title" field mapping with "index" :
"not_analyzed". Usually, for things like title, you might want to create a
multi_field mapping, one that is analyzed (for better search experience),
and one that is not analyzed (for things like facets).
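
As a rough illustration of the multi_field suggestion, the sketch below builds such a mapping as a Python dict and prints it as JSON. The shape follows the multi_field syntax of early Elasticsearch releases as best I recall it, so treat it as an assumption and check the mapping documentation for your version. The sub-field name "untouched" and the helper name are hypothetical.

```python
import json


def multi_field_mapping(field_name, raw_name="untouched"):
    """Build a multi_field mapping with an analyzed sub-field for search
    and a not_analyzed sub-field for faceting (names are illustrative)."""
    return {
        field_name: {
            "type": "multi_field",
            "fields": {
                # Default sub-field: tokenized, used for full-text search.
                field_name: {"type": "string", "index": "analyzed"},
                # Raw sub-field: whole value kept as a single term.
                raw_name: {"type": "string", "index": "not_analyzed"},
            },
        }
    }


mapping = {"properties": multi_field_mapping("desc")}
print(json.dumps(mapping, indent=2))
```

Under this assumption, a terms facet on the raw sub-field (something like "desc.untouched") would return whole values such as "some words here", while searches keep using the analyzed "desc".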

>   2. Combine values of facets together into a single return value (like
>     as is possible with ES scripting). ie. for the above, a terms facet
>     would return "title", "some", "words", "1", "here", "2", "more".
>
> I realize it's possible to achieve this by combining the results of
> two (or more) separate term facets together in the application layer,
> but again wondering whether there might be a better way with ES.

Not sure I completely understand what you want to do here. You can
provide custom scripts in certain facets, like term and histogram. What do
you mean by combine?

> I suppose the short version of these questions is: what possibilities
> (or intentions) are there for scripting/manipulation of facets beyond
> the current statistical and histogram?

The general goal is to allow users to provide custom "scripts" that
can execute the whole faceting logic (a la custom map reduce). Common ones
that emerge will get concrete implementations within elasticsearch for
better performance.

> Of course, let me reiterate how incredible ES is and how grateful I am
> for all the hard work and vision that Shay has put into it. You have
> provided a new tool that has quite literally transformed the
> possibilities in my field, and made things that were previously
> considered impossible for decades almost trivial. :slight_smile:

Thanks!

> Cheers,
> MJ


(MJ Suhonos) #3

Hi Shay,

> > I realize it's possible to achieve this by removing keyword
> > separators, eg. replacing " " with "_" ("some words here" ->
> > "some_words_here") or similar. But I'm wondering about a better
> > approach.
>
> You can also specify the "title" field mapping with "index" :
> "not_analyzed". Usually, for things like title, you might want to create a
> multi_field mapping, one that is analyzed (for better search experience),
> and one that is not analyzed (for things like facets).

Ah, this is exactly the solution. It also answers some questions I
had in my head around analysis and multi_field. Thanks!

> >   2. Combine values of facets together into a single return value (like
> >     as is possible with ES scripting). ie. for the above, a terms facet
> >     would return "title", "some", "words", "1", "here", "2", "more".
> >
> > I realize it's possible to achieve this by combining the results of
> > two (or more) separate term facets together in the application layer,
> > but again wondering whether there might be a better way with ES.
>
> Not sure I understand completely what you want to do here... . You can
> provide custom scripts in certain facets, like term and histogram. What do
> you mean by combine?

Sorry, I had somehow overlooked the ability to use scripts in the term
facet; mea culpa. My use case is that I have several fields whose
values I would like to return in a single facet, e.g. with:

{
  "doc": {
    "desc": "some words here",
    "field": "yet more text"
  },
  "doc": {
    "desc": "some more words",
    "field": "even more words"
  }
}

A facet for doc.desc would presumably yield: "some" (2), "words" (2),
"here" (1), "more" (1)
A facet for doc.field would yield: "more" (2), "yet" (1), "text" (1),
"even" (1), "words" (1)

What I require is something like: "words" (3), "more" (3), "some" (2),
"here" (1), "yet" (1), "text" (1), "even" (1)

The more I think about this, I suspect it's probably a rare enough use
case that doing it in the application layer is fine. But if it's easy
to implement in ES it would certainly be simpler/faster.
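
For reference, the application-layer merge described above can be sketched in a few lines of Python. This assumes the two facet results have already been fetched and reduced to plain term-to-count dicts; the variable and function names are hypothetical, and the counts are the ones listed above.

```python
from collections import Counter


def merge_facets(*facets):
    """Sum the counts of several term facets into one combined facet."""
    total = Counter()
    for facet in facets:
        total.update(facet)  # adds counts for terms seen in several facets
    return total


# Per-field facet counts as listed above.
desc_facet = {"some": 2, "words": 2, "here": 1, "more": 1}
field_facet = {"more": 2, "yet": 1, "text": 1, "even": 1, "words": 1}

merged = merge_facets(desc_facet, field_facet)
for term, count in merged.most_common():
    print(term, count)
```

One caveat with this approach: if each facet is truncated to its top N terms, a term that falls just outside the cut-off in one field is missed, so the merged counts can be slightly low.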

> The general purpose is to allow for user to provide custom "scripts" that
> can execute the whole faceting logic (ala custom map reduce). Common ones
> that will emerge will get concrete implementations within elasticsearch for
> better performance.

Certainly -- your responsiveness to issues and common use cases is
exemplary, especially considering your pace of implementation. One of
the major reasons ES is such a good fit for me is that I don't need
general map reduce, but the custom scripting, etc. are ideal for the
kind of (messy, poorly structured, inconsistent) data I'm working
with.

[A sub-note on this: thank you for the default local gateway in 0.11
-- I'm using ES on its own as a data store (no database), and this
makes my configuration, consistency, etc. SO much easier]

MJ


(Shay Banon) #4

Regarding a single term facet on several fields, that's certainly possible.
Open an issue for this?
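
For illustration only, here is a sketch of what a request for such a facet might look like, built as a Python dict. It assumes the terms facet accepts a "fields" list, as I believe later releases of the facets API did; that parameter is not confirmed anywhere in this thread, so verify it against the documentation for your version. The facet name "combined" and the helper name are hypothetical.

```python
import json


def multi_field_terms_facet(facet_name, fields, size=10):
    """Build a search body with one terms facet spanning several fields.
    Assumption: the terms facet supports a "fields" list parameter."""
    return {
        "query": {"match_all": {}},
        "facets": {
            facet_name: {"terms": {"fields": list(fields), "size": size}}
        },
    }


body = multi_field_terms_facet("combined", ["desc", "field"])
print(json.dumps(body, indent=2))
```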


(MJ Suhonos) #5

Absolutely. :slight_smile:

Thanks again, you're incredible.
