N:M lookup filter


(Don Clore) #1

I am pretty sure this is not supported, but it'd be great to get explicit
confirmation/denial.

So: there are document types A and B with an N:M relationship between
them, and each document of type B holds a list of the A instances that
relate to it.

More concretely, A is a sports Player type and B is a set of news
stories. Each Story holds a list of the ids of the Players that the story
is about or related to.
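To make the two shapes concrete, here is a minimal sketch of the documents as Python dicts, plus an ES 1.x-style mapping for the Story side. The field names (`name`, `position`, `headline`, `player_ids`) and the type name `story` are hypothetical. The one substantive assumption is that the id list is mapped as a `not_analyzed` string, so that a terms filter matches whole ids rather than analyzed tokens.

```python
# Hypothetical Player document (type "player"); its _id is what Stories reference.
player_doc = {"name": "Example Player", "position": "forward"}

# Hypothetical Story document (type "story") holding the ids of related Players.
story_doc = {
    "headline": "Example headline",
    "player_ids": ["player-1", "player-7"],  # N:M edge stored on the Story side
}

# ES 1.x-style mapping for the Story type. "not_analyzed" keeps each id as a
# single exact term, which is what a terms filter on player_ids needs.
story_mapping = {
    "story": {
        "properties": {
            "headline": {"type": "string"},
            "player_ids": {"type": "string", "index": "not_analyzed"},
        }
    }
}
```

An array field needs no special mapping in Elasticsearch; `player_ids` is simply a multi-valued string field.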

I know the terms lookup filter can use a single document as the source of
the terms for the lookup. What we'd like to do is expose a
faceted/aggregations-based UI that lets the user filter Players over a
fairly extensive set of criteria, and then use the resulting set of Player
document ids as the lookup into the Story documents, i.e., get all the
stories that relate to the Player result set.
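For reference, a single-document terms lookup in ES 1.x syntax looks roughly like the request body built below. This is a sketch: the index name `sports`, the type `playerresultset`, and the field names `player_ids` (on Story) and `ids` (on the lookup document) are assumptions for illustration, not anything defined in this thread.

```python
def story_lookup_query(result_set_id, index="sports"):
    """Build an ES 1.x-style filtered query whose terms filter pulls Player ids
    from the `ids` field of one (hypothetical) PlayerResultSet document."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "terms": {
                        # Field on Story documents holding related Player ids (assumed name).
                        "player_ids": {
                            "index": index,               # index holding the lookup doc
                            "type": "playerresultset",    # hypothetical lookup type
                            "id": result_set_id,          # the single lookup document
                            "path": "ids",                # field containing the terms
                        }
                    }
                },
            }
        }
    }
```

The limitation the question is about is visible in the shape: the lookup points at exactly one document id, so the Player result set has to exist as a document before Stories can be filtered by it.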

Ideally we'd like to do this in a single query or, failing that, have some
reasonably efficient way to issue the two queries/filters (passing a large
set of ids over the wire seems like a bad idea; I'm new to ES, but this
kind of thing was never great with Solr).
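The two-request fallback would look roughly like this: collect the `_id`s from the Player search response, then inline them in a plain terms filter on Stories. A minimal sketch, assuming the standard `hits.hits[]._id` response shape and the same hypothetical `player_ids` field; note that an ordinary search returns only `size` hits (10 by default), so gathering every matching id would need a large `size` or scan/scroll.

```python
def player_ids_from_response(response):
    """Collect the _id of every hit in an Elasticsearch search response."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


def story_inline_terms_query(player_ids):
    """ES 1.x-style filtered query with the Player ids inlined in a terms filter.
    Ids are de-duplicated and sorted; the body still grows linearly with the
    number of distinct ids, which is the over-the-wire cost raised above."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"terms": {"player_ids": sorted(set(player_ids))}},
            }
        }
    }
```

Sorting the ids is not required by Elasticsearch; it just makes the request body deterministic for a given result set.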

One (perhaps half-baked) idea: create a PlayerResultSet type whose id is
derived deterministically from the query/filter predicates, so that the
same user filtering action always yields the same PlayerResultSet id. We'd
issue a terms lookup filter request using that id. If it fails because the
PlayerResultSet document doesn't exist yet, we'd run the filter on Players,
build a PlayerResultSet doc from the matching ids and index it, and then
query for the Stories that reference those Player ids. I'm not sure which
would be worse: sending all the ids inline in the query, or indexing the
PlayerResultSet doc with refresh=true (or issuing the query and queuing the
PlayerResultSet doc for later indexing, or whatever).
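The PlayerResultSet idea amounts to a cache-aside scheme keyed by a hash of the filter. A minimal sketch, assuming the Player filter is available as a JSON-serializable dict: `store` is a plain dict standing in for the PlayerResultSet index, and `run_player_filter` is a hypothetical callable that executes the Player filter and returns matching ids. Neither is an Elasticsearch API.

```python
import hashlib
import json


def result_set_id(player_filter):
    """Stable id for a (hypothetical) PlayerResultSet doc: the same filter body
    always hashes to the same id, regardless of dict key order."""
    canonical = json.dumps(player_filter, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def get_or_build_result_set(store, player_filter, run_player_filter):
    """Return the result-set id to use in a terms lookup filter, building and
    storing the PlayerResultSet doc on a miss."""
    rs_id = result_set_id(player_filter)
    if rs_id not in store:
        # Miss: run the Player filter once and store the ids under the derived id.
        store[rs_id] = {"ids": sorted(set(run_player_filter(player_filter)))}
    return rs_id
```

`sort_keys=True` normalizes object key order but not list order, so two filters listing the same terms in a different order hash to different ids unless the UI normalizes them first. Cached entries also go stale when Player data changes, which is why clearing them on each Player refresh matters.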

The Player data should be fairly static; we could delete the PlayerResultSet
documents and recreate them each time we refresh Player data.

Ok, that sounds pretty awful; I'm hoping someone has a less Rube Goldberg
approach. Obviously I'm more or less building my own filter/query caching
mechanism; hopefully something like this can be achieved more easily with
the built-in filter caching.

thanks for any insights,
Don

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/91919a48-0892-4878-890b-e14c67fd40b5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

Yes, I think this is somehow related to Matt's Join Filter:

https://github.com/elasticsearch/elasticsearch/pull/3278

Jörg

On Sat, Jul 19, 2014 at 4:24 AM, Don Clore cloredon42@gmail.com wrote:

I am pretty sure this is not supported, but it'd be great to get explicit
confirmation/denial.


(Don Clore) #3

Does anyone know the status of that pull request? Is it likely to be
approved?

thanks,
Don

On Saturday, July 19, 2014 12:14:01 AM UTC-7, Jörg Prante wrote:

Yes, I think this is somehow related to Matt's Join Filter

https://github.com/elasticsearch/elasticsearch/pull/3278

Jörg


(Matt Weber) #4

It's currently blocked until we can figure out a way to prevent a bad query
from triggering an OOM error. The goal (as far as I've been told) is to
get this in, but no ETA. I need to update the PR to the latest master as
there have been significant changes as well.

Thanks,
Matt Weber
On Jul 25, 2014 8:52 PM, "Don Clore" cloredon42@gmail.com wrote:

Does anyone know the status of that pull request? Is it likely to be
approved?

thanks,
Don