Search results grouping (aka field combining/collapsing, distinct, de-dup) or alternate solution


(Alex López) #1

Hi!

I just built the current 0.20.0 snapshot and I was wondering how the issue
discussed here has evolved:

https://github.com/elasticsearch/elasticsearch/issues/256

Is it possible to use anything of this sort with the current 0.20.0
snapshot?

I'll try to explain our issue to see if it can be resolved in another way
if search grouping is still far from being implemented:

We have users that can have one or more roles; currently each user-role pair
is indexed as an independent document. We want to search by user name and
get only one result per user (it does not matter which), so ideally we would
group by user name. I have been reading about nested documents and
parent/child relationships. (Are they related?) Which one would better cover
our use case? Note that we might index different user-roles at different
times, so perhaps parent/child indexing is more suited. Is the parent mapping
always defined at the type level? (Can we index a document and tell ES which
document is its parent, or is it always inferred from their respective types?)
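To make the desired result concrete, here is a minimal client-side sketch of field collapsing in Python. It assumes each hit is a dict with a `user_name` field (a hypothetical name) and that hits arrive in relevance order; it keeps the first hit per user and drops the rest. It only illustrates the semantics being asked for, not anything Elasticsearch provides.

```python
from typing import Any, Dict, Iterable, List


def collapse_by_field(hits: Iterable[Dict[str, Any]], field: str) -> List[Dict[str, Any]]:
    """Keep only the first (highest-ranked) hit for each distinct value of `field`.

    `hits` is assumed to already be in relevance order, as a search returns them.
    """
    seen = set()
    collapsed = []
    for hit in hits:
        key = hit[field]
        if key in seen:
            continue  # a better-ranked hit for this value was already kept
        seen.add(key)
        collapsed.append(hit)
    return collapsed


# Hypothetical user-role documents, one per (user, role) pair.
hits = [
    {"user_name": "alice", "role": "admin"},
    {"user_name": "bob", "role": "editor"},
    {"user_name": "alice", "role": "editor"},
    {"user_name": "carol", "role": "viewer"},
    {"user_name": "bob", "role": "viewer"},
]

for hit in collapse_by_field(hits, "user_name"):
    print(hit["user_name"], hit["role"])
```

Done on the client, this breaks paging: a page of 10 hits may collapse to fewer than 10 users, and the total hit count still counts documents rather than users. That is the reason for wanting the grouping inside the engine.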

Thanks for such an excellent search engine and thanks in advance for any
clarifications on our issue!


(Alex López) #2

Sorry for the premature questions on parent/child relationships; I think
we'll go for a solution using the parent field mapping and some ORed
has_child filters (à la
http://www.spacevatican.org/2012/6/3/fun-with-elasticsearch-s-children-and-nested-documents/
).
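A rough in-memory model of what ORed has_child filters return, assuming users are indexed as parent documents and each role as a child document that records its parent's id. The `parent` key, the ids and the role values are hypothetical, and this is plain Python, not the Elasticsearch API. A has_child-style filter yields each parent that has at least one matching child, so a user who matches through several roles still comes back once.

```python
from typing import Callable, Dict, Set

# Hypothetical parent documents (users), keyed by document id.
users: Dict[str, Dict[str, str]] = {
    "u1": {"user_name": "alice"},
    "u2": {"user_name": "bob"},
    "u3": {"user_name": "carol"},
}

# Hypothetical child documents (roles); "parent" holds the owning user's id.
roles = [
    {"parent": "u1", "role": "admin"},
    {"parent": "u1", "role": "editor"},
    {"parent": "u2", "role": "editor"},
    {"parent": "u3", "role": "viewer"},
]


def has_child(matches: Callable[[Dict[str, str]], bool]) -> Set[str]:
    """Ids of parents that have at least one child for which `matches` is true."""
    return {child["parent"] for child in roles if matches(child)}


def or_filters(*parent_ids: Set[str]) -> Set[str]:
    """ORing several has_child-style filters: a parent passes if any filter matches."""
    result: Set[str] = set()
    for ids in parent_ids:
        result |= ids
    return result


matched = or_filters(
    has_child(lambda r: r["role"] == "admin"),
    has_child(lambda r: r["role"] == "editor"),
)
# alice matches through two roles but is returned only once.
print(sorted(users[uid]["user_name"] for uid in matched))
```

Because each role is its own child document, a new role can be added without touching the user document, whereas with nested documents the whole user document would have to be reindexed. That is the property that makes parent/child attractive for roles indexed at different times.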

On Thursday, August 9, 2012 3:23:25 PM UTC+1, Alex López wrote:

Hi!

I just built the current 0.20.0 snapshot and I was wondering how the issue
discussed here has evolved:

https://github.com/elasticsearch/elasticsearch/issues/256

Is it possible to use anything of this sort with the current 0.20.0
snapshot?

I'll try to explain our issue to see if it can be resolved in another way
if search grouping is still far from being implemented:

We have users that can have one or more roles; currently each user-role pair
is indexed as an independent document. We want to search by user name and
get only one result per user (it does not matter which), so ideally we would
group by user name. I have been reading about nested documents and
parent/child relationships. (Are they related?) Which one would better cover
our use case? Note that we might index different user-roles at different
times, so perhaps parent/child indexing is more suited. Is the parent mapping
always defined at the type level? (Can we index a document and tell ES which
document is its parent, or is it always inferred from their respective types?)

Thanks for such an excellent search engine and thanks in advance for any
clarifications on our issue!


(Ivan Brusic) #3

Lucene 4 will have (has) a grouping API:
http://lucene.apache.org/core/4_0_0-ALPHA/grouping/index.html

It might be worthwhile to adhere to the Lucene API and not recreate
something that is not portable. Shay's call. That said, Lucene 4 is
still only in alpha and the API might change, so coding against
it might be a bit premature. Perhaps Simon W has some more insight.

Cheers,

Ivan

On Thu, Aug 9, 2012 at 9:34 AM, Alex López aliksandr@gmail.com wrote:

Sorry for the premature questions on parent/child relationships; I think
we'll go for a solution using the parent field mapping and some ORed
has_child filters (à la
http://www.spacevatican.org/2012/6/3/fun-with-elasticsearch-s-children-and-nested-documents/
).


(Alex López) #4

Thanks for the feedback. I read that version 0.20.0 was getting some
refactoring to allow for this kind of query, but I guess waiting for
the Lucene 4 API to freeze makes sense anyway.

2012/8/9 Ivan Brusic ivan@brusic.com:

Lucene 4 will have (has) a grouping API:
http://lucene.apache.org/core/4_0_0-ALPHA/grouping/index.html

It might be worthwhile to adhere to the Lucene API and not recreate
something that is not portable. Shay's call. That said, Lucene 4 is
still only in alpha and the API might change, so coding against
it might be a bit premature. Perhaps Simon W has some more insight.

Cheers,

Ivan


(Ivan Brusic) #5

Lucene 4.0 is now in beta:
http://search-lucene.com/m/9LhyoLfdKY&subj=+ANNOUNCE+Apache+Lucene+4+0+beta+released+

Hopefully the full release will happen on schedule around September/October.

On Fri, Aug 10, 2012 at 2:29 AM, Alex Rodriguez Lopez
aliksandr@gmail.com wrote:

Thanks for the feedback. I read that version 0.20.0 was getting some
refactoring to allow for this kind of query, but I guess waiting for
the Lucene 4 API to freeze makes sense anyway.


(Jim Hazen) #6

I'm also eagerly awaiting this feature. I'm developing a catalog of
applications (think app store). Let's say you'd like to page through a
list of applications where the results displayed contain the highest
version of the app you have access to (based on a number of factors). This
is very difficult to do outside of the search engine while maintaining the
appropriate paging values.

As you say, if there's a way to do this with the 0.19 or 0.20 beta
releases, I'd love to understand it as well.
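A sketch of the catalog case in Python, to show why paging has to happen after grouping. The field names (`app`, `version`, `restricted`) and the `can_access` rule are hypothetical stand-ins for the real access factors; versions are tuples so that 1.10 ranks above 1.2. The function keeps the highest accessible version per app and only then applies `offset`/`size`, so each page holds `size` distinct apps.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


def latest_accessible_page(
    docs: List[Doc],
    can_access: Callable[[Doc], bool],
    offset: int,
    size: int,
) -> List[Doc]:
    """One page of apps, each represented by its highest accessible version."""
    best: Dict[str, Doc] = {}
    for doc in docs:
        if not can_access(doc):
            continue  # versions this user may not see never compete
        app = doc["app"]
        if app not in best or doc["version"] > best[app]["version"]:
            best[app] = doc
    # Groups are ordered by app name here; an engine would order them by score.
    groups = sorted(best.values(), key=lambda d: d["app"])
    # Paging is applied to groups (apps), not to individual version documents.
    return groups[offset : offset + size]


# Hypothetical version documents; versions are tuples so (1, 10) > (1, 2).
docs: List[Doc] = [
    {"app": "chat", "version": (1, 2), "restricted": False},
    {"app": "chat", "version": (1, 10), "restricted": False},
    {"app": "chat", "version": (2, 0), "restricted": True},
    {"app": "mail", "version": (3, 1), "restricted": False},
    {"app": "maps", "version": (0, 9), "restricted": False},
    {"app": "notes", "version": (1, 0), "restricted": False},
]


def can_access(doc: Doc) -> bool:
    """Hypothetical access rule: hide restricted versions."""
    return not doc["restricted"]


for offset in (0, 2):
    page = latest_accessible_page(docs, can_access, offset, size=2)
    print([(d["app"], d["version"]) for d in page])
```

Outside the engine this means fetching every matching version before the first page can be built, which is the difficulty described above; grouping in the engine would let the offset and size apply to groups directly.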