Distinct results by field for a given query

I have the following problem: my documents have a field 'xxx' whose value
may be duplicated across the entire index. I want to do a very simple
thing: query the index with a bool query on all my other fields, but have
the query return only results that are distinct on xxx. My index models
people, and people who live in the same house count as duplicates. I would
like only distinct houses in my results, while the search itself is still
done across all houses.

I know the duplicates in advance, since this is a one-time indexing job. Is
there a trick that enables this in Elasticsearch? From what I have read,
distinct is not available out of the box in either Elasticsearch or Lucene.
I am asking for advanced ideas on how to make this happen, including
clever indexing schemes, since I have full control over the index and know
the duplicates in advance.

I have two scenarios:

  1. I want to count the results of a given query; this needs to be very fast
  2. I want to retrieve the actual documents; performance does not matter

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi David,

As far as I know, ES has no grouping functionality (unlike Solr, for
instance). However, it does have a nifty parent/child feature you may want
to take a look at. You could index the people as children of house
documents and then use the "has_child" query, which returns only the parent
documents whose children match the given query. I hope that helps.
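
A minimal sketch of what that could look like, written as Python dicts that
serialize to the JSON request bodies. The type names "house" and "person",
the "name" field, and the helper name has_child_body are assumptions made up
for illustration; the "_parent" mapping and the "has_child" body follow the
parent/child syntax of the ES releases current at the time of writing, so
check them against your version.

```python
import json

# Hypothetical type names: each person document is indexed as a child of a house.
PARENT_TYPE = "house"
CHILD_TYPE = "person"

# Mapping for the child type: declares which type its parent documents have.
person_mapping = {CHILD_TYPE: {"_parent": {"type": PARENT_TYPE}}}


def has_child_body(child_query):
    """Wrap a query on person fields so that it matches houses instead."""
    return {"query": {"has_child": {"type": CHILD_TYPE, "query": child_query}}}


# Example bool query over (assumed) person fields.
people_query = {"bool": {"must": [{"term": {"name": "smith"}}]}}

body = has_child_body(people_query)
print(json.dumps(body, indent=2))
```

Since each house can match at most once, counting the hits of this query
should give the number of distinct houses (scenario 1), and running it as a
search should return the house documents themselves. Whether the count
endpoint expects the outer "query" wrapper depends on the ES version.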

Best,

Jorge

Thanks, I think I can see the light at the end of the tunnel.

Would the has_child query also return the child documents?

Hi,

It does not; it returns only parents. There is a "has_parent" query, which
is the mirror image of this, or you could simply gather up the parent IDs
and retrieve the children directly.
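
To make the "gather up the IDs" idea concrete, here is a rough sketch in
Python that only builds the request body for the second search. The type
name "house", the "name" field, the helper names, and the trimmed-down
response are all assumptions for illustration; "has_parent" with
"parent_type" and the "ids" query are written as I understand the syntax of
current releases, so verify against your version.

```python
def parent_ids(search_response):
    """Collect the _id of every house hit in a has_child search response."""
    return [hit["_id"] for hit in search_response["hits"]["hits"]]


def children_of(house_ids, child_query):
    """Build a search body for people who match child_query AND whose
    parent house is one of house_ids."""
    in_these_houses = {
        "has_parent": {
            "parent_type": "house",  # hypothetical parent type name
            "query": {"ids": {"values": house_ids}},
        }
    }
    return {"query": {"bool": {"must": [child_query, in_these_houses]}}}


# Trimmed-down stand-in for the response of the first (has_child) search.
first_response = {"hits": {"hits": [{"_id": "h1"}, {"_id": "h7"}]}}
people_query = {"term": {"name": "smith"}}

body = children_of(parent_ids(first_response), people_query)
```

The second search would be run against the child type. Note that it returns
every matching person in those houses, so two people from the same house
can still both show up.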

Also check out "top_children". It appears to be similar to has_child, but
may perform better, as it does not seem to need to traverse the entire
index.

Best of luck,

Jorge

My issue is that I need to simulate distinct, so I need to know which of
the children triggered the parent match in order to include it in the
results. Two children may share the same parent, so the has_parent query
won't give me a distinct answer.
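
Since performance does not matter for retrieval, one fallback would be to
run the people query directly and collapse duplicates on the client. A
minimal sketch in Python, assuming every hit carries the 'xxx' value in its
"_source" and that all hits have already been fetched; the function name
and the sample hits are made up for illustration.

```python
def distinct_by_field(hits, field="xxx"):
    """Keep the first hit for each distinct value of `field` in _source.

    Hits are assumed to arrive sorted by relevance, so the first hit per
    value is also the best-scoring one.
    """
    seen = set()
    kept = []
    for hit in hits:
        key = hit["_source"][field]
        if key not in seen:
            seen.add(key)
            kept.append(hit)
    return kept


# Made-up hits: two people share house "h1".
hits = [
    {"_id": "p1", "_source": {"xxx": "h1", "name": "Ann"}},
    {"_id": "p2", "_source": {"xxx": "h1", "name": "Bob"}},
    {"_id": "p3", "_source": {"xxx": "h2", "name": "Cy"}},
]
print([h["_id"] for h in distinct_by_field(hits)])  # ['p1', 'p3']
```

This tells me which person triggered each house, but it does not help with
the fast count, because it needs the full result set on the client.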

I did not understand the "retrieve IDs" approach.
