How Can I Perform a Distinct Query?

Hi Guys,

I'm a new ElasticSearch user, and need some help selecting only documents
with a distinct field. I've got an index of documents which look something
like the following:

{
  "last_name": "Degges",
  "first_name": "Randall",
  "state": "CA"
}

I'm trying to return a list of documents de-duplicated by last name, so
that each last name appears only once. For instance -- if I had three documents, two with a
"last_name" field of "Degges" and one with a "last_name" field of "Perez",
I'd want to return only two documents: one document where "Degges" is the
last name (I don't care which), and one where "Perez" is the last name.

I realize that this question has been asked before on this mailing list,
but even after reading through the documentation on facets, asking on the
IRC channel, and doing lots of trial-and-error testing, I can't figure out
how to make it work.

I'm hoping some of you can give me specific example queries I can use to do
the de-duplication.

The reason I'm attempting to do this is that my application needs to return
a list of unique last names for people in a given state (e.g. CA). Since I
have so many people in my index, it would be incredibly slow for me to
select all of the people at once (with duplicates), and de-duplicate things
on my end.

Any help would be greatly appreciated.

Thank you.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Basically, you want to send in a query with some specific filters and then
return only a terms facet on last_name? That would give you a list of unique
last names along with their counts, which it sounds like you can ignore. You
can't return "unique documents", as every document is already unique.

{
  "query": {"match_all": {}},
  "facets": {
    "filters": {
      "terms": {
        "field": "last_name",
        "size": 10000
      }
    }
  }
}
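For anyone wiring this into an application, here is a rough Python sketch of the same idea, narrowed to one state. It is only an illustration under a few assumptions: the facet name "last_names", the helper function names, the "filtered" query with a "term" filter on "state", and the hand-written sample response are mine and are not taken from a real cluster. The response shape follows the facets-era terms facet format, and the lowercase values assume the default analyzer was used when indexing.

```python
import json


def build_last_name_facet_query(state, max_names=10000):
    """Build a search body that filters by state and facets on last_name.

    Assumption: "state" is matched with a term filter, so the value must be
    given in its indexed form (lowercase if the default analyzer was used).
    """
    return {
        "size": 0,  # skip the hits themselves; only the facet is needed
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {"state": state}},
            }
        },
        "facets": {
            # "last_names" is an arbitrary facet name chosen for this sketch
            "last_names": {
                "terms": {"field": "last_name", "size": max_names}
            }
        },
    }


def unique_last_names(response, facet_name="last_names"):
    """Pull the distinct terms out of a terms facet response, ignoring counts."""
    return [entry["term"] for entry in response["facets"][facet_name]["terms"]]


# Hand-written sample in the shape of a terms facet response (illustrative only).
sample_response = {
    "facets": {
        "last_names": {
            "_type": "terms",
            "missing": 0,
            "total": 3,
            "other": 0,
            "terms": [
                {"term": "degges", "count": 2},
                {"term": "perez", "count": 1},
            ],
        }
    }
}

body = build_last_name_facet_query("ca")
print(json.dumps(body, indent=2))          # the JSON you would POST to _search
print(unique_last_names(sample_response))  # ['degges', 'perez']
```

Setting "size" to 0 at the top level keeps the documents out of the response, so only the list of names comes back. Note that facets were later deprecated in favor of aggregations, so on newer Elasticsearch versions the equivalent would be a terms aggregation rather than a terms facet.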

(In reply to Randall Degges, Tuesday, June 25, 2013 5:12:51 AM UTC-4)

Thanks Justin,

That actually solved my problem perfectly!

-Randall

(In reply to Justin Treher, Tue, Jun 25, 2013 at 5:04 AM)

--
Randall Degges
http://rdegges.com/


Hi,

I have a similar problem, but can we actually count the distinct terms
without fetching them from ES? There might be millions of unique terms, and
all we need to know is how many there are within, for instance, a specific
time period (range).

Thanks

(In reply to Randall Degges, Wednesday, 26 June 2013 09:36:54 UTC+3)