How Can I Perform a Distinct Query?

Randall_Degges · June 25, 2013, 9:12am

Hi Guys,

I'm a new Elasticsearch user and need some help selecting only documents
with a distinct value in one field. I've got an index of documents which
look something like the following:

{
  "last_name": "Degges",
  "first_name": "Randall",
  "state": "CA"
}

I'm trying to return a list of documents de-duplicated by last name. For
instance, if I had three documents, two with a "last_name" field of
"Degges" and one with a "last_name" field of "Perez", I'd want to return
only two documents: one document where "Degges" is the last name (I don't
care which), and one where "Perez" is the last name.

I realize that this question has been asked before on this mailing list,
but even after reading through the documentation on facets, asking on the
IRC channel, and doing lots of trial-and-error testing, I can't figure out
how to make it work.

I'm hoping some of you can give me specific example queries I can use to do
the de-duplication.

The reason I'm attempting to do this is that my application needs to return
a list of unique last names for people in a given state (e.g. CA). Since I
have so many people in my index, it would be incredibly slow for me to
select all of the people at once (with duplicates) and de-duplicate them
on my end.

Any help would be greatly appreciated.

Thank you.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Justin_Treher · June 25, 2013, 12:04pm

Basically, you want to send in a query with some specific filters and then
only return a list of facets based on last_name? This would give you a list
of unique last names along with their counts, which it sounds like you can
ignore. You can't return "unique documents", as every document is already
unique.

{
  "query": { "match_all": {} },
  "facets": {
    "filters": {
      "terms": {
        "field": "last_name",
        "size": 10000
      }
    }
  }
}
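
As a rough illustration of the approach above, here is a minimal Python sketch that builds a facet request restricted to one state and pulls the distinct last names out of the response. It makes no network call. The helper names, the facet name "last_names", the "match" query on "state", and the hand-written sample response are all assumptions made for illustration; they are not from the original posts. It also assumes "last_name" is mapped so that whole names come back as single terms (an analyzed field would return lowercased tokens instead).

```python
import json


def build_last_name_facet_query(state, max_names=10000):
    """Build a search body that asks only for a terms facet on last_name."""
    return {
        "size": 0,  # skip the document hits; only the facet is needed
        "query": {"match": {"state": state}},  # assumed way to restrict by state
        "facets": {
            "last_names": {  # facet name is arbitrary (hypothetical choice)
                "terms": {"field": "last_name", "size": max_names}
            }
        },
    }


def unique_last_names(response):
    """Return the distinct terms from a terms-facet response, ignoring counts."""
    return [entry["term"] for entry in response["facets"]["last_names"]["terms"]]


# Hand-written sample in the shape a terms facet responds with (not real data).
sample_response = {
    "facets": {
        "last_names": {
            "_type": "terms",
            "terms": [
                {"term": "Degges", "count": 2},
                {"term": "Perez", "count": 1},
            ],
        }
    }
}

print(json.dumps(build_last_name_facet_query("CA"), indent=2))
print(unique_last_names(sample_response))
```

The printed request body is what would be sent to the index's _search endpoint; the counts in the response are simply dropped, as suggested above.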


Randall_Degges · June 26, 2013, 6:36am

Thanks Justin,

That actually solved my problem perfectly!

-Randall


--
Randall Degges
http://rdegges.com/


Thomas_Bolis · June 27, 2013, 5:58am

Hi,

I have a similar problem, but can we actually count the distinct terms
without fetching them from ES? There might be millions of unique terms,
and all we need to know is how many there are, for instance within a
specific time period (a range).

Thanks

