Get total of distinct values in a field


(Jorge Luis Betancourt Gonzalez) #1

Hi:

I'm indexing a lot of information from Twitter and other social media. Is there a way to get a count of all the distinct terms present in a field? I've seen something mentioned (and closed) in https://github.com/elasticsearch/elasticsearch/issues/1044, but even with the new aggregation framework I don't see how this would be possible. In this particular case I'm trying to get the total number of authors (which could be very high). I know that with a terms facet I could get the count for each author, but I'm only interested in the total.

Greetings,


3rd International Winter School at UCI, February 17 to 28, 2014. See www.uci.cu

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/509052727.11780359.1393805786037.JavaMail.zimbra%40uci.cu.
For more options, visit https://groups.google.com/groups/opt_out.


(Dan Fairs) #2


It's not ideal, but we do this by creating a terms facet on the field in question and counting the number of entries it returns.

I'd also note that we don't do this in live web requests, but in batch jobs, and store the result!
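Dan's workaround can be sketched in Python as two helpers: one builds a search body with a terms facet on the field, the other counts the entries in the facet response. This is a minimal sketch, assuming the 1.x terms facet request/response shape (a `terms` list plus an `other` count); the facet name `distinct_values`, the field `author`, and the helper names are hypothetical. Sending the request with an HTTP client is left out.

```python
from typing import Any, Dict, Tuple


def build_terms_facet_request(field: str, max_terms: int) -> Dict[str, Any]:
    """Build a search body with a terms facet over `field` (hypothetical helper).

    The top-level size=0 skips returning hits. The facet's own `size` caps how
    many distinct terms come back, so it must be at least the number of
    distinct values expected in the field.
    """
    return {
        "size": 0,
        "query": {"match_all": {}},
        "facets": {
            "distinct_values": {
                "terms": {"field": field, "size": max_terms}
            }
        },
    }


def count_distinct_from_facet(
    response: Dict[str, Any], facet_name: str = "distinct_values"
) -> Tuple[int, bool]:
    """Return (number of returned terms, whether the list looks complete)."""
    facet = response["facets"][facet_name]
    distinct = len(facet["terms"])
    # "other" counts occurrences of terms that did not fit in the returned
    # list; if it is non-zero, max_terms was too small and `distinct` is only
    # a lower bound.
    complete = facet.get("other", 0) == 0
    return distinct, complete


if __name__ == "__main__":
    body = build_terms_facet_request("author", max_terms=100000)
    # Shape of a response as assumed above (values made up for illustration).
    sample = {
        "facets": {
            "distinct_values": {
                "_type": "terms",
                "missing": 0,
                "total": 5,
                "other": 0,
                "terms": [
                    {"term": "alice", "count": 3},
                    {"term": "bob", "count": 2},
                ],
            }
        }
    }
    print(count_distinct_from_facet(sample))
```

Because every distinct term is shipped back to the client, the cost grows with the number of authors, which fits running it in a batch job and storing the result.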

Cheers,
Dan

Dan Fairs | dan.fairs@gmail.com | @danfairs | secondsync.com



(Jorge Luis Betancourt Gonzalez) #3

Thanks for the reply! Yes, I think this is the only workaround. Now I'm wondering whether it's possible to cache this value using the Elasticsearch built-in cache, because in my case I need it for live requests, though I don't mind if the data is a few seconds stale.
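Independent of any Elasticsearch-side caching, one option for live requests that tolerate a few seconds of staleness is to keep the count in application memory and recompute it only once it is older than a fixed TTL. A minimal sketch, assuming a hypothetical `compute` callback that runs the terms facet query and returns the count; `TTLCache` is an illustrative name, not an Elasticsearch feature.

```python
import threading
import time
from typing import Callable, Optional


class TTLCache:
    """Hold one computed value and recompute it once it is older than ttl_seconds.

    Hypothetical helper: `compute` would run the expensive distinct-count
    query; callers accept a result that is up to ttl_seconds stale.
    """

    def __init__(
        self,
        compute: Callable[[], int],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._compute = compute
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._value: Optional[int] = None
        self._fetched_at = 0.0

    def get(self) -> int:
        # Serialize refreshes so concurrent callers trigger at most one query.
        with self._lock:
            now = self._clock()
            if self._value is None or now - self._fetched_at >= self._ttl:
                self._value = self._compute()
                self._fetched_at = now
            return self._value


if __name__ == "__main__":
    # Stand-in for the real facet query.
    cache = TTLCache(lambda: 42, ttl_seconds=5.0)
    print(cache.get())
```

Holding the lock during recomputation means requests that arrive while the value is being refreshed wait for that single query and then reuse its result, rather than each hitting the cluster.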

----- Original Message -----

From: "Dan Fairs" dan.fairs@gmail.com
To: elasticsearch@googlegroups.com
Sent: Monday, March 3, 2014 4:33:59 PM
Subject: Re: Get total of distinct values in a field




