Date Histogram Facet and interval: how does it work under the covers?

Hi Guys,

I'm trying to better understand how the Date Histogram Facet works.

Say I want the histogram for a whole year at a "month" interval versus the histogram for a single day at a "minute" interval. Does ES actually count EVERY document that matches the query, meaning the operation is the same regardless of the interval?
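The two requests in the question differ only in the range that selects documents and in the `interval` that sets the bucket width. Here is a minimal sketch that builds both bodies in Python. It assumes the pre-1.0 facet syntax and a hypothetical date field named `timestamp`, and the helper `date_histogram_request` is illustrative only:

```python
import json


def date_histogram_request(field, interval, start, end):
    """Build a search body with a date_histogram facet (pre-1.0 facet syntax).

    `field`, `start` and `end` are hypothetical placeholders. The range query
    decides which documents match; `interval` only sets the bucket width.
    """
    return {
        "query": {
            "range": {field: {"gte": start, "lt": end}}
        },
        "facets": {
            "histogram": {
                "date_histogram": {"field": field, "interval": interval}
            }
        },
        "size": 0,  # only the facet counts are needed, not the hits themselves
    }


# A year at month resolution versus one day at minute resolution.
yearly = date_histogram_request("timestamp", "month", "2012-01-01", "2013-01-01")
daily = date_histogram_request("timestamp", "minute", "2012-06-01", "2012-06-02")
print(json.dumps(yearly, indent=2))
```

The yearly request yields at most 12 buckets and the daily one up to 1440. Either way, every document matched by the range has to be placed into some bucket.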

If so, how do you optimize this for a large number of documents over long periods of time, e.g. millions of documents in a year when drawing a monthly trend chart? Does sharding the indexes by time, which Shay talked about in https://speakerdeck.com/kimchy/elasticsearch-big-data-search-analytics, help in this use case?
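With time-based indexes, a query only needs to touch the indexes that cover its date range. Here is a minimal sketch of picking those indexes. It assumes a hypothetical naming scheme of one index per calendar month, `<prefix>-YYYY.MM`, and `monthly_indices` is an illustrative helper rather than part of any Elasticsearch client:

```python
from datetime import date


def monthly_indices(prefix, start, end):
    """List monthly index names covering [start, end], oldest first.

    Assumes a hypothetical scheme "<prefix>-YYYY.MM", one index per month.
    """
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}-{year:04d}.{month:02d}")
        # Advance one calendar month, rolling over into the next year.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


print(",".join(monthly_indices("events", date(2012, 11, 15), date(2013, 2, 1))))
```

Elasticsearch accepts a comma-separated list of index names in the search URL, so the joined string can be used as the index part of a `/<indices>/_search` request. Under this scheme, a one-day, minute-interval histogram touches a single index, while a yearly one touches twelve.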

Thanks,

Drew

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

> Say I want to get the histogram for a whole year at "month" interval vs
> get the histogram for a day at "minute" interval. Does ES actually count
> EVERY document that matches the query? Meaning the operation is the same
> regardless of the interval?

Yes, but you sound concerned about performance/speed: Elasticsearch
uses dedicated caches for faceting; it does not "iterate" over every
document in the matching set, so the optimization is already there. Of
course, you always win when you can split the data, and the date histogram
is no different. That said, in my experience, it's very fast.
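A toy model can help picture the role of a faceting cache. This is a minimal sketch that assumes a simplified in-memory column, where `column[doc_id]` holds each document's timestamp in epoch milliseconds. That column is a stand-in for a field-data style cache and not Elasticsearch's actual data structure. Rounding is done in UTC, and all names (`bucket_start`, `date_histogram`, `ms`) are hypothetical:

```python
from collections import Counter
from datetime import datetime, timezone


def bucket_start(millis, interval):
    """Round an epoch-millisecond timestamp down to its bucket start (UTC)."""
    dt = datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
    if interval == "month":
        dt = dt.replace(day=1, hour=0, minute=0, second=0)
    elif interval == "minute":
        dt = dt.replace(second=0)
    else:
        raise ValueError(f"unsupported interval: {interval}")
    return int(dt.timestamp()) * 1000


def date_histogram(column, matching_doc_ids, interval):
    """Count matching docs per bucket, reading values from the in-memory column.

    No document is loaded; each matching doc id costs one array lookup.
    """
    counts = Counter()
    for doc_id in matching_doc_ids:
        counts[bucket_start(column[doc_id], interval)] += 1
    return dict(sorted(counts.items()))


def ms(*parts):
    """Epoch milliseconds for a UTC date/time (helper for the example)."""
    return int(datetime(*parts, tzinfo=timezone.utc).timestamp() * 1000)


column = [ms(2012, 1, 5), ms(2012, 1, 20, 8, 30),
          ms(2012, 2, 3, 10, 15, 30), ms(2012, 2, 3, 10, 15, 45)]
monthly = date_histogram(column, range(len(column)), "month")
by_minute = date_histogram(column, [2, 3], "minute")
print(monthly)
print(by_minute)
```

In this toy model, the interval only changes how many buckets the counts land in. The per-document work is an array lookup rather than a document load.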

Karel

Hi Karel,

Can you give me some numbers you've seen in the wild? Mainly, I'm looking to find out how much memory and time it would take to compute a monthly histogram over, say, 1 billion documents that all match the query, so that I can draw a nice chart.
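For the memory side of this question, a back-of-the-envelope estimate is a reasonable starting point. The sketch below assumes that each document contributes one uncompressed 64-bit timestamp to an in-memory cache. Real usage depends on the version, compression, segment layout and replicas, all of which it ignores. The shard count of 10 is purely hypothetical:

```python
def field_data_bytes(num_docs, bytes_per_value=8):
    """Rough lower bound on cache memory for one date field.

    Assumes one uncompressed 64-bit value (epoch millis) per document and
    ignores per-segment overhead, compression and replicas.
    """
    return num_docs * bytes_per_value


total = field_data_bytes(1_000_000_000)
print(f"{total / 2**30:.2f} GiB")  # 7.45 GiB

shards = 10  # hypothetical number of primary shards sharing the load
per_shard = total // shards
print(f"{per_shard / 2**20:.0f} MiB per shard")
```

Under these assumptions, the cost grows with the number of documents rather than with the number of buckets. A monthly histogram over a year needs only 12 counters, but still one cached value per document.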

-- Drew

On Feb 24, 2013, at 9:54 PM, Karel Minařík (karel.minarik@elasticsearch.com) wrote the reply shown above.