Getting time range windowed counts


(Josh Harrison) #1

I'm looking to be able to easily expose an interface where I can hit an
alias and get back the counts of results every hour for the past 24 hours.
Is this something I can do with a filtered alias? Or is this best done with
an outside programming language's interface into ES?
I want to get something back like the following when called at 0000
(12:00am) on November 20th:
{
  "201311190000": 50,
  "201311190100": 14,
  "201311190200": 6,
  "201311190300": 87,
  "201311190400": 304,
  "201311190500": 12,
  "201311190600": 99,
  "201311190700": 18,
  "201311190800": 3,
  "201311190900": 3,
  "201311191000": 0,
  "201311191100": 0,
  "201311191200": 0,
  "201311191300": 678,
  "201311191400": 447,
  "201311191500": 930,
  "201311191600": 48,
  "201311191700": 23,
  "201311191800": 1023,
  "201311191900": 45,
  "201311192000": 56,
  "201311192100": 50,
  "201311192200": 4,
  "201311192300": 53
}
That is to say, between 12:00am and 1:00am on 11/19/2013 there are 50
records that fall within that time range, 14 records between 1:00am and
2:00am, etc.

Thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Hannes Korte) #2

Hi Josh,

I guess you want to use the date histogram facet:

{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "numeric_range": {
          "timestamp": {
            "gte": "now-1d"
          }
        }
      }
    }
  },
  "facets": {
    "histo1": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "hour"
      }
    }
  }
}

The filtered query only returns documents from the last 24 hours, and the
date histogram facet groups them into hourly buckets with a document count
for each.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-date-histogram-facet.html
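
For anyone wanting the exact output shape from the first post, here is a minimal Python sketch. It assumes the facet response lists its buckets under facets.histo1.entries as objects with a "time" (epoch milliseconds) and a "count", and that hours without documents are absent from the response, so the zeros have to be filled in on the client. The function name hourly_counts and the choice to bucket in UTC are illustrative assumptions, not part of Elasticsearch.

```python
from datetime import datetime, timedelta, timezone

KEY_FORMAT = "%Y%m%d%H%M"  # e.g. "201311190000", as in the first post


def hourly_counts(facet_entries, start, hours=24):
    """Map date histogram facet entries to {"YYYYMMDDHHMM": count}.

    facet_entries: list of {"time": <epoch ms>, "count": <int>} (assumed shape).
    start: timezone-aware UTC datetime of the first hour to report.
    Hours missing from the facet are reported as 0.
    """
    # Pre-fill every hour in the window with zero.
    counts = {
        (start + timedelta(hours=i)).strftime(KEY_FORMAT): 0
        for i in range(hours)
    }
    for entry in facet_entries:
        bucket = datetime.fromtimestamp(entry["time"] / 1000, tz=timezone.utc)
        key = bucket.strftime(KEY_FORMAT)
        if key in counts:  # ignore buckets outside the requested window
            counts[key] = entry["count"]
    return counts


if __name__ == "__main__":
    window_start = datetime(2013, 11, 19, tzinfo=timezone.utc)
    first_hour_ms = int(window_start.timestamp() * 1000)
    sample = [{"time": first_hour_ms, "count": 50}]
    print(hourly_counts(sample, window_start))
```

In practice facet_entries would be response["facets"]["histo1"]["entries"] from the search above.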

Best regards
Hannes


(Josh Harrison) #3

That looks excellent, thank you. If I wanted to get the number of unique
occurrences of a field within a bucket, it looks like I would need to use a
script? So, working from the previous data, say that the 1023 entries at
1800-1900 were written by 56 different authors total. What is the easiest
way to get that fact?
Thanks again!


(Hannes Korte) #4

Good question. I guess right now there is no direct solution to get what
you want in just one request. It looks similar to this feature request
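for hierarchical facets:

https://github.com/elasticsearch/elasticsearch/issues/1076

Additionally, there is something new coming in future versions (1.0?)
called aggregations:

https://github.com/elasticsearch/elasticsearch/issues/3300

There you can define sub-aggregations as well.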
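
As a sketch of how sub-aggregations could answer the unique-authors question in a single request: the snippet below assumes the aggregations API as it was later released, with a date_histogram aggregation and a nested cardinality aggregation (which returns an approximate distinct count), and it assumes an "author" field that is indexed as a single unanalyzed value. The names per_hour, unique_authors, and unique_authors_per_hour are made up for illustration.

```python
# Request body: hourly buckets, each with an approximate distinct author count.
# "interval" mirrors the facet syntax used in this thread; later releases
# split it into "calendar_interval" and "fixed_interval".
AGG_REQUEST = {
    "size": 0,
    "query": {"range": {"timestamp": {"gte": "now-1d"}}},
    "aggs": {
        "per_hour": {
            "date_histogram": {"field": "timestamp", "interval": "hour"},
            "aggs": {
                "unique_authors": {"cardinality": {"field": "author"}}
            },
        }
    },
}


def unique_authors_per_hour(response):
    """Return {bucket key (epoch ms): (doc_count, approx. unique authors)}."""
    buckets = response["aggregations"]["per_hour"]["buckets"]
    return {
        b["key"]: (b["doc_count"], b["unique_authors"]["value"])
        for b in buckets
    }
```

Each bucket then carries both the hourly document count and the author count, so no second round trip would be needed.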

For now, I would simply do it with a second roundtrip to gather the term
facets on the author field for each hour separately using a multi search.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-multi-search.html
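
The second round trip could be assembled roughly like this. It is a sketch under assumptions: the index name "myindex", the field name "author", and the max_authors cap are placeholders, and the query uses the same filtered-query syntax as the earlier example. The number of distinct authors per hour is read off as the number of terms each facet returns, which is only exact if the cap is larger than the real number of authors. The body follows the multi search format of one header line and one body line per search, each terminated by a newline.

```python
import json
from datetime import datetime, timedelta


def build_msearch_body(start, hours=24, index="myindex", max_authors=10000):
    """Build a multi search body with one terms-facet search per hour."""
    lines = []
    for i in range(hours):
        lower = start + timedelta(hours=i)
        upper = lower + timedelta(hours=1)
        header = {"index": index}
        body = {
            "size": 0,
            "query": {"filtered": {"filter": {"range": {"timestamp": {
                "gte": lower.isoformat(),  # inclusive start of the hour
                "lt": upper.isoformat(),   # exclusive end of the hour
            }}}}},
            "facets": {
                "authors": {"terms": {"field": "author", "size": max_authors}}
            },
        }
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    # The multi search body must end with a newline.
    return "\n".join(lines) + "\n"


def authors_per_hour(msearch_response):
    """Count the distinct terms returned by each hourly search, in order."""
    return [
        len(r["facets"]["authors"]["terms"])
        for r in msearch_response["responses"]
    ]
```

The returned string would be sent as the request body to the _msearch endpoint, and the responses come back in the same order as the hourly searches.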

Good luck!

Hannes


(Josh Harrison) #5

Great, thanks again. I found the following plugin, which does pretty much
what I wanted: https://github.com/crate/elasticsearch-timefacets-plugin
The following plugin is derived from it and looks really cool, but it
doesn't work with 0.90.5:
https://github.com/pearson-enabling-technologies/elasticsearch-approx-plugin

Wanted to post this in case someone down the line finds this. I may try to
get the approximation plugin working too.


