I have some data like:
{
  "url": "xxx",
  "pv": n,
  "date": Date
}
url is a string and pv is an integer. What I want is to sum the pv of a URL
from one date to another. What aggregation should I use?
Depending on your use case, you can choose one of a few options.
Search for a specific date range and a specific URL, and retrieve the sum
of pv for that URL over the date range:
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "date": {
                  "from": "2014/01/01",
                  "to": "2014/01/04"
                }
              }
            },
            {
              "term": {
                "url": "http://www.example.com"
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "pvCount": {
      "sum": {
        "field": "pv"
      }
    }
  }
}
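If the URL and dates change from request to request, the body above could be built programmatically. The following is only a sketch in Python: build_sum_pv_query and its parameter names are hypothetical helpers, not part of any Elasticsearch client, and it assumes the url field is mapped so that a term filter matches the whole value (for example not_analyzed).

```python
import json


def build_sum_pv_query(url, start, end):
    """Build a search body that sums pv for one URL within a date range.

    Hypothetical helper: it only mirrors the JSON body shown above.
    """
    return {
        "query": {
            "constant_score": {
                "filter": {
                    "bool": {
                        "must": [
                            # Restrict to documents inside the date range.
                            {"range": {"date": {"from": start, "to": end}}},
                            # Restrict to one exact URL value.
                            {"term": {"url": url}},
                        ]
                    }
                }
            }
        },
        # Sum the pv field over the matching documents.
        "aggs": {"pvCount": {"sum": {"field": "pv"}}},
    }


body = build_sum_pv_query("http://www.example.com", "2014/01/01", "2014/01/04")
print(json.dumps(body, indent=2))
```

The printed JSON can then be sent as the search request body with whichever HTTP or client library you already use.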
Search for a specific date range and return an aggregation with the sum
of pv for the top N URLs, ordered by that sum:
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "date": {
            "from": "2014/01/01",
            "to": "2014/01/01"
          }
        }
      }
    }
  },
  "aggs": {
    "topURLs": {
      "terms": {
        "field": "url",
        "size": 10,
        "order": {
          "pvCount": "desc"
        }
      },
      "aggs": {
        "pvCount": {
          "sum": {
            "field": "pv"
          }
        }
      }
    }
  }
}
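To make clear what this request computes, here is a small local illustration in Python. It is not how Elasticsearch executes the aggregation; it only mimics the logic (filter by date, group by url, sum pv, sort descending, keep the top N) on a handful of made-up documents. The function name top_urls_by_pv and the sample data are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date


def top_urls_by_pv(docs, start, end, size=10):
    """Sum pv per url for docs dated within [start, end], highest sum first."""
    totals = defaultdict(int)
    for doc in docs:
        if start <= doc["date"] <= end:
            totals[doc["url"]] += doc["pv"]
    # Order by the summed pv, descending, like "order": {"pvCount": "desc"}.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:size]


# Made-up sample documents with the same fields as in the question.
sample_docs = [
    {"url": "http://www.example.com/a", "pv": 5, "date": date(2014, 1, 1)},
    {"url": "http://www.example.com/b", "pv": 7, "date": date(2014, 1, 1)},
    {"url": "http://www.example.com/a", "pv": 4, "date": date(2014, 1, 1)},
    {"url": "http://www.example.com/a", "pv": 100, "date": date(2014, 1, 2)},
]

print(top_urls_by_pv(sample_docs, date(2014, 1, 1), date(2014, 1, 1)))
# [('http://www.example.com/a', 9), ('http://www.example.com/b', 7)]
```

The document dated 2014-01-02 falls outside the range, so it does not contribute to the sums.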
Return an aggregation that buckets documents by day and, within each
bucket, returns the sum of pv for the top N URLs, ordered by that sum:
{
  "aggs": {
    "dateByDay": {
      "date_histogram": {
        "field": "date",
        "interval": "day"
      },
      "aggs": {
        "topURLs": {
          "terms": {
            "field": "url",
            "size": 10,
            "order": {
              "pvCount": "desc"
            }
          },
          "aggs": {
            "pvCount": {
              "sum": {
                "field": "pv"
              }
            }
          }
        }
      }
    }
  }
}
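The response to a nested aggregation like this comes back as buckets within buckets, which can be awkward to read. The sketch below, in Python, flattens such a response into (day, url, pv sum) rows. The function name flatten_daily_top_urls is hypothetical, and sample_response is hand-written to resemble the expected shape (aggregations, then dateByDay buckets, then topURLs buckets, then pvCount.value); check it against an actual response from your cluster before relying on it.

```python
def flatten_daily_top_urls(response):
    """Turn the nested dateByDay / topURLs buckets into flat rows."""
    rows = []
    for day in response["aggregations"]["dateByDay"]["buckets"]:
        # Prefer the formatted date if present, else the raw bucket key.
        label = day.get("key_as_string", day["key"])
        for url_bucket in day["topURLs"]["buckets"]:
            rows.append((label, url_bucket["key"], url_bucket["pvCount"]["value"]))
    return rows


# Hand-written sample in the assumed response shape, not real output.
sample_response = {
    "aggregations": {
        "dateByDay": {
            "buckets": [
                {
                    "key_as_string": "2014-01-01T00:00:00.000Z",
                    "key": 1388534400000,
                    "doc_count": 3,
                    "topURLs": {
                        "buckets": [
                            {"key": "http://www.example.com/a", "doc_count": 2,
                             "pvCount": {"value": 9.0}},
                            {"key": "http://www.example.com/b", "doc_count": 1,
                             "pvCount": {"value": 7.0}},
                        ]
                    },
                }
            ]
        }
    }
}

for row in flatten_daily_top_urls(sample_response):
    print(row)
```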
I hope this helps clarify some of your options,
Regards,
Colin
On Saturday, 28 June 2014 05:02:51 UTC+2, Helennie Nie wrote:
Hi there