How to aggregate this kind of data


(Helennie Nie) #1

Hi there

I have some data like :
{
"url":"xxx",
"pv":n,
"date":Date
}
url is a string and pv is an integer. What I want is to sum the pv of a URL
from one date to another. What aggregation should I use?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/630890a7-500f-4f86-b136-001c0a484353%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Colin Goodheart-Smithe) #2

Hi Helennie,

Depending on your use case you can do any one of a few options:

  1. Search for a specific date range and a specific URL, and retrieve the sum
    of pv for that URL over the date range:
    {
      "query": {
        "constant_score": {
          "filter": {
            "bool": {
              "must": [
                {
                  "range": {
                    "date": {
                      "from": "2014/01/01",
                      "to": "2014/01/04"
                    }
                  }
                },
                {
                  "term": {
                    "url": "http://www.example.com"
                  }
                }
              ]
            }
          }
        }
      },
      "aggs": {
        "pvCount": {
          "sum": {
            "field": "pv"
          }
        }
      }
    }

  2. Search for a specific date range and return an aggregation with the sum
    of pv for the top N URLs, ordered by pv:
    {
      "query": {
        "constant_score": {
          "filter": {
            "range": {
              "date": {
                "from": "2014/01/01",
                "to": "2014/01/01"
              }
            }
          }
        }
      },
      "aggs": {
        "topURLs": {
          "terms": {
            "field": "url",
            "size": 10,
            "order": {
              "pvCount": "desc"
            }
          },
          "aggs": {
            "pvCount": {
              "sum": {
                "field": "pv"
              }
            }
          }
        }
      }
    }

  3. Return an aggregation which buckets documents by day and, within each
    bucket, returns the sum of pv for the top N URLs, ordered by pv:
    {
      "aggs": {
        "dateByDay": {
          "date_histogram": {
            "field": "date",
            "interval": "day"
          },
          "aggs": {
            "topURLs": {
              "terms": {
                "field": "url",
                "size": 10,
                "order": {
                  "pvCount": "desc"
                }
              },
              "aggs": {
                "pvCount": {
                  "sum": {
                    "field": "pv"
                  }
                }
              }
            }
          }
        }
      }
    }
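
For anyone driving these requests from client code, here is a rough Python sketch of how option 1 could be parameterized and how the results of all three options might be read back. The helper names (`build_pv_sum_query`, `extract_pv_sum`, `extract_top_urls`, `extract_daily_top_urls`) are made up for illustration, and the response layouts assumed here are the usual ones for `sum`, `terms` and `date_histogram` aggregations. No client library is involved; the code only builds and reads plain dicts.

```python
import json


def build_pv_sum_query(url, date_from, date_to):
    """Build the option 1 request body: filter on URL and date range, sum pv.

    Hypothetical helper; the field names (url, pv, date) come from the question.
    """
    return {
        "query": {
            "constant_score": {
                "filter": {
                    "bool": {
                        "must": [
                            {"range": {"date": {"from": date_from, "to": date_to}}},
                            {"term": {"url": url}},
                        ]
                    }
                }
            }
        },
        "aggs": {"pvCount": {"sum": {"field": "pv"}}},
    }


def extract_pv_sum(response):
    """Option 1: the summed pv sits under the aggregation name."""
    return response["aggregations"]["pvCount"]["value"]


def extract_top_urls(aggregations):
    """Option 2: list (url, summed pv) pairs in the order the buckets arrive."""
    buckets = aggregations["topURLs"]["buckets"]
    return [(b["key"], b["pvCount"]["value"]) for b in buckets]


def extract_daily_top_urls(response):
    """Option 3: map each day bucket to its (url, summed pv) pairs."""
    days = response["aggregations"]["dateByDay"]["buckets"]
    # Each day bucket carries the nested topURLs aggregation directly.
    return {d.get("key_as_string", d["key"]): extract_top_urls(d) for d in days}


if __name__ == "__main__":
    body = build_pv_sum_query("http://www.example.com", "2014/01/01", "2014/01/04")
    print(json.dumps(body, indent=2))
```

Note that the `term` filter compares against the indexed terms, so this only matches the full URL if the `url` field is indexed as a single token (for example mapped as `not_analyzed`). That is an assumption about the mapping, which the original question does not show.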

I hope this helps clarify some of your options,

Regards,

Colin

On Saturday, 28 June 2014 05:02:51 UTC+2, Helennie Nie wrote:

Hi there

I have some data like :
{
"url":"xxx",
"pv":n,
"date":Date
}
url is a string and pv is an integer. What I want is to sum the pv of a URL
from one date to another. What aggregation should I use?


