Using ES for web analytics


(Jan van Vlimmeren) #1

Hi everyone,

I'm brand new to ES and am trying to use it to create a basic analytics app.

I'm running into a problem that I can't seem to get sorted out myself:

I have documents like this:
{
  "_index": "hpstats",
  "_type": "articles",
  "_id": "http://www.standaard.be/cnt/dmf20140321_01034888-2014-03-21-11-07",
  "_version": 10,
  "_score": 1,
  "_source": {
    "url": "http://www.standaard.be/cnt/dmf20140321_01034888",
    "count": 2,
    "title": "Busverkeer Vlaams-Brabant verstoord door staking in Asse",
    "created": "2014-03-21T11:07:33+00:00",
    "lastview": "2014-03-21T11:07:58+00:00",
    "views": 9,
    "site": "standaard.be",
    "globalviews": 1
  }
}

For each url, a new document is created every minute that gathers the
count, views and globalviews for that url during that minute. What I want
is the lifetime count, views and globalviews for each url. I tried using
this aggregation:

{
  "aggs": {
    "urls": {
      "terms": {
        "field": "url"
      },
      "aggs": {
        "count": {
          "sum": {
            "field": "count"
          }
        },
        "views": {
          "sum": {
            "field": "views"
          }
        },
        "globalviews": {
          "sum": {
            "field": "globalviews"
          }
        }
      }
    }
  }
}

Unfortunately this returns odd results. I would expect to see one bucket
per unique url, but that's not what happens. I get the following:

"aggregations": {
  "urls": {
    "buckets": [
      {
        "key": "http",
        "doc_count": 24503,
        "count": { "value": 56458 },
        "globalviews": { "value": 608164 },
        "views": { "value": 2952759 }
      },
      {
        "key": "www.standaard.be",
        "doc_count": 14018,
        "count": { "value": 45973 },
        "globalviews": { "value": 320963 },
        "views": { "value": 2679508 }
      },
      {
        "key": "cnt",
        "doc_count": 9172,
        "count": { "value": 41127 },
        "globalviews": { "value": 216736 },
        "views": { "value": 1416645 }
      },
      {
        "key": "utm_campaign",
        "doc_count": 8371,
        "count": { "value": 8371 },
        "globalviews": { "value": 228334 },
        "views": { "value": 172170 }
      },
      {
        "key": "utm_medium",
        "doc_count": 8371,
        "count": { "value": 8371 },
        "globalviews": { "value": 228334 },
        "views": { "value": 172170 }
      },
      {
        "key": "utm_source",
        "doc_count": 8371,
        "count": { "value": 8371 },
        "globalviews": { "value": 228334 },
        "views": { "value": 172170 }
      },
      {
        "key": "standaard",
        "doc_count": 8305,
        "count": { "value": 8305 },
        "globalviews": { "value": 226994 },
        "views": { "value": 172098 }
      },
      {
        "key": "utm_term",
        "doc_count": 7190,
        "count": { "value": 7190 },
        "globalviews": { "value": 197773 },
        "views": { "value": 63153 }
      },
      {
        "key": "article",
        "doc_count": 6706,
        "count": { "value": 6706 },
        "globalviews": { "value": 182001 },
        "views": { "value": 47291 }
      },
      {
        "key": "crosspromoreg",
        "doc_count": 6684,
        "count": { "value": 6684 },
        "globalviews": { "value": 181921 },
        "views": { "value": 47269 }
      }
    ]
  }
}

Anyone have an idea how I can get the results I would expect?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/6f56b437-0583-442e-af05-a4b29cfd9999%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jan van Vlimmeren) #2

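And of course then you come up with the answer and you feel like a complete
idiot. Just in case anyone else runs into the issue:

You need to specify in the mapping that the url field should be
not_analyzed ("index": "not_analyzed"). By default a string field is
analyzed, so each URL is split into tokens, and the terms aggregation then
buckets on those tokens (http, www.standaard.be, cnt, ...) instead of on the
whole URL.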
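
For reference, here is a minimal sketch of such a mapping. It assumes the Elasticsearch 1.x string type that was current when this thread was written, and it reuses the index and type names from the example document (hpstats, articles). It is meant as the request body when creating the index. Only the url field is shown; the other fields are assumed to be left to dynamic mapping.

```json
{
  "mappings": {
    "articles": {
      "properties": {
        "url": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
```

As far as I know, the index setting of an existing field cannot be changed in place, so the index has to be recreated with this mapping and the documents indexed again before the aggregation returns one bucket per full URL. On Elasticsearch 5.0 and later the equivalent is a field of type keyword.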



(system) #3