Discrete value aggregations on a URL field

Hi,

I am trying to get the number of documents for each distinct URL in a day,
and the result is not what I expect.
Let's say I have an index that contains documents like this:

{
"date": ...,
"url": ....,
"other"...
}

Basically, I am trying to group by url for a particular date range:

{
    "query": {
        "range": {"date": {"gte": "2014-09-08", "lte": "2014-09-09"}}
    },
    "aggregations": {
        "mt_agg": {
            "terms": {"field": "url"}
        }
    }
}

The result is bizarre: it breaks my URLs into segments and aggregates on
those. Do I need to use a hash of the URL (I would prefer not to)? Here is
the result:

"aggregations": {
    "shabash": {
        "buckets": [
            {
                "key": "http",
                "doc_count": 903
            },
            {
                "key": "rss",
                "doc_count": 638
            },
            {
                "key": "service",
                "doc_count": 381
            },
            {
                "key": "zzzzzzz.fff",
                "doc_count": 337
            },
            {
                "key": "e",
                "doc_count": 153
            },
            {
                "key": "xxx.com",
                "doc_count": 153
            },
            {
                "key": "www.yyy",
                "doc_count": 153
            },
            {
                "key": "fa",
                "doc_count": 127
            },
            {
                "key": "feed",
                "doc_count": 119
            },
            {
                "key": "www.nnnnnnn.com",
                "doc_count": 71
            }
        ]
    }
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ac784f35-d8ee-4fe5-979f-de1ca7446da0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

OK, it seems that I need to use not_analyzed on the field. Is that correct?

On Friday, 12 September 2014 08:18:19 UTC+1, Ali Kheyrollahi wrote:

[...]


On Friday, September 12, 2014 at 09:23 CEST,
Ali Kheyrollahi aliostad@gmail.com wrote:

On Friday, 12 September 2014 08:18:19 UTC+1, Ali Kheyrollahi wrote:

I am trying to find numbers of discrete value per URL in a day and
the result is not what I expect.

[...]

Result is bizarre, I mean it breaks my URL into its segments
and aggregates on that. Do I need to use Hash of the URL (I prefer
not to)?

OK, it seems that I need to use not_analyzed on the field. Is that
correct?

Yes.
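
For illustration, a minimal sketch of what such a mapping might look like on the Elasticsearch 1.x releases current at the time of this thread. The type name "page" and the "date" field's type are assumptions, not taken from the original index; only the "url" field setting is the point here.

```json
{
    "mappings": {
        "page": {
            "properties": {
                "date": {"type": "date"},
                "url": {"type": "string", "index": "not_analyzed"}
            }
        }
    }
}
```

With "index": "not_analyzed" the whole URL is indexed as a single term, so the terms aggregation on "url" should return one bucket per full URL instead of one per token. The index setting of an existing field generally cannot be changed in place, so this would likely mean creating a new index with the mapping and reindexing the documents. In later Elasticsearch versions (5.0 and up) the equivalent is a field of type "keyword".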

--
Magnus Bäck | Software Engineer, Development Tools
magnus.back@sonymobile.com | Sony Mobile Communications
