Aggregation on last element

Hi,

I have a type whose data looks like this:

{
  "date": "2014-01-01",
  "element": "abc",
  "type": "A"
},
{
  "date": "2014-01-02",
  "element": "abc",
  "type": "B"
},
{
  "date": "2014-01-03",
  "element": "def",
  "type": "A"
}

I'd like to be able to group the data by element, and count the elements
whose LAST document by date has a type of A. In this case, I want the
result to be "1". The second document has the same element as the first
document and a later date, so it is the latest one for "abc", but its type
is B, not A, so I don't want it to be counted. The last document is the
only one with element "def", and its type is A.
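The intended result can be pinned down with a minimal client-side sketch in Python. It does not use Elasticsearch at all; it only applies the rule above to the three sample documents, assuming dates are ISO-8601 strings (which compare chronologically as plain strings). The function name is hypothetical.

```python
from typing import Dict, List


def count_latest_of_type(docs: List[Dict[str, str]], wanted_type: str = "A") -> int:
    """Count elements whose most recent document (by date) has wanted_type."""
    latest: Dict[str, Dict[str, str]] = {}
    for doc in docs:
        current = latest.get(doc["element"])
        # Assumption: "YYYY-MM-DD" strings, so string comparison is chronological.
        if current is None or doc["date"] > current["date"]:
            latest[doc["element"]] = doc
    # Only the latest document of each element is looked at.
    return sum(1 for doc in latest.values() if doc["type"] == wanted_type)


docs = [
    {"date": "2014-01-01", "element": "abc", "type": "A"},
    {"date": "2014-01-02", "element": "abc", "type": "B"},
    {"date": "2014-01-03", "element": "def", "type": "A"},
]

print(count_latest_of_type(docs))  # 1
```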

I'm not sure this is even possible. Please note that the cardinality of
"element" can be quite high (up to 20 000 different values).

Thank you in advance!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/46509869-4afa-4062-8c34-ad828dcf680c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hello Michaël,

I can't think of a way to do this in a single call.
Maybe you could try the following:

(Terms aggregation on element) -> (top_hits aggregation, sorted by date
descending, size = 1) -> (keep, client-side, the buckets whose top hit has type A)
With this you will get the elements that you are looking for. Now do a
filter on those elements and a terms aggregation on the element field to
get the results.
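A rough Python sketch of the first step follows. It only builds the request body as a dict and post-processes a hand-written response instead of calling a cluster; the aggregation names `by_element` and `latest`, the `bucket_size` value and the sample response data are hypothetical. The top_hits sort is descending so the single hit is the most recent document, and because top_hits cannot take sub-aggregations, the "type A" check is done client-side on each bucket's hit.

```python
import json
from typing import Any, Dict, List


def build_request(bucket_size: int) -> Dict[str, Any]:
    """Terms on element, keeping only the newest document of each bucket."""
    return {
        "size": 0,  # no top-level hits, only aggregations
        "aggs": {
            "by_element": {
                # Hypothetical size: must cover every distinct element to see them all.
                "terms": {"field": "element", "size": bucket_size},
                "aggs": {
                    "latest": {
                        "top_hits": {
                            "sort": [{"date": {"order": "desc"}}],  # newest first
                            "size": 1,
                        }
                    }
                },
            }
        },
    }


def elements_with_latest_type(response: Dict[str, Any], wanted_type: str = "A") -> List[str]:
    """Return the keys of buckets whose newest document has wanted_type."""
    result = []
    for bucket in response["aggregations"]["by_element"]["buckets"]:
        hits = bucket["latest"]["hits"]["hits"]
        if hits and hits[0]["_source"]["type"] == wanted_type:
            result.append(bucket["key"])
    return result


# Hand-written response shaped like a terms + top_hits result (hypothetical data).
sample_response = {
    "aggregations": {
        "by_element": {
            "buckets": [
                {"key": "abc", "doc_count": 2,
                 "latest": {"hits": {"hits": [
                     {"_source": {"date": "2014-01-02", "element": "abc", "type": "B"}}]}}},
                {"key": "def", "doc_count": 1,
                 "latest": {"hits": {"hits": [
                     {"_source": {"date": "2014-01-03", "element": "def", "type": "A"}}]}}},
            ]
        }
    }
}

print(json.dumps(build_request(bucket_size=100), indent=2))
matching = elements_with_latest_type(sample_response)
print(matching, len(matching))  # ['def'] 1
```

Note that this still creates one bucket (and one top hit) per distinct element, which is where the cardinality concern raised in this thread comes in.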

Thanks
Vineeth

On Sun, Oct 26, 2014 at 1:04 AM, Michaël Gallego michael@maestrooo.com
wrote:


Hi Vineeth,

I'm afraid that this won't work because, as I said, "element" can have high
cardinality (it is not bounded in theory; in practice it will range from
500 to 40,000). If I do a "terms" aggregation on element followed by a
top_hits, it will require generating up to 40,000 buckets, which I think
will kill performance.

For now, I've rethought my format so that it looks like this:

{
  "element": "abc",
  "history": [
    {"type": "A", "date": "2014-01-01"},
    {"type": "B", "date": "2014-01-02"}
  ]
}
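Building these per-element documents from the flat ones could look like the minimal Python sketch below. It assumes the application sorts each history newest first, so that the first entry is the latest one, and that dates are ISO-8601 strings; the function name `to_history_docs` is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


def to_history_docs(flat_docs: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    """Group flat docs by element into {"element": ..., "history": [...]} documents."""
    grouped: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for doc in flat_docs:
        grouped[doc["element"]].append({"type": doc["type"], "date": doc["date"]})
    return [
        # Assumption: newest entry first; ISO-8601 strings sort chronologically.
        {"element": element,
         "history": sorted(history, key=lambda h: h["date"], reverse=True)}
        for element, history in grouped.items()
    ]


flat_docs = [
    {"date": "2014-01-01", "element": "abc", "type": "A"},
    {"date": "2014-01-02", "element": "abc", "type": "B"},
    {"date": "2014-01-03", "element": "def", "type": "A"},
]

for history_doc in to_history_docs(flat_docs):
    print(history_doc)
```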

where "history" is mapped as nested. Now I can do this:

{
  "aggs": {
    "history": {
      "nested": {
        "path": "history"
      },
      "aggs": {
        "latest-history": {
          "filter": {
            "limit": {
              "value": 1
            }
          },
          "aggs": {
            "by-type": {
              "terms": {
                "field": "history.type",
                "size": 0
              }
            }
          }
        }
      }
    }
  }
}

This gets the nested history, limits it to 1 entry, then groups by type, so
I can get the count of the ones I'm interested in (type A or type B). The
only drawbacks are that I need to sort each nested history by date in my
application (I have not found any way to sort the nested documents by date
before applying the limit filter), and that, while a history is typically
short (around 10-200 entries), its size is not bounded, which makes updates
harder.
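What the aggregation above is meant to return can be written as a small Python reference for checking results, assuming each history array is already sorted newest first: take the first entry of every document's history and count the types. This is only an illustration of the intended output, not a replacement for the query, and the function name is hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List


def latest_type_counts(docs: List[Dict[str, Any]]) -> Counter:
    """Count the type of the first (assumed newest) history entry of each document."""
    return Counter(doc["history"][0]["type"] for doc in docs if doc["history"])


docs = [
    {"element": "abc", "history": [{"type": "B", "date": "2014-01-02"},
                                   {"type": "A", "date": "2014-01-01"}]},
    {"element": "def", "history": [{"type": "A", "date": "2014-01-03"}]},
]

counts = latest_type_counts(docs)
print(counts["A"], counts["B"])  # 1 1
```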

If anyone has any other idea, don't hesitate to share!

After some testing, it appears that my solution does not work, but I'm not
sure I understand why. The filter returns fewer results than expected.
