Statistical Facet - Does it only aggregate distinct numerical values?


(Tim Butterfield) #1

So I have an index containing 82 documents like the following (sample
data to familiarise myself with the API).

{"_index":"foo","_type":"bar","_id":"3675293","_score":1.0, "_source" : {
  "companyId": 3675293,
  "companyName": "ABC",
  "companyOwner": {
    "ownerFirstName": "Billy",
    "ownerLastName": "Jean"
  },
  "created": "2013-05-22T13:45:49",
  "accounts": [
    {
      "year": 2013,
      "amount": 1.0,
      "currency": "GBP"
    },
    {
      "year": 2012,
      "amount": 1.0,
      "currency": "GBP"
    }
  ]
}}

The problem I have is that I'm trying to total all of the amounts. When I
use a statistical facet via the NEST .NET client, I get an answer I
wouldn't expect.

82 documents, each with a collection containing two amounts of £1, should
in my mind aggregate to a total of 2 x 82 = 164. However, the facet returns
a total of 82. When I change one of the two amounts in each document to 2
instead of 1, I get the correct result.
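The numbers above are consistent with the facet summing only the distinct amounts within each document. The sketch below is a simplified model, not Elasticsearch code: it builds 82 copies of the sample document (only the "accounts" part) and compares summing every amount with summing the distinct amounts per document. The helper names are hypothetical.

```python
# Simplified model, not Elasticsearch code: compare summing every amount
# with summing only the distinct amounts found in each document.

def make_docs(n, amounts):
    """Build n copies of the sample document with the given account amounts."""
    return [
        {"accounts": [{"year": 2013 - i, "amount": a, "currency": "GBP"}
                      for i, a in enumerate(amounts)]}
        for _ in range(n)
    ]


def total_all(docs):
    """Sum every account amount, duplicates included (the expected answer)."""
    return sum(acc["amount"] for doc in docs for acc in doc["accounts"])


def total_distinct_per_doc(docs):
    """Sum only the distinct amounts per document (the observed behaviour)."""
    return sum(sum({acc["amount"] for acc in doc["accounts"]}) for doc in docs)


same = make_docs(82, [1.0, 1.0])  # both accounts are 1.0
diff = make_docs(82, [1.0, 2.0])  # one account changed to 2.0
print(total_all(same), total_distinct_per_doc(same))  # 164.0 82.0
print(total_all(diff), total_distinct_per_doc(diff))  # 246.0 246.0
```

When the two amounts differ, both ways of summing agree (246), which is why changing one amount to 2 appeared to give the correct result.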

Does Elasticsearch only aggregate unique numerical values per document?
Can this be overridden?

The above isn't the real-world example, but the real collection will
contain a monetary field that I need to aggregate across all matching
documents, and the amounts of one or more items in the collection may be
equal. I need to aggregate every value, not just the distinct ones. Is this
possible?

Incidentally, I'm using the NEST .NET client, if that has any bearing.

Thanks

Regards

Tim

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Boaz Leskes) #2

Hi Tim,

By default, ES "collapses" all inner documents when indexing them. It
creates fields that are "namespaced" according to your document structure.
For example, for this document you will have a field called
"accounts.currency" with the value "GBP". This is done because it gives the
best search and indexing performance, but it has the downside of losing the
boundaries between the objects in an array: the values of each inner field
are merged per document, which causes only distinct values to be stored.
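To make the "collapsing" concrete, here is a simplified Python model, not Elasticsearch internals: it flattens a document into dotted field paths, each holding the set of distinct values found under that path. The function name `flatten` is hypothetical, and using a set per field is an assumption chosen to mirror the distinct-values behaviour described here.

```python
# Simplified model, not Elasticsearch internals: flatten a document into
# "namespaced" fields, each holding the distinct values found under its path.

def flatten(doc):
    fields = {}

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            # Array items are merged into the same path, so the boundary
            # between one inner object and the next is lost.
            for item in value:
                walk(item, path)
        else:
            # A set keeps only distinct values per field, mirroring the
            # behaviour described in this thread.
            fields.setdefault(path, set()).add(value)

    walk(doc, "")
    return fields


doc = {
    "companyId": 3675293,
    "accounts": [
        {"year": 2013, "amount": 1.0, "currency": "GBP"},
        {"year": 2012, "amount": 1.0, "currency": "GBP"},
    ],
}
for path, values in sorted(flatten(doc).items()):
    print(path, sorted(values))
```

In this model "accounts.amount" ends up holding a single 1.0, and nothing records which year a given amount belonged to.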

Normally, this is solved by indicating that the accounts field should have
the type "nested"
(see http://www.elasticsearch.org/guide/reference/mapping/nested-type/ ).
Sadly, I just discovered a problem with nested types and the statistical
facet. I opened an issue here:
https://github.com/elasticsearch/elasticsearch/issues/3209 so it will be
fixed.
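A mapping that declares "accounts" as nested might look like the body below. This is a sketch under assumptions: it follows the 0.90-era put-mapping format (body keyed by the type name, here "bar", sent to an endpoint such as http://localhost:9200/foo/bar/_mapping), the inner field types are guesses based on the sample data, and the helper name is hypothetical. As far as we know, an existing object field cannot be switched to nested in place, so this typically means creating a new index and reindexing.

```python
import json


def nested_accounts_mapping(type_name="bar"):
    """Hypothetical helper: put-mapping body in which "accounts" is nested."""
    return {
        type_name: {
            "properties": {
                "accounts": {
                    "type": "nested",  # index each account as its own hidden document
                    "properties": {
                        "year": {"type": "integer"},
                        "amount": {"type": "double"},
                        "currency": {"type": "string"},
                    },
                }
            }
        }
    }


# Print the JSON so it can be passed to curl with -d.
print(json.dumps(nested_accounts_mapping(), indent=2))
```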

Cheers,
Boaz

(In reply to Tim Butterfield's message of Thursday, June 20, 2013, 12:26:39 AM UTC+2.)


(Boaz Leskes) #3

Hi Tim,

A colleague pointed out that the source of the issue is that I forgot to
indicate that the facet should run on the nested documents.

To reiterate: you should mark "accounts" as a nested object in your
mapping; then you can facet like this:

curl -XPOST "http://localhost:9200/foo/bar/_search" -d'
{
  "facets": {
    "test": {
      "statistical": {
        "field": "amount"
      },
      "nested": "accounts"
    }
  },
  "size": 0
}'

More information about faceting on nested objects can be found here:
http://www.elasticsearch.org/guide/reference/api/search/facets/ (search for
"nested").
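For checking results offline, the sketch below builds the same search body as the curl example and computes, locally, a few of the values (count, total, min, max, mean) that the statistical facet reports, treating every nested account as its own document. Both helper names are hypothetical, and the local computation is only a cross-check, not how Elasticsearch computes the facet.

```python
import json


def nested_stats_facet(path, field, name="test"):
    """Search body equivalent to the curl example: a statistical facet
    computed over the nested objects found under `path`."""
    return {
        "facets": {name: {"statistical": {"field": field}, "nested": path}},
        "size": 0,
    }


def local_stats(docs, path, field):
    """Offline cross-check: treat every nested object as its own document
    and compute a few of the values a statistical facet reports."""
    values = [obj[field] for doc in docs for obj in doc[path]]
    total = sum(values)
    return {
        "count": len(values),
        "total": total,
        "min": min(values),
        "max": max(values),
        "mean": total / len(values),
    }


docs = [
    {"accounts": [{"year": 2013, "amount": 1.0}, {"year": 2012, "amount": 1.0}]}
    for _ in range(82)
]
print(json.dumps(nested_stats_facet("accounts", "amount")))
print(local_stats(docs, "accounts", "amount"))
```

For the sample data this gives a count and total of 164, which is the figure the nested facet should match.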

Cheers,
Boaz

(In reply to Boaz Leskes's message of Thursday, June 20, 2013, 9:48:08 AM UTC+2.)


(Jimmy Thomas) #4

Is it possible to filter the nested documents used in the statistical
counts?

In this case, for example, how could I get statistics only for the
accounts matching year 2013?

I tried setting a filter as a facet_filter, but it only filters the root
documents, not the nested documents used in the statistics.

Thanks
Regards
Jimmy
