Aggregation on nested documents always takes 2-3 seconds?


(Luke Scott) #1

I have an index that uses one level of nested documents. When I run a query
on it, the result comes back in about 20-200 milliseconds. When I add a
facet or an aggregation involving the nested documents, the uncached
response always takes 2-3 seconds, regardless of how many documents have
been selected, even when zero are selected.

My mapping looks like this:

{
  "document": {
    "dynamic": "strict",
    "properties": {
      "account_id": {
        "type": "long"
      },
      "data": {
        "type": "nested",
        "properties": {
          "key": {
            "type": "string",
            "index": "not_analyzed"
          },
          "string": {
            "type": "string",
            "index": "not_analyzed",
            "fields": {
              "token": {
                "type": "string"
              }
            }
          },
          "integer": {
            "type": "long"
          },
          "date": {
            "type": "date",
            "format": "dateOptionalTime"
          }
        }
      }
    }
  }
}

There are 3.6 million documents in this index. My query looks like this:

{
  "query": {
    "bool": {
      "must": [
        {"term": {"account_id": 1}},
        {
          "nested": {
            "path": "data",
            "query": {"term": {"key": "amount"}}
          }
        }
      ]
    }
  }
}

The result of the above query is 0 documents because account_id 1 doesn't
have any documents with a key of "amount". Uncached, this returns in about
10-150ms:

{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

When I add an aggregation to the query:

{
  ...
  "aggs": {
    "report": {
      "nested": {
        "path": "data"
      },
      "aggs": {
        "amount": {
          "filter": {
            "query": {"term": {"key": "amount"}}
          },
          "aggs": {
            "sum": {
              "sum": {"field": "integer"}
            }
          }
        }
      }
    }
  }
}

Uncached, the query returns in about 2-3 seconds:

{
  "took": 2770,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "report": {
      "doc_count": 0,
      "amount": {
        "doc_count": 0,
        "sum": {
          "value": 0
        }
      }
    }
  }
}

If I run the same thing a second time (cached), it runs in 26 milliseconds.
If I clear the cache and run it again, it takes 2 seconds.

Why does this aggregation always take 2-3 seconds, even though the query
returns 0 documents? The same thing happens with a statistical facet.

Luke

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a82323a6-9a81-436b-a2d2-cc26e918cb7c%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Adrien Grand) #2

Very likely this problem is not related to nested documents but to field
data loading for the "integer" field. Field data is a column-oriented view
of the data that is, by default, lazily loaded from the inverted index the
first time it is needed, and then cached until the end of life of the
segment it belongs to. So only the first request that needs it should be
slow.
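To make the lazy-loading behaviour concrete, here is a minimal Python
sketch. It is not Elasticsearch's or Lucene's actual code: it assumes a toy
segment whose inverted index for one numeric field is a plain dict mapping
each value to the doc ids containing it, and `Segment`, `FieldDataCache` and
`sum_agg` are hypothetical names. It shows why, in this model, the cost does
not depend on how many documents match: uninverting walks every term of the
field in the segment, and the aggregation asks for the column before it
looks at any hits.

```python
class Segment:
    """Toy stand-in for one Lucene segment (hypothetical, for illustration only)."""

    def __init__(self, name, max_doc, postings):
        self.name = name
        self.max_doc = max_doc
        # Inverted index for a single numeric field: value -> list of doc ids.
        self.postings = postings
        self.terms_visited = 0  # counts the work done while uninverting

    def uninvert(self):
        """Walk every term of the field and build a doc_id -> value column."""
        column = [None] * self.max_doc
        for value, doc_ids in self.postings.items():
            self.terms_visited += 1
            for doc_id in doc_ids:
                column[doc_id] = value
        return column


class FieldDataCache:
    """Lazily builds field data per segment and keeps it until the segment goes away."""

    def __init__(self):
        self._columns = {}

    def get(self, segment):
        if segment.name not in self._columns:
            # Only the first request for this segment pays the uninversion cost.
            self._columns[segment.name] = segment.uninvert()
        return self._columns[segment.name]

    def drop(self, segment):
        # Segment merged away or cache cleared: the next request pays again.
        self._columns.pop(segment.name, None)


def sum_agg(cache, segment, matching_docs):
    """Sum the field over matching docs; the column is loaded even if none match."""
    column = cache.get(segment)
    return sum(column[d] for d in matching_docs if column[d] is not None)


seg = Segment("_0", max_doc=5, postings={10: [0, 3], 7: [1], 42: [4]})
cache = FieldDataCache()
print(sum_agg(cache, seg, []), seg.terms_visited)      # 0 3  (zero hits, full uninversion)
print(sum_agg(cache, seg, [0, 1]), seg.terms_visited)  # 17 3 (served from the cache)
cache.drop(seg)
print(sum_agg(cache, seg, []), seg.terms_visited)      # 0 6  (cache cleared, uninverted again)
```

The last call mirrors the "clear the cache and it is slow again"
observation from the question: dropping the cached column forces a second
full walk over the terms.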

It is possible to load field data eagerly [1] so that field data loading
never impacts response times. This way you should not get such slow
response times on the first queries.
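Eager loading moves the uninversion cost off the search path. The sketch
below is hypothetical (`FieldDataLoader`, `on_segment_opened` and
`column_for_search` are illustrative names, not Elasticsearch APIs); it
assumes the engine gets a hook when a new segment becomes visible, which is
roughly where an eager loader would do its work. The real mapping option is
described in the documentation linked at [1].

```python
def uninvert(postings, max_doc):
    """Build a doc_id -> value column by walking every term (the expensive step)."""
    column = [None] * max_doc
    for value, doc_ids in postings.items():
        for doc_id in doc_ids:
            column[doc_id] = value
    return column


class FieldDataLoader:
    """Hypothetical loader contrasting lazy (default) and eager field data loading."""

    def __init__(self, eager):
        self.eager = eager
        self.segments = {}  # segment name -> (postings, max_doc)
        self.columns = {}   # segment name -> uninverted column
        self.loads_during_search = 0

    def on_segment_opened(self, name, postings, max_doc):
        self.segments[name] = (postings, max_doc)
        if self.eager:
            # Eager: uninvert as soon as the segment appears, before any search uses it.
            self.columns[name] = uninvert(postings, max_doc)

    def column_for_search(self, name):
        if name not in self.columns:
            # Lazy: the first search that touches this segment pays for uninversion.
            self.loads_during_search += 1
            postings, max_doc = self.segments[name]
            self.columns[name] = uninvert(postings, max_doc)
        return self.columns[name]


for eager in (False, True):
    loader = FieldDataLoader(eager)
    loader.on_segment_opened("_0", {10: [0, 2], 7: [1]}, max_doc=3)
    loader.column_for_search("_0")
    loader.column_for_search("_0")
    print("eager" if eager else "lazy", loader.loads_during_search)
# lazy 1
# eager 0
```

In this model the trade-off is visible: the eager loader pays for every new
segment as soon as it appears and holds the column in memory whether or not
any query ends up needing it.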

Another option would be to use doc values [2], which store field data on
disk instead of loading it from the inverted index. Since the data will
already be stored in a column-oriented way, there will be no need to
uninvert it from the inverted index (which is costly and probably the
reason for your slow queries).
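Doc values avoid uninverting altogether by writing the column at index
time. In the hypothetical `DocValuesSegment` below, a Python list stands in
for the on-disk column that Elasticsearch would keep per segment; the point
of the sketch is only that reading a document's value becomes a direct
lookup, with no walk over the terms.

```python
class DocValuesSegment:
    """Hypothetical segment that writes a column (doc values) next to its inverted index."""

    def __init__(self):
        self.postings = {}    # inverted index: value -> list of doc ids
        self.doc_values = []  # column: doc id -> value, filled at index time

    def add_document(self, value):
        doc_id = len(self.doc_values)
        self.postings.setdefault(value, []).append(doc_id)
        # Column-oriented copy written now, so nothing has to be uninverted later.
        self.doc_values.append(value)
        return doc_id

    def value(self, doc_id):
        # Direct positional lookup; no pass over the postings is needed.
        return self.doc_values[doc_id]


seg = DocValuesSegment()
for v in (10, 7, 10):
    seg.add_document(v)
print(sum(seg.value(d) for d in (0, 1)))  # 17
print(seg.postings)                       # {10: [0, 2], 7: [1]}
```

The cost moves to indexing (every document writes its value twice, once
into the postings and once into the column) in exchange for no loading
step at query time.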

[1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/fielddata-formats.html#_fielddata_loading
[2] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/fielddata-formats.html#_numeric_field_data_types

On Thu, Feb 13, 2014 at 7:34 PM, Luke Scott luke@visionlaunchers.com wrote:

--
Adrien Grand
