Bad performance on aggregations

Hello,

I have a problem with aggregation performance: the aggregation below is
very slow.

I'm running the following aggregation on an index with 160M documents (16G
of data).

{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "_cache": false,
          "insert_date": {
            "gte": 1424790449432
          }
        }
      }
    }
  },
  "aggs": {
    "tag": {
      "terms": {
        "field": "origin_ip"
      }
    }
  }
}

Time: 18s. No results found. (This is correct: there are no documents
with insert_date greater than 1424790449432.)

However, if I run the following search:
{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "_cache": false,
          "insert_date": {
            "gte": 1424790449432
          }
        }
      }
    }
  }
}

Time: 7ms. No results found. (As noted above, this is correct.)
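
To make the comparison easy to reproduce, here is a minimal Python sketch that builds the two request bodies shown above and reads the server-side `took` value from each search response. The helper names, the base URL, and the index name are placeholders chosen for illustration, so adjust them to your cluster.

```python
import json
import urllib.request

CUTOFF_MS = 1424790449432  # insert_date lower bound used in both requests


def filter_only_body(cutoff_ms=CUTOFF_MS):
    """Filtered query with an uncached range filter (ES 1.x syntax)."""
    return {
        "query": {
            "filtered": {
                "filter": {
                    "range": {
                        "_cache": False,
                        "insert_date": {"gte": cutoff_ms},
                    }
                }
            }
        }
    }


def filter_with_terms_agg_body(cutoff_ms=CUTOFF_MS, field="origin_ip"):
    """Same query, plus the terms aggregation on the given field."""
    body = filter_only_body(cutoff_ms)
    body["aggs"] = {"tag": {"terms": {"field": field}}}
    return body


def took_ms(base_url, index, body):
    """POST the body to <index>/_search and return the reported 'took' (ms)."""
    req = urllib.request.Request(
        "%s/%s/_search" % (base_url.rstrip("/"), index),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["took"]
```

For example, `took_ms("http://localhost:9200", "myindex", filter_only_body())` and the same call with `filter_with_terms_agg_body()` give the two timings side by side (both the URL and `myindex` are hypothetical). Running each request more than once shows whether the slow time is a one-off or repeats on every call.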

What is happening?

The documentation
(http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_filtered_query.html)
says: "The query (which happens to include a filter) returns a
certain subset of documents, and the aggregation operates on those
documents."

In my case, the filter returns no documents, so I would expect the
aggregation to take about as long as the plain search.

So, how can I improve the performance of that aggregation?

Thank you,

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1d1d559a-7ebe-435f-be9c-5dd89528eb2d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

BTW, I'm running ES 1.4.2 with Java 7

(Replying to Octavian's original post of Tuesday, February 24, 2015 at 5:28:50 PM UTC+2.)

Hello,

Can anybody help me with this problem? Is this a known bug in Elasticsearch?

Thank you


What are your Elasticsearch Java parameters (such as heap size)? Try using
a date range query.
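
For the heap question, one way to see how much heap each node has and is using is the nodes stats API (`/_nodes/stats/jvm`). The following is a minimal Python sketch, not a definitive tool: the function names are made up for illustration, and the base URL you pass in (for example `http://localhost:9200`) is an assumption about your setup.

```python
import json
import urllib.request


def heap_summary(nodes_stats):
    """Map node name -> (heap used bytes, heap max bytes).

    Expects the parsed JSON of a nodes stats response that includes
    the 'jvm' section for every node.
    """
    summary = {}
    for node_id, node in nodes_stats.get("nodes", {}).items():
        mem = node["jvm"]["mem"]
        summary[node.get("name", node_id)] = (
            mem["heap_used_in_bytes"],
            mem["heap_max_in_bytes"],
        )
    return summary


def fetch_heap_summary(base_url):
    """Fetch JVM stats from the cluster and summarise heap per node."""
    url = base_url.rstrip("/") + "/_nodes/stats/jvm"
    with urllib.request.urlopen(url) as resp:
        return heap_summary(json.loads(resp.read().decode("utf-8")))
```

Calling `fetch_heap_summary("http://localhost:9200")` returns one entry per node, which is usually enough to tell whether the heap is close to its limit.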

(Replying to Octavian's message of 2015-03-04 8:20 GMT-03:00.)

--
Regards,

Sávio S. Teles de Oliveira



What do the hot threads look like while the query is running?
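
In case it helps, hot threads can be captured from the `/_nodes/hot_threads` endpoint while the slow request is in flight. Here is a minimal Python sketch; the helper names and the default values for `threads` and `interval` are illustrative choices, and the base URL is whatever your node listens on.

```python
import urllib.parse
import urllib.request


def hot_threads_url(base_url, threads=3, interval="500ms"):
    """Build the hot threads URL with the sampling parameters."""
    params = urllib.parse.urlencode({"threads": threads, "interval": interval})
    return base_url.rstrip("/") + "/_nodes/hot_threads?" + params


def fetch_hot_threads(base_url, **kwargs):
    """Return the plain-text hot threads report from all nodes."""
    with urllib.request.urlopen(hot_threads_url(base_url, **kwargs)) as resp:
        return resp.read().decode("utf-8")
```

Start the aggregation in one terminal, then call `fetch_hot_threads("http://localhost:9200")` (a placeholder URL) from another a few seconds later and post the output.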

(Replying to Octavian's original post of Tue, Feb 24, 2015 at 4:28 PM.)

--
Adrien Grand
