Concerns about the possible load of aggregations

We are running a PaaS built on Elasticsearch, and we want to provide a
multi-column count feature through ES aggregations.

Take the following query as an example:

POST /INDEX_PATTERN-*/_search
{
  "query": { "match": { "project": "dummyProject" } },
  "size": 0,
  "aggs": {
    "col1": {
      "terms": { "field": "host", "size": 5 },
      "aggs": {
        "col2": {
          "terms": { "field": "source", "size": 5 },
          "aggs": {
            "col3": {
              "terms": { "field": "version", "size": 5 }
            }
          }
        }
      }
    }
  }
}
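
The platform offers this feature for an arbitrary set of columns, so the nested body above can be generated instead of written by hand. The following is a minimal Python sketch. It assumes the body is built client-side as a dict and then sent with whatever HTTP client the platform already uses. `nested_terms_aggs` and `multi_column_count_query` are hypothetical helper names. The `colN` bucket names mirror the query above.

```python
import json

def nested_terms_aggs(fields, size=5, prefix="col"):
    """Build one nested 'terms' aggregation level per field (hypothetical helper).

    fields: field names in nesting order, e.g. ["host", "source", "version"].
    """
    if not fields:
        raise ValueError("need at least one field")
    aggs = None
    # Build from the innermost level outwards so each level wraps the next one.
    for level in range(len(fields), 0, -1):
        node = {"terms": {"field": fields[level - 1], "size": size}}
        if aggs is not None:
            node["aggs"] = aggs
        aggs = {f"{prefix}{level}": node}
    return aggs

def multi_column_count_query(project, fields, size=5):
    """Full search body: filter by project, no hits, only the nested buckets."""
    return {
        "query": {"match": {"project": project}},
        "size": 0,
        "aggs": nested_terms_aggs(fields, size),
    }

body = multi_column_count_query("dummyProject", ["host", "source", "version"])
print(json.dumps(body, indent=2))
```

Building the tree from the innermost field outwards keeps the helper to a single loop. Each extra column adds one more level of nesting.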

We use daily indices and keep 30 days of data, at approximately 500 GB per
daily index.

So the example aggregation has to run over a huge amount of data.
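
This is a back-of-the-envelope Python sketch of the sizes involved. The 30 days and 500 GB per daily index come from the message. The per-field cardinalities (200 hosts, 50 sources, 10 versions) are hypothetical placeholders. The final response is small, with at most 5 + 25 + 125 buckets. Nested terms aggregations, however, are typically collected depth-first. As a rough upper bound, each shard may therefore build a bucket for every host/source/version combination that occurs in the matching documents before it trims to the top terms.

```python
# Back-of-the-envelope sizing for the example query.
DAYS = 30                 # from the message: 30 days of daily indices
GB_PER_DAILY_INDEX = 500  # from the message: ~500 GB per daily index

total_gb = DAYS * GB_PER_DAILY_INDEX  # data the query fans out over

def response_buckets(sizes):
    """Most buckets the final response can contain, e.g. 5 + 5*5 + 5*5*5."""
    total, width = 0, 1
    for s in sizes:
        width *= s
        total += width
    return total

def worst_case_combinations(cardinalities):
    """Most leaf buckets one shard might build before trimming, assuming
    every combination of distinct values co-occurs in matching documents."""
    product = 1
    for c in cardinalities:
        product *= c
    return product

# Hypothetical distinct counts for host, source and version.
cardinalities = [200, 50, 10]

print(total_gb)                                # 15000
print(response_buckets([5, 5, 5]))             # 155
print(worst_case_combinations(cardinalities))  # 100000
```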

But we found that it is blazingly fast: with 20 data nodes plus several
search/master nodes, it responds within 10 minutes.

OK, but what happens if many such requests arrive at the same time?

Will those requests just slow the other requests down (in which case, would
adding more machines be the solution?), or could they cause an OOM or some
other critical error in the ES daemon?
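
Here is one way to reason about the "just slower" case. Elasticsearch executes searches on a fixed-size `search` thread pool with a bounded queue on each node. A burst of requests therefore first waits in the queue, so everyone gets slower. Once the queue is full, further requests are rejected. The Python sketch below is a deliberately simplified toy model of that queueing only. `simulate_burst` is a hypothetical helper, and the pool size, queue size and request duration are hypothetical parameters, not Elasticsearch defaults.

```python
import math

def simulate_burst(n_requests, pool_size, queue_size, seconds_each):
    """Toy model of one node's fixed-size search thread pool with a bounded queue.

    Simplifying assumptions: all requests arrive at the same instant and each
    takes seconds_each. Returns (accepted, rejected, seconds until the last
    accepted request finishes). Memory usage is not modelled at all.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    # Up to pool_size run immediately, up to queue_size more wait, the rest are rejected.
    accepted = min(n_requests, pool_size + queue_size)
    rejected = n_requests - accepted
    # Accepted requests drain in waves of pool_size, each wave taking seconds_each.
    waves = math.ceil(accepted / pool_size)
    return accepted, rejected, waves * seconds_each

# Hypothetical numbers: 4 threads, a queue of 100, 60 s per request.
print(simulate_burst(10, 4, 100, 60))   # (10, 0, 180)
print(simulate_burst(200, 4, 100, 60))  # (104, 96, 1560)
```

In this toy model, adding capacity (a larger effective pool, for example more nodes sharing the shards) shortens the drain time and delays rejections. It says nothing about heap usage, which is what can turn a burst into an OutOfMemoryError.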

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAL3_U43m1UuZbAHPwSNzQHC-xpBxGsr%2B%3DGNt-GUeMCueoyTP0w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

You need to look into using an index template that applies an optimal
mapping to your data. For Logstash data, it really helps to enable
doc_values on all fields you aggregate on and to turn off norms on those
fields as well. With doc_values, Elasticsearch reads the field values from
memory-mapped files instead of loading them into heap memory. With huge
aggregations this means the system will get slower, but it is less likely
to run out of memory if you get a lot of requests. Without doc_values, you
will want to configure the fielddata circuit breakers properly to ensure you
don't run out of memory. This typically means that searches that would have
run out of memory abort with an error instead, which is preferable to your
cluster crashing but not great from an end-user perspective.
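
Below is a minimal sketch of such a template, written as Python dicts and serialized to JSON. It assumes the Elasticsearch 1.x mapping syntax that was current at the time of this thread: string fields with `"index": "not_analyzed"`, `doc_values`, `"norms": {"enabled": false}`, the `template` key and a `_default_` mapping. Newer versions use `keyword` fields and `index_patterns` instead. The `logstash-*` pattern and the helper names are hypothetical. The fielddata cache and breaker values at the end are illustrative only and use 1.4-era setting names, so check them against the documentation for your version.

```python
import json

def aggregated_string_field():
    """ES 1.x-style mapping for a string field used in terms aggregations."""
    return {
        "type": "string",
        "index": "not_analyzed",      # exact values; doc_values on strings need this in 1.x
        "doc_values": True,           # field values on disk, read via memory-mapped files
        "norms": {"enabled": False},  # norms only affect scoring, not aggregations
    }

def aggregation_template(index_pattern, fields):
    """Body for PUT /_template/<name> (1.x syntax: "template" and "_default_")."""
    return {
        "template": index_pattern,
        "mappings": {
            "_default_": {
                "properties": {name: aggregated_string_field() for name in fields}
            }
        },
    }

# Hypothetical index pattern; field names taken from the example query.
template = aggregation_template("logstash-*", ["host", "source", "version"])
print(json.dumps(template, indent=2))

# For fields still on fielddata (illustrative values only): cap the fielddata
# cache so old entries are evicted, and keep the fielddata breaker limit above
# it so a request that would need more trips the breaker and fails instead of
# exhausting the heap.
fielddata_settings = {
    "indices.breaker.fielddata.limit": "40%",  # dynamic cluster setting
    "indices.fielddata.cache.size": "30%",     # static, set in elasticsearch.yml
}
```

Templates only apply when an index is created, so with daily indices the new mapping would take effect from the next day's index onwards.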

Jilles

On Wednesday, February 25, 2015 at 9:09:43 AM UTC+1, Seungjin Lee wrote:
