Need help, multiple aggregations with filters extremely slow, where to look for optimizations?

Thomas_Bolis · June 13, 2014, 7:09am

Hi,

I'm facing a performance issue with some aggregations I perform, and I need
your help if possible:

I have two document types, the request and the event. The request is the
parent of the event. Below is a (sample) mapping:

"event" : {
  "dynamic" : "strict",
  "_parent" : {
    "type" : "request"
  },
  "properties" : {
    "event_time" : {
      "format" : "dateOptionalTime",
      "type" : "date"
    },
    "count" : {
      "type" : "integer"
    },
    "event" : {
      "index" : "not_analyzed",
      "type" : "string"
    }
  }
}

"request" : {
  "dynamic" : "strict",
  "_id" : {
    "path" : "uniqueId"
  },
  "properties" : {
    "uniqueId" : {
      "index" : "not_analyzed",
      "type" : "string"
    },
    "user" : {
      "index" : "not_analyzed",
      "type" : "string"
    },
    "code" : {
      "type" : "integer"
    },
    "country" : {
      "index" : "not_analyzed",
      "type" : "string"
    },
    "city" : {
      "index" : "not_analyzed",
      "type" : "string"
    }
    ....
  }
}

My cluster is becoming really big (almost 2 TB of data with billions of
documents). I maintain one index per day and occasionally delete old
indices. Each daily index is about 20 GB. I use Elasticsearch 1.1.1.

My problems start when I want to aggregate events using criteria that are
applied to the parent request document. For example, count the events of
type click for country = US and code = 12. Initially I generated a
scriptFilter for the request document (in Groovy) and added multiple
aggregations to one search request. This ended up being very slow, so I
removed the scripting logic and implemented my logic in Java code instead.

This seemed to solve the problem on my local machine, but back on the
cluster nothing has changed: my app still performs really poorly. A search
with ~10 sub-aggregations takes more than 10 seconds.

What seems strange is that the cluster looks fine with regard to load
average, CPU, etc.

Any hints on where to look in order to identify the bottleneck?

Ask for any additional information you need; I didn't want to make this
post too long to read.
Thank you

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8134f5b0-f947-406f-ab57-c44c6c82ce66%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jpountz · June 13, 2014, 7:32am

Can you show us what your request looks like? (including query and aggs)

--
Adrien Grand

Thomas_Bolis · June 13, 2014, 7:41am

Below is an example aggregation I perform. Are there any optimizations I
can make? Maybe disabling some features I do not need, etc.

curl -XPOST "http://localhost:9200/logs-idx.20140613/event/_search?search_type=count" -d '
{
  "aggs": {
    "f1": {
      "filter": {
        "or": [
          {
            "and": [
              {
                "has_parent": {
                  "type": "request",
                  "filter": {
                    "and": {
                      "filters": [
                        {
                          "term": {
                            "country": "US"
                          }
                        },
                        {
                          "term": {
                            "city": "NY"
                          }
                        },
                        {
                          "term": {
                            "code": 12
                          }
                        }
                      ]
                    }
                  }
                }
              },
              {
                "range": {
                  "event_time": {
                    "gte": "2014-06-13T10:00:00",
                    "lt": "2014-06-13T11:00:00"
                  }
                }
              }
            ]
          },
          {
            "and": [
              {
                "has_parent": {
                  "type": "request",
                  "filter": {
                    "and": {
                      "filters": [
                        {
                          "term": {
                            "country": "US"
                          }
                        },
                        {
                          "term": {
                            "city": "NY"
                          }
                        },
                        {
                          "term": {
                            "code": 12
                          }
                        },
                        {
                          "range": {
                            "request_time": {
                              "gte": "2014-06-13T10:00:00",
                              "lt": "2014-06-13T11:00:00"
                            }
                          }
                        }
                      ]
                    }
                  }
                }
              },
              {
                "range": {
                  "event_time": {
                    "lt": "2014-06-13T10:00:00"
                  }
                }
              }
            ]
          }
        ]
      },
      "aggs": {
        "per_interval": {
          "date_histogram": {
            "field": "event_time",
            "interval": "minute"
          },
          "aggs": {
            "metrics": {
              "terms": {
                "field": "event",
                "size": 10
              }
            }
          }
        }
      }
    }
  }
}'

jpountz · June 13, 2014, 7:52am

Is this request only about getting aggregations? If so you would probably
get better response times by putting the filter in the query part (under a
filtered query) and only having the date histogram in the aggregation. The
reason is that aggregations are computed on matches, and in case the query
is not specified, that means all documents of your index.
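
The rewrite suggested above can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper name `hoist_filter_agg` is hypothetical (it is not part of any Elasticsearch client), the sample body is a shortened stand-in for the real request, and the output uses the 1.x `filtered` query syntax discussed in this thread.

```python
import copy
import json


def hoist_filter_agg(body):
    """Move a single top-level `filter` aggregation into a `filtered` query.

    Hypothetical helper for illustration. Assumes `body` has no "query" key
    and exactly one top-level aggregation, which must be a filter aggregation.
    """
    aggs = body.get("aggs", {})
    if "query" in body or len(aggs) != 1:
        raise ValueError("expected no query and exactly one top-level aggregation")
    (name, agg), = aggs.items()
    if "filter" not in agg:
        raise ValueError("aggregation %r is not a filter aggregation" % name)

    # Copy everything except the aggregations, then rebuild them.
    rewritten = {k: copy.deepcopy(v) for k, v in body.items() if k != "aggs"}
    # The filter now restricts which documents the query matches...
    rewritten["query"] = {"filtered": {"filter": copy.deepcopy(agg["filter"])}}
    # ...and the former sub-aggregations become the top-level aggregations.
    rewritten["aggs"] = copy.deepcopy(agg.get("aggs", {}))
    return rewritten


# Shortened stand-in for the request in this thread (one term filter only).
original = {
    "aggs": {
        "f1": {
            "filter": {"term": {"event": "click"}},
            "aggs": {
                "per_interval": {
                    "date_histogram": {"field": "event_time", "interval": "minute"}
                }
            },
        }
    }
}

rewritten = hoist_filter_agg(original)
print(json.dumps(rewritten, indent=2))
```

Note that the response shape changes with this rewrite: the `per_interval` buckets are no longer nested under `f1`, so any code that reads the result has to be adjusted accordingly.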

Thomas_Bolis · June 13, 2014, 8:33am

So I restructured my curl request as follows. Is this what you mean? From
a few first tries I do get a slight improvement, but I need to check
against production data.

Thank you, I will try it and come back with results.

curl -XPOST "http://10.129.2.42:9200/logs-idx.20140613/event/_search?search_type=count" -d '
{
  "query": {
    "filtered": {
      "filter": {
        "or": [
          {
            "and": [
              {
                "has_parent": {
                  "type": "request",
                  "filter": {
                    "and": {
                      "filters": [
                        {
                          "term": {
                            "country": "US"
                          }
                        },
                        {
                          "term": {
                            "city": "NY"
                          }
                        },
                        {
                          "term": {
                            "code": 12
                          }
                        }
                      ]
                    }
                  }
                }
              },
              {
                "range": {
                  "event_time": {
                    "gte": "2014-06-13T10:00:00",
                    "lt": "2014-06-13T11:00:00"
                  }
                }
              }
            ]
          },
          {
            "and": [
              {
                "has_parent": {
                  "type": "request",
                  "filter": {
                    "and": {
                      "filters": [
                        {
                          "term": {
                            "country": "US"
                          }
                        },
                        {
                          "term": {
                            "city": "NY"
                          }
                        },
                        {
                          "term": {
                            "code": 12
                          }
                        },
                        {
                          "range": {
                            "request_time": {
                              "gte": "2014-06-13T10:00:00",
                              "lt": "2014-06-13T11:00:00"
                            }
                          }
                        }
                      ]
                    }
                  }
                }
              },
              {
                "range": {
                  "event_time": {
                    "lt": "2014-06-13T10:00:00"
                  }
                }
              }
            ]
          }
        ]
      }
    }
  },
  "aggs": {
    "per_interval": {
      "date_histogram": {
        "field": "event_time",
        "interval": "minute"
      },
      "aggs": {
        "metrics": {
          "terms": {
            "field": "event",
            "size": 12
          }
        }
      }
    }
  }
}'

jpountz · June 13, 2014, 9:29am

Indeed that is what I meant.

Thomas_Bolis · June 13, 2014, 12:25pm

Hi,

I tried the above-mentioned solution, but it does not seem to make much
difference, although performance improved slightly. What I perform is a
multiSearch containing several (10-20) of the mentioned queries, for
different combinations of terms. I have roughly 15 thousand combinations. I
run it against an index with about 90 million documents in total, which is
pretty heavy. The whole operation takes 5-6 minutes to complete each time.
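
For reference, a multiSearch like the one described above could be assembled along these lines. This is only a sketch under assumptions: `build_msearch_body` and the sample combinations are hypothetical, the query is a simplified form of the one posted earlier in this thread (only the first `or` branch), and the resulting body would still have to be POSTed to the `_msearch` endpoint. It shows the newline-delimited format, with one header line and one search line per combination.

```python
import json


def build_msearch_body(index, doc_type, combinations, start, end):
    """Build a newline-delimited multi search body, one search per combination.

    Hypothetical helper for illustration. `combinations` is a list of dicts
    mapping a field of the parent (request) document to the term to match.
    """
    lines = []
    for combo in combinations:
        # Header line: where and how to run the search that follows.
        header = {"index": index, "type": doc_type, "search_type": "count"}
        parent_terms = [
            {"term": {field: value}} for field, value in sorted(combo.items())
        ]
        search = {
            "query": {
                "filtered": {
                    "filter": {
                        "and": [
                            {
                                "has_parent": {
                                    "type": "request",
                                    "filter": {"and": parent_terms},
                                }
                            },
                            {"range": {"event_time": {"gte": start, "lt": end}}},
                        ]
                    }
                }
            },
            "aggs": {
                "per_interval": {
                    "date_histogram": {"field": "event_time", "interval": "minute"},
                    "aggs": {"metrics": {"terms": {"field": "event", "size": 12}}},
                }
            },
        }
        lines.append(json.dumps(header))
        lines.append(json.dumps(search))
    # The multi search body must end with a newline.
    return "\n".join(lines) + "\n"


# Sample values only; the real job iterates over roughly 15 thousand combinations.
combinations = [
    {"country": "US", "city": "NY", "code": 12},
    {"country": "US", "city": "LA", "code": 12},
]
body = build_msearch_body(
    "logs-idx.20140613", "event", combinations,
    "2014-06-13T10:00:00", "2014-06-13T11:00:00",
)
print(body)
```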

I attach a screenshot from bigdesk in case it is helpful; you can see that
the CPU utilization is high.

Is there any optimization I can perform at the index level, maybe switching
off some features I do not use, etc.?

Thomas.
