Elasticsearch Aggregation time

Ankur_Goel · November 5, 2014, 9:52am

Hi,

We are running aggregations over about 5 million documents, on fields
with a cardinality on the order of 1,000. The aggregation is a filter
aggregation that wraps a terms aggregation. Right now it takes about
1.2 seconds on average, and the time increases as the number of documents
grows or when I run multiple aggregations. We have AWS extra large
machines, 3 shards and replication 2.

1.) Can we improve this time (we would like to get it under 1 second)? I can
see very little, if any, of the field cache being used.
2.) How does this scale? The time increases with the number of documents; how
can I offset that (more nodes, replication, sharding)?
3.) Are there any better options (plugins, or a different platform for
aggregating data)?

Regards,

Ankur Goel

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/fb73f5bd-24a4-4065-9253-39aa8dd9dfe0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jpountz · November 5, 2014, 2:08pm

Can you please show the JSON of the request that you send to Elasticsearch?

--
Adrien Grand

Ankur_Goel · November 10, 2014, 10:52am

{
  "query" : {
    "filtered" : {
      "query" : {
        "match_all" : { }
      },
      "filter" : {
        "bool" : {
          "must" : {
            "bool" : {
              "must" : {
                "terms" : {
                  "isActive" : [ "true" ]
                }
              }
            }
          }
        }
      }
    }
  },
  "aggregations" : {
    "revenue" : {
      "filter" : {
        "match_all" : { }
      },
      "aggregations" : {
        "revenueUSD" : {
          "range" : {
            "field" : "revenueUSD",
            "ranges" : [
              { "to" : 1.0 },
              { "from" : 1.0, "to" : 5.0 },
              { "from" : 5.0, "to" : 50.0 },
              { "from" : 50.0, "to" : 100.0 },
              { "from" : 100.0, "to" : 1000.0 },
              { "from" : 1000.0 }
            ]
          }
        }
      }
    }
  }
}

This is a sample; the match_all is usually replaced by an actual query.

Ankur_Goel · November 10, 2014, 10:53am

{
  "query" : {
    "filtered" : {
      "query" : {
        "match_all" : { }
      },
      "filter" : {
        "bool" : {
          "must" : {
            "bool" : {
              "must" : {
                "terms" : {
                  "isActive" : [ "true" ]
                }
              }
            }
          }
        }
      }
    }
  },
  "aggregations" : {
    "revenueFilter" : {
      "filter" : {
        "match_all" : { }
      },
      "aggregations" : {
        "revenue" : {
          "range" : {
            "field" : "revenue",
            "ranges" : [
              { "to" : 1.0 },
              { "from" : 1.0, "to" : 5.0 },
              { "from" : 5.0, "to" : 50.0 },
              { "from" : 50.0, "to" : 100.0 },
              { "from" : 100.0, "to" : 1000.0 },
              { "from" : 1000.0 }
            ]
          }
        }
      }
    }
  }
}

jpountz · November 11, 2014, 10:46pm

Hi Ankur,

I assume that your revenueFilter aggregation uses an actual filter and not
a match_all filter? Otherwise you could just remove it.

Are you actually interested in the top hits that match your query? If not,
you could switch to the count search type and move the filter from your
aggregation to the filtered query; this would be faster.
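
As a rough illustration of that suggestion, here is a minimal Python sketch that builds the request body with the aggregation's filter folded into the filtered query, so the range aggregation no longer needs a filter wrapper. The helper name `build_request` and the example filter on a `country` field are hypothetical and not from the thread; the `isActive` filter, the `revenue` field and the ranges are copied from the posted request. The `search_type=count` value would be passed as a URL parameter on the search call.

```python
import json

# Range buckets copied from the request posted in this thread.
REVENUE_RANGES = [
    {"to": 1.0},
    {"from": 1.0, "to": 5.0},
    {"from": 5.0, "to": 50.0},
    {"from": 50.0, "to": 100.0},
    {"from": 100.0, "to": 1000.0},
    {"from": 1000.0},
]


def build_request(extra_filter):
    """Hypothetical helper: put the former aggregation filter into the
    filtered query, next to the existing isActive filter."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "bool": {
                        "must": [
                            {"terms": {"isActive": ["true"]}},
                            extra_filter,
                        ]
                    }
                },
            }
        },
        "aggregations": {
            # No filter aggregation wrapper is needed any more.
            "revenue": {
                "range": {"field": "revenue", "ranges": REVENUE_RANGES}
            }
        },
    }


# Assumption: this filter stands in for whatever the real aggregation filter is.
example_filter = {"term": {"country": "US"}}
body = build_request(example_filter)

# URL parameter to send alongside the body so that no hits are fetched.
params = {"search_type": "count"}

print(json.dumps(body, indent=2))
```

This only applies when every aggregation in the request shares the same filter.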

--
Adrien Grand

Ankur_Goel · November 13, 2014, 6:33am

Hi Adrien,

Thanks.

We are already using the count search type. The filter will be an actual
filter, and we want a different filter on each aggregation, so it is not
possible to move it into a filtered query.

Can we improve the time by using more replicas or more shards?
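
For readers following along, the shape of request described here (one shared query filter, plus a different filter on each aggregation) could be sketched in Python as below. The helper names `filtered_range_agg` and `build_request`, the aggregation names, and the `country` filters are hypothetical; only the `isActive` filter, the `revenue` field and the range buckets come from the posted request. The sketch shows the request structure only and says nothing about whether more shards or replicas would make it faster.

```python
import json

# Range buckets copied from the request posted in this thread.
REVENUE_RANGES = [
    {"to": 1.0},
    {"from": 1.0, "to": 5.0},
    {"from": 5.0, "to": 50.0},
    {"from": 50.0, "to": 100.0},
    {"from": 100.0, "to": 1000.0},
    {"from": 1000.0},
]


def filtered_range_agg(agg_filter, field, ranges):
    """Hypothetical helper: a filter aggregation wrapping one range aggregation."""
    return {
        "filter": agg_filter,
        "aggregations": {
            field: {"range": {"field": field, "ranges": ranges}}
        },
    }


def build_request(filters_by_name):
    """Hypothetical helper: the shared filter stays in the filtered query,
    and each named aggregation keeps its own filter."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"terms": {"isActive": ["true"]}},
            }
        },
        "aggregations": {
            name: filtered_range_agg(agg_filter, "revenue", REVENUE_RANGES)
            for name, agg_filter in filters_by_name.items()
        },
    }


# Assumption: these per-aggregation filters are made up for illustration.
body = build_request({
    "revenue_us": {"term": {"country": "US"}},
    "revenue_in": {"term": {"country": "IN"}},
})

print(json.dumps(body, indent=2))
```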
