Elasticsearch Aggregation time

Hi,

We are running aggregations over about 5 million documents, on fields
with a cardinality on the order of 1,000. The aggregation is a filter
aggregation that wraps an underlying terms aggregation. Right now it takes
about 1.2 seconds on average to compute, and the time increases as the
number of documents grows or when I run multiple aggregations. We have AWS
extra-large machines, with 3 shards and replication 2.

1.) Can we improve this time? We would like to get it under 1 second. I can
see very little, if any, of the field cache being used.
2.) How does this scale? The time increases with the number of documents;
how can I offset that (more nodes, more replication, more shards)?
3.) Are there any better options (plugins, or a different platform for
aggregating data)?

Regards,

Ankur Goel

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/fb73f5bd-24a4-4065-9253-39aa8dd9dfe0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
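
For readers following the thread, here is a minimal sketch of the kind of request described above: a filter aggregation wrapping a terms aggregation. Python is used only to build and print the JSON body. The aggregation name, the filter and the `category` field are hypothetical, since the original message does not include the actual request.

```python
import json


def filtered_terms_agg(name, filter_clause, field, size=10):
    """Build a filter aggregation that wraps a terms aggregation on `field`."""
    return {
        name: {
            "filter": filter_clause,
            "aggregations": {
                name + "_terms": {"terms": {"field": field, "size": size}}
            },
        }
    }


# All names below are hypothetical placeholders, not taken from the thread.
body = {
    "aggregations": filtered_terms_agg(
        "active_by_category",          # aggregation name
        {"term": {"isActive": True}},  # filter applied before bucketing
        "category",                    # field with roughly 1,000 distinct values
    )
}

print(json.dumps(body, indent=2))
```

Only documents that pass the filter reach the nested terms aggregation, which is why the two are usually written as parent and child.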

Can you please show the JSON of the request that you send to Elasticsearch?

On Wed, Nov 5, 2014 at 10:52 AM, Ankur Goel ankrugold@gmail.com wrote:

--
Adrien Grand

{
  "query" : {
    "filtered" : {
      "query" : {
        "match_all" : { }
      },
      "filter" : {
        "bool" : {
          "must" : {
            "bool" : {
              "must" : {
                "terms" : {
                  "isActive" : [ "true" ]
                }
              }
            }
          }
        }
      }
    }
  },
  "aggregations" : {
    "revenue" : {
      "filter" : {
        "match_all" : { }
      },
      "aggregations" : {
        "revenueUSD" : {
          "range" : {
            "field" : "revenueUSD",
            "ranges" : [ {
              "to" : 1.0
            }, {
              "from" : 1.0,
              "to" : 5.0
            }, {
              "from" : 5.0,
              "to" : 50.0
            }, {
              "from" : 50.0,
              "to" : 100.0
            }, {
              "from" : 100.0,
              "to" : 1000.0
            }, {
              "from" : 1000.0
            } ]
          }
        }
      }
    }
  }
}

This is a sample; the match_all is usually replaced by an actual query.

On Wednesday, 5 November 2014 19:38:42 UTC+5:30, Adrien Grand wrote:

Can you please show the json of the request that you send to elasticsearch?

--

{
  "query" : {
    "filtered" : {
      "query" : {
        "match_all" : { }
      },
      "filter" : {
        "bool" : {
          "must" : {
            "bool" : {
              "must" : {
                "terms" : {
                  "isActive" : [ "true" ]
                }
              }
            }
          }
        }
      }
    }
  },
  "aggregations" : {
    "revenueFilter" : {
      "filter" : {
        "match_all" : { }
      },
      "aggregations" : {
        "revenue" : {
          "range" : {
            "field" : "revenue",
            "ranges" : [ {
              "to" : 1.0
            }, {
              "from" : 1.0,
              "to" : 5.0
            }, {
              "from" : 5.0,
              "to" : 50.0
            }, {
              "from" : 50.0,
              "to" : 100.0
            }, {
              "from" : 100.0,
              "to" : 1000.0
            }, {
              "from" : 1000.0
            } ]
          }
        }
      }
    }
  }
}

Hi Ankur,

I assume that your revenueFilter aggregation uses an actual filter and not
a match_all filter? Otherwise you could just remove it.

Are you actually interested in the top hits that match your query? If not,
you could switch to the count search type and move the filter from your
aggregation into the filtered query; this would be faster.
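
A rough sketch of the suggestion above, assuming the Elasticsearch 1.x query DSL used in this thread: the filter moves out of the aggregation and into the filtered query, and the request is sent with the count search type so that no hits are fetched. The `extra_filter` clause and the index name `myindex` are hypothetical placeholders; the thread itself only shows a match_all in that position.

```python
import json

# Range boundaries taken from the request posted in this thread.
bounds = [1.0, 5.0, 50.0, 100.0, 1000.0]
ranges = (
    [{"to": bounds[0]}]
    + [{"from": lo, "to": hi} for lo, hi in zip(bounds, bounds[1:])]
    + [{"from": bounds[-1]}]
)

# Hypothetical stand-in for the filter that used to sit inside the aggregation.
extra_filter = {"term": {"country": "US"}}

body = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {
                "bool": {
                    "must": [
                        {"terms": {"isActive": ["true"]}},
                        extra_filter,
                    ]
                }
            },
        }
    },
    # The range aggregation now runs directly on the documents matched
    # by the query, with no wrapping filter aggregation.
    "aggregations": {
        "revenue": {"range": {"field": "revenue", "ranges": ranges}}
    },
}

# Hypothetical index name; search_type=count skips fetching hits.
path = "/myindex/_search?search_type=count"
print("POST " + path)
print(json.dumps(body, indent=2))
```

This only works when every aggregation in the request can share the same filter, since the query restricts the documents seen by all of them.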

On Mon, Nov 10, 2014 at 11:53 AM, Ankur Goel ankrugold@gmail.com wrote:

--
Adrien Grand

Hi Adrien,
thanks.

We are already using the count search type, and the filter will be an
actual filter. We want a different filter on each aggregation, so it would
not be possible to move them into a filtered query.

Can we improve this with more replication or more shards?
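
To make the constraint above concrete, here is a minimal sketch of a request with a different filter on each aggregation, again assuming the 1.x query DSL from this thread. Only the part shared by every aggregation (the `isActive` clause) stays in the filtered query; each filter aggregation keeps its own clause. The names `ordersFilter`, `byCategory`, `region`, `channel` and `category` are hypothetical and do not come from the thread.

```python
import json


def filter_agg(filter_clause, sub_aggs):
    """Wrap sub-aggregations in a filter aggregation with its own filter."""
    return {"filter": filter_clause, "aggregations": sub_aggs}


# Clause common to every aggregation, taken from the request in this thread.
shared_filter = {"terms": {"isActive": ["true"]}}

body = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": shared_filter,
        }
    },
    "aggregations": {
        # Each aggregation applies its own (hypothetical) filter.
        "revenueFilter": filter_agg(
            {"term": {"region": "emea"}},
            {
                "revenue": {
                    "range": {
                        "field": "revenue",
                        "ranges": [{"to": 100.0}, {"from": 100.0}],
                    }
                }
            },
        ),
        "ordersFilter": filter_agg(
            {"term": {"channel": "web"}},
            {"byCategory": {"terms": {"field": "category"}}},
        ),
    },
}

print(json.dumps(body, indent=2))
```

Aggregations only see documents matched by the query, so anything common to all of them can still be applied once there, while the per-aggregation filters remain separate.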

On Wednesday, 12 November 2014 04:16:54 UTC+5:30, Adrien Grand wrote:
