With this _cache rule, results are slower: 2 to 3 seconds!
I have no idea how to debug this.
Here is a quick gist: https://gist.github.com/damienalexandre/6581850
But without a massive amount of data, the difference between cached and
uncached is not as clear as what I get.
I can see two issues here:
- my range queries are slow; I guess this is the cost of doing a date range
  across billions of docs;
- my nested filter is not cached, and trying to enable the cache makes the
  query slower.
I'm looking for advice and tips on how to debug this.
Maybe it's a bug, but before creating an issue on GitHub I think another
pair of eyes can't hurt.
PS: I have also tried to set the filter on an alias - same perf issue.
This is perhaps naive of me, but from what I've seen work well across 100M
documents (about 1/20 the number of documents you mentioned), the best
range performance comes when the range query is wrapped inside a bool query.
For example (with the actual gn and sn query values changed to protect the
innocent):
This query took 3.4 seconds to return 5 documents out of 34 when the
numeric range was omitted. But it did get much faster on subsequent
queries, down to 100ms or less.
I hope this helps!
P.S. My client actually builds the queries in Java, and then can emit them
as JSON for debugging and explanatory reasons.
Brian
On Monday, September 16, 2013 11:09:48 AM UTC-4, Damien Alexandre wrote:
Hi everyone,
ES 0.90.3, 5 shards.
I run an index with a nested field.
I have about 6 billion documents, and I run a query like this:
I would change the range query into a range filter; each range filter is
then cached on its own by default:
The range query doesn't cache at all on its own. If you wrap a filtered
query as the inner query of the nested filter, and put the range filters in
the filter part and the fields query in the query part, then I expect a
faster execution time:
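The example that originally followed is not preserved here. As a rough sketch of the structure described (a nested filter whose inner query is a filtered query, with the range filters in the filter part), the request body could be built like this. The path `events`, the fields `events.type` and `events.date`, the `match` query standing in for the fields query, and the date bounds are all hypothetical placeholders, not the values from the original post:

```python
import json


def range_filter(field, start, end):
    """Build a range filter (per the advice above, cached on its own by default)."""
    return {"range": {field: {"gte": start, "lte": end}}}


def nested_filtered(path, inner_query, filters):
    """Nested filter whose inner query is a filtered query:
    the scoring query goes in "query", the range filters in "filter"."""
    return {
        "nested": {
            "path": path,
            "query": {
                "filtered": {
                    "query": inner_query,
                    "filter": {"bool": {"must": filters}},
                }
            },
        }
    }


# Hypothetical path, field names and values, for illustration only.
body = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": nested_filtered(
                "events",
                {"match": {"events.type": "click"}},
                [range_filter("events.date", "2013-09-01", "2013-09-16")],
            ),
        }
    }
}

print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the search request body; only the shape matters here, not the specific fields.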
The first time the range filters are executed, the execution time is
similar to that of the range query, but any subsequent search request should
be much faster.
Also, I see that you're filtering with day precision. Are you also indexing
the dates with the same precision? If not, then I expect the range filter
(and query) to execute better if you do.
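A minimal sketch of what indexing at day precision could look like on the client side, assuming timestamps are available as timezone-aware datetimes before indexing. The field name `created` and the helper `to_day_precision` are hypothetical:

```python
from datetime import datetime, timezone


def to_day_precision(ts: datetime) -> str:
    """Drop the time of day so the indexed value matches day-level range bounds."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


# Hypothetical document with a single date field.
doc = {"created": to_day_precision(datetime(2013, 9, 16, 15, 9, 48, tzinfo=timezone.utc))}
print(doc)  # {'created': '2013-09-16'}
```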
Also, caching the nested filter doesn't really help: if one element in the
nested filter changes, then the cached entry can't be reused, and the nested
filter needs to be completely re-executed.
Using a range filter instead of a range query works very well! I dropped from
300ms to 10ms on a lot of my queries!
Still, I think it's strange that the nested filter cache does not work
better than the range filter's own cache - looks odd to me, but anyway :]
About the dates: yes, they are indexed with day precision, like in my
queries, so it's pretty fast now.
I apply sorts, filters, queries... on billions of documents with a nested
field, and now I get my results in 10ms: that's awesome.
The range filter should work best inside a bool filter.
The numeric_range filter should work best inside an and/or filter, but only
when it isn't cached (by default this filter is never cached).
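To make the two combinations concrete, here is a small sketch of both filter shapes as request fragments. The field name `age` and the bounds are hypothetical, and the helper functions exist only for illustration:

```python
import json


def range_in_bool(field, bounds):
    """Range filter combined through a bool filter."""
    return {"bool": {"must": [{"range": {field: bounds}}]}}


def numeric_range_in_and(field, bounds):
    """numeric_range filter (left uncached) combined through an and filter."""
    return {"and": [{"numeric_range": {field: bounds}}]}


# Hypothetical field and bounds.
bounds = {"gte": 10, "lte": 20}
bool_variant = range_in_bool("age", bounds)
and_variant = numeric_range_in_and("age", bounds)

print(json.dumps(bool_variant))
print(json.dumps(and_variant))
```

Either fragment would go in the filter part of a filtered query; which one is faster depends on the data, so it is worth timing both.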
On 17 September 2013 23:52, Ivan Brusic ivan@brusic.com wrote:
Don't range filters work better with and/or/not filters, rather than inside
bool filters, due to bitset caching? I've never profiled it myself.