Search cache question


(Shantanu Sen) #1

I have an index of 50 million docs with 2 shards. I am running a match-all
query as the initial top-level query, with 8 terms aggregations. One of them
is on a high-cardinality field (about 10k distinct values); the rest are all
on fields with cardinality below 10. Then a drill-down query is run with
a post filter.

The top-level query has a hit count of 50 million and the drill-down query
has a hit count of 20 million.

Both shards are on a single node. I am using a Transport node and the Java
APIs to run the searches.

When I run the Java client from the data node and the system is cold, I
get a latency of 3 minutes for the top-level query, while the drill-down
query has a latency of 800 ms.

When the system is warm, I get a latency of 25 secs for the top-level
query, while the drill-down query remains the same at around 800 ms.

When the system is cold, if I run the same client from a remote system, the
latency of the top-level query is 21 secs, while the subsequent queries
drop down to 8 secs.

I am running the queries with the aggregation filter size set to 0 since we
need the exact count.
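
To make the request shape concrete, here is a rough sketch of the two
search bodies described above, written as Python dicts that serialize to
the JSON query DSL. The field names (field_1 ... field_8), the aggregation
names and the filter value are placeholders, since the real mapping is not
shown here. It also assumes the drill-down request repeats the same
aggregations and only adds a post filter, and that size 0 on a terms
aggregation means "return all buckets", as it does in Elasticsearch 1.x.

```python
import json

# Placeholder field names; the real mapping is not part of this post.
AGG_FIELDS = ["field_%d" % i for i in range(1, 9)]  # 8 terms aggregations


def build_top_level_query():
    """match_all query with one terms aggregation per field.

    size 0 asks for every bucket (Elasticsearch 1.x behaviour), which is
    what gives exact counts for all terms.
    """
    aggs = {
        name + "_counts": {"terms": {"field": name, "size": 0}}
        for name in AGG_FIELDS
    }
    return {"query": {"match_all": {}}, "aggs": aggs}


def build_drill_down_query(field, value):
    """Same request plus a post filter on one field (assumed shape)."""
    body = build_top_level_query()
    body["post_filter"] = {"term": {field: value}}
    return body


if __name__ == "__main__":
    # Print the drill-down body as it would be sent to the _search endpoint.
    print(json.dumps(build_drill_down_query("field_2", "some_value"), indent=2))
```

In the real setup the bodies are built through the Java API rather than
raw JSON, but the resulting request should be equivalent.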

I understand that the high cardinality filter is slowing the queries and
the spike in the CPU is for calculating the count.

I would like to understand why the search latency is markedly less when
running the search client from a remote system - the "cold" latency value
is 21 secs using a remote client vs 3 mins on the client running on the
data node. This is when nothing else is running on the data node.

Also, I would like to understand if we can tune the duration of the cache.
I see the same latency when I re-run the query after the system is idle for
some time - hence the cache must be getting cleared after a set time-period.

Finally, is there any other type of aggregation filter (other than the
Terms Aggregation) that is recommended for high cardinality aggregation
items so as to bring down the latency?

Thanks for any pointers.
Shantanu


