Facets performance issue


(Hua) #1

Hi

Here is my facet request:
{"facets":{"DF":{"facet_filter":{"and":[{"query":{"text":{"@type":"analytics_stg"}}},{"range":{"timestamp":{"to":"1347495877557","include_lower":true,"include_upper":true,"from":"1347452677556"}}},{"not":{"query":{"term":{"status":"200"}}}},{"not":{"query":{"term":{"status":"304"}}}}]},"date_histogram":{"field":"timestamp","interval":"hour"}}}}

I tried it on my ES cluster (2 machines) holding 60 to 80 GB of data, and it
takes 5+ minutes the first time. After that the search is quite quick; I guess
there is a cache on the ES server side. But when I retry the facet an hour
later, it again takes many minutes to produce the result.

How can I speed up facets?

--


(Clinton Gormley) #2

On Thu, 2012-09-13 at 02:16 -0700, Hua wrote:

> Hi
>
> Here is my facet request:
> {"facets":{"DF":{"facet_filter":{"and":[{"query":{"text":{"@type":"analytics_stg"}}},{"range":{"timestamp":{"to":"1347495877557","include_lower":true,"include_upper":true,"from":"1347452677556"}}},{"not":{"query":{"term":{"status":"200"}}}},{"not":{"query":{"term":{"status":"304"}}}}]},"date_histogram":{"field":"timestamp","interval":"hour"}}}}
>
> I tried it on my ES cluster (2 machines) holding 60 to 80 GB of data, and
> it takes 5+ minutes the first time. After that the search is quite quick; I
> guess there is a cache on the ES server side. But when I retry the facet an
> hour later, it again takes many minutes to produce the result.

You are correct: the data has to be loaded and cached by ES. Presumably
you continue indexing new data, which is why the facet after an hour is
slow again (there is new data to load).

In v0.20 (not yet released) there will be a warmup API, which warms up
each new segment before it becomes visible to search.

In the meantime, you can just run a typical query in the background to
force the loading and caching of the new data.
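A minimal sketch of such a background warm-up, assuming Python 3 and a node reachable at http://localhost:9200 (the URL, the 12-hour window, the 5-minute interval, and the function names are all assumptions for illustration, not something from this thread). It resends the facet request from the first post with the time range moved up to "now":

```python
import json
import time
import urllib.request

ES_SEARCH_URL = "http://localhost:9200/_search"  # assumed endpoint
WINDOW_MS = 12 * 60 * 60 * 1000  # assumed window, roughly the range in the first post


def build_facet_request(now_ms, window_ms=WINDOW_MS):
    """Return the facet request from the first post with a sliding time range."""
    facet_filter = {
        "and": [
            {"query": {"text": {"@type": "analytics_stg"}}},
            {"range": {"timestamp": {
                "from": str(now_ms - window_ms),
                "to": str(now_ms),
                "include_lower": True,
                "include_upper": True,
            }}},
            {"not": {"query": {"term": {"status": "200"}}}},
            {"not": {"query": {"term": {"status": "304"}}}},
        ]
    }
    return {
        "facets": {
            "DF": {
                "facet_filter": facet_filter,
                "date_histogram": {"field": "timestamp", "interval": "hour"},
            }
        }
    }


def warm_once(url=ES_SEARCH_URL, now_ms=None):
    """Send one warm-up request and return the parsed JSON response."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    body = json.dumps(build_facet_request(now_ms)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def warm_forever(interval_seconds=300):
    """Repeat the warm-up request; a failed request is logged and retried later."""
    while True:
        try:
            warm_once()
        except OSError as error:  # urllib's URLError is a subclass of OSError
            print("warm-up request failed:", error)
        time.sleep(interval_seconds)
```

Running `warm_forever()` from a separate process (or a cron job calling `warm_once()`) would keep hitting the `timestamp` field, so newly indexed data is more likely to be loaded before a user runs the same facet.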

clint

> How can I speed up facets?

--

--


(Hua) #3

Thanks for your help.

Is there a way to turn off the cache used for facets? I'm tuning the
facet filter to see whether it helps speed up the query, but the cache
makes it hard to measure query time.

I'm using Elasticsearch 0.19.9.
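One way to keep the cache from muddying the numbers, without turning it off, is to record the first run separately from the later ones. The sketch below is only an illustration: `time_runs` and `summarize` are hypothetical helper names, and `run_query` stands for whatever function sends the facet request. The first timing is only a true "cold" number if the data was not already loaded when the measurement started.

```python
import statistics
import time


def time_runs(run_query, repeats=5):
    """Call run_query() `repeats` times and return each call's elapsed milliseconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def summarize(timings_ms):
    """Separate the first (cold) run from the median of the later (warm) runs."""
    cold = timings_ms[0]
    warm = statistics.median(timings_ms[1:]) if len(timings_ms) > 1 else None
    return {"cold_ms": cold, "warm_ms": warm}
```

Comparing `warm_ms` between two variants of the facet filter shows the effect of the filter change itself, while `cold_ms` mostly reflects the cost of loading the field data.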

--


(Hua) #4

Forgot to mention: if I add more machines to the cluster, will it improve
performance?

--

