How to improve search performance

(Kenneth Hubacher) #1

I have 200M documents; half are marked with a field src = 1 and the other half with src = 2.

When searching on multiple fields (usually 6-8), the search time differs dramatically depending on whether src is included.
I realize that the query text for all the other fields is relatively rare, whereas src matches 100M documents no matter which value is picked. If I leave src out of the query, the search is 2.5x faster (650ms down to about 260ms).
I've tried setting it up as a must clause, as a filtered query, and even as a should clause, but in all cases the search time is impacted similarly.

Is there a way to enforce the src in some way that doesn't incur such a high penalty?
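
For reference, the three placements described above can be sketched as plain request bodies. This is only an illustration, not the exact queries from my cluster: the field name "title", the helper build_query, and the use of a bool filter clause in place of the older filtered query are all assumptions.

```python
import json


def build_query(text_clauses, src, placement):
    """Return a search body with the src clause placed as requested.

    placement is one of "must", "filter", "should" or "none".
    """
    src_clause = {"term": {"src": src}}
    # Copy the caller's clauses so the input list is never mutated.
    bool_query = {"must": list(text_clauses)}
    if placement == "must":
        bool_query["must"].append(src_clause)
    elif placement == "filter":
        # Filter clauses restrict the result set without contributing to the score.
        bool_query["filter"] = [src_clause]
    elif placement == "should":
        # With a must clause present, a should clause only influences scoring
        # unless minimum_should_match is set.
        bool_query["should"] = [src_clause]
    elif placement != "none":
        raise ValueError(f"unknown placement: {placement}")
    return {"query": {"bool": bool_query}}


if __name__ == "__main__":
    # Hypothetical text clause standing in for the 6-8 real fields.
    clauses = [{"match": {"title": "rare phrase"}}]
    for placement in ("none", "must", "filter", "should"):
        print(placement, json.dumps(build_query(clauses, "1", placement)))
```

Timing each body against the same index is how the 650ms versus 260ms comparison could be reproduced.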


(Abdon Pijpelink) #2

One thing you could do is split your documents into two separate indexes, based on the value of src, i.e. put all documents with src=1 in one index and all documents with src=2 in another index. Then, at query time, you can query one index to get docs with src=1, the other index to get docs with src=2, or both indexes to get all documents.
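
A minimal sketch of how the index selection could look on the client side. The index names docs-src-1 and docs-src-2 and both helper functions are hypothetical; the only thing relied on is that a search request can target several indexes by listing them comma-separated in the path.

```python
# Hypothetical index names, one per src value.
INDEX_BY_SRC = {"1": "docs-src-1", "2": "docs-src-2"}


def target_indices(src_values=None):
    """Return the comma-separated index list for the requested src values.

    Passing None selects every index, i.e. all documents.
    """
    if src_values is None:
        selected = sorted(INDEX_BY_SRC.values())
    else:
        selected = [INDEX_BY_SRC[str(value)] for value in src_values]
    return ",".join(selected)


def search_path(src_values=None):
    """Build the request path for a search against the chosen indexes."""
    return f"/{target_indices(src_values)}/_search"


if __name__ == "__main__":
    print(search_path([1]))   # only documents with src=1
    print(search_path([2]))   # only documents with src=2
    print(search_path())      # both indexes
```

With this layout the query body no longer needs a src clause at all, since the choice of index already enforces it.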

If that does not work for you, then a question would be: how has the src field been mapped? If you mapped src as a numeric datatype (integer or long), then that datatype is optimized for range queries rather than for this kind of term query. Consider mapping the src field as a keyword field instead.
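
As a rough sketch, a keyword mapping for src and the matching term query could look like the bodies below. The typeless mapping layout assumes a recent Elasticsearch version (older versions expect a type name under "mappings"), and the names SRC_MAPPING and src_term_query are made up for this example.

```python
import json

# Index-creation body that maps src as a keyword (assumed typeless mapping layout).
SRC_MAPPING = {
    "mappings": {
        "properties": {
            "src": {"type": "keyword"}
        }
    }
}


def src_term_query(src):
    """Return a term query on src; keyword values are sent as strings."""
    return {"query": {"term": {"src": str(src)}}}


if __name__ == "__main__":
    print(json.dumps(SRC_MAPPING, indent=2))
    print(json.dumps(src_term_query(1)))
```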

(Kenneth Hubacher) #3

Thanks Abdon. Splitting it into 2 indexes is one option I've considered, but I was hoping to avoid it. And yes, the field is a keyword field.
Since my post, I've been experimenting with post filters on the source field, and that actually looks like it might do the trick. If I compare a query without source at all (just a test to measure the performance when source is not included) against one with source as a post filter, the performance is about the same, so it appears a post filter doesn't impact the performance.
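
Roughly, the post filter variant looks like the sketch below. The field name "title" and the helper with_post_filter are placeholders for this example, not my real query; post_filter itself is a top-level key of the search body and is applied to the hits after the main query has run, so aggregations are not affected by it.

```python
import json


def with_post_filter(text_clauses, src):
    """Return a search body that enforces src through post_filter.

    The main query only contains the text clauses; the src restriction is
    applied afterwards to the hits.
    """
    return {
        "query": {"bool": {"must": list(text_clauses)}},
        "post_filter": {"term": {"src": str(src)}},
    }


if __name__ == "__main__":
    # Hypothetical text clause standing in for the real 6-8 fields.
    clauses = [{"match": {"title": "rare phrase"}}]
    print(json.dumps(with_post_filter(clauses, 1), indent=2))
```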
