In my continuing quest to make my search fast enough I've run into another
roadblock: phrase queries. On most user queries I generate a regular
boolean query for their terms but I also generate a rescore that checks if
their query matches as a phrase query with slop 1. That means that every
query is also a phrase query. I've found that varying the window size of the
rescore changes performance considerably:
- 1024 will push one or two of my servers over the edge and they'll start io
  thrashing.
- 256 is actually OK if the caches are hot, but if they aren't it can push me
  into io thrash.
- 64 seems perfectly OK. Comfortable, even.
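For reference, the rescore I'm running looks roughly like this (the field name
and query text are placeholders, and the window_size is the knob I've been
varying):

```json
{
  "query": {
    "match": { "body": "the user query" }
  },
  "rescore": {
    "window_size": 64,
    "query": {
      "rescore_query": {
        "match_phrase": {
          "body": { "query": "the user query", "slop": 1 }
        }
      }
    }
  }
}
```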
Obviously if I throw more hardware at the problem it'll get better: more
replicas and shards and better disks will help. So will more RAM. RAM makes
everything better.
Anyway - say my hardware cycle takes a few months and I need a fix faster -
is there something I can do? I'm reasonably sure I can do something with
a shingle filter, but I'm not sure exactly what that something is in the
case of queries with slop. Has anyone dealt with a case like this before?
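My rough idea (and this is just a sketch of what I *think* the shingle setup
would look like, not something I've tested) is an index-time subfield built
with the standard shingle token filter, something like:

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "my_shingles": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2,
          "output_unigrams": false
        }
      },
      "analyzer": {
        "shingled": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_shingles"]
        }
      }
    }
  }
}
```

What I can't see is how plain two-word shingles would capture the slop-1 part:
adjacent pairs become single terms, but pairs separated by one word don't.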
One thing on my side is that I don't really need phrase queries. I can
play around with the specification a bit so long as I stay sane. I just
need to make documents that contain the terms near each other float to the
top. It'd be better if it matched the exact phrase, but some false positives
are probably OK. The phrase query got the job done, but if there's a way to
cheat it I'm happy to try.
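For example, if false positives are acceptable, I'm guessing the rescore could
become a plain match query against a shingled subfield (assuming a "body.shingled"
field analyzed as above), which avoids reading positions entirely:

```json
{
  "rescore": {
    "window_size": 64,
    "query": {
      "rescore_query": {
        "match": { "body.shingled": "the user query" }
      }
    }
  }
}
```

Does that sound like a sane direction, or is there a better-known trick?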
Thanks for reading!
You received this message because you are subscribed to the Google Groups "elasticsearch" group.