We discussed this at length on IRC, but then you timed out.
Try this query: https://gist.github.com/2a127562eb0808408097
Your query needs are quite different from normal full text search, so
you will need to tune your mapping accordingly.
You might get some gain from indexing your 'hashes' as integers rather
than as text. Also, you probably want to tweak things like disabling
norms and term frequencies, and you may want to use span queries
instead of normal term or text queries.
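A minimal sketch of the mapping and query bodies those suggestions point toward. It makes several assumptions. The field name "hashes" and the type name "doc" are hypothetical placeholders. Each document is assumed to store its hashes as one space-separated string. The mapping uses 0.19-era option names (omit_norms), so check them against the docs for your version. Span queries need positions, so this sketch disables only norms and leaves term frequencies and positions enabled. span_term also matches raw indexed terms, so the sketch keeps the hashes as strings rather than integers.

```python
import json

# Hypothetical names: "hashes" (field) and "doc" (type) are placeholders.
# Assumes each document stores its hashes as one space-separated string,
# so the whitespace analyzer yields one token (with a position) per hash.


def build_mapping(doc_type="doc", field="hashes"):
    """Mapping sketch using 0.19-era option names; verify for your version."""
    return {
        doc_type: {
            "properties": {
                field: {
                    "type": "string",
                    "analyzer": "whitespace",  # split on whitespace only, no lowercasing
                    "omit_norms": True,        # field-length normalization not needed
                    # Term frequencies and positions stay enabled: span queries
                    # need positions, so only drop them if you drop span queries.
                }
            }
        }
    }


def build_span_near_query(hashes, field="hashes", slop=0, in_order=True):
    """Match docs containing `hashes` in order, at most `slop` positions apart."""
    clauses = [{"span_term": {field: h}} for h in hashes]
    return {
        "query": {
            "span_near": {
                "clauses": clauses,
                "slop": slop,
                "in_order": in_order,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_mapping(), indent=2))
    print(json.dumps(build_span_near_query(["a1b2", "c3d4", "e5f6"], slop=1), indent=2))
```

Raising slop tolerates missing or extra hashes between matches, at the cost of looser (and slower) matching. Setting in_order to False also allows the hashes to appear in any order.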
I'd suggest buying a book like Lucene in Action to understand more about
what is happening under the covers. I think that's the only way you're
going to get really good results with the type of queries you want to run.
On Mon, 2012-06-11 at 06:51 -0700, Alex wrote:
Thank you for responding! By "effectively not a big
difference" do you mean that it will not decrease query time
all that much? My plan is to one day have millions of these
documents, and I was wondering if query times would increase
with a growth in number of documents. I am really hoping that
the query times scale with the query length, and not the
number of documents. I've read articles online about people
achieving indexing rates of around 1000 documents a second. I
am only getting around 20-40 a second; is this to be expected
because of the long hash strings? I've tried setting the
refresh interval to -1 during indexing, but it did not make a
significant difference (perhaps it will when I have millions
of documents). I hope to hear from you again, thanks!