Slow attachment autocompletion with edge-ngrams and highlighting


I'm seeing slow performance when running term queries with highlighting against an index holding 10GB of attachments. Highlighting makes the queries about 3-4x slower. I'm using the edge-ngram tokenizer at index time (min-gram 3, max-gram 15).
Is there an approach to speed things up?
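
For readers unfamiliar with edge n-grams, here is a minimal pure-Python sketch of roughly what the tokenizer emits with the settings above. It is only an approximation: the whitespace splitting and lowercasing are assumptions, and `edge_ngrams` is a hypothetical helper, not part of Elasticsearch.

```python
def edge_ngrams(text, min_gram=3, max_gram=15):
    """Return the prefixes of each word, from min_gram to max_gram characters."""
    grams = []
    # Assumption: words are split on whitespace and lowercased before n-gramming.
    for word in text.lower().split():
        longest = min(len(word), max_gram)
        # Words shorter than min_gram produce no grams at all.
        for n in range(min_gram, longest + 1):
            grams.append(word[:n])
    return grams


print(edge_ngrams("Quick search"))
```

Every word of every attachment expands into up to 13 tokens this way, which is why a term query on a short prefix can match so many large documents.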

(Nik Everett) #2

I expect the trouble with highlighting is that it needs to load the body of the document, and to do that it has to unzip the stored source, which is huge when it contains an attachment. Then it has to convert it to a Map so we can get the part we want out of the source. Disaster!

So you have two options I think. Have a look at the completion suggester - it is designed to return the term that it suggests without having to go to the document. OTOH it has to build the FST it uses to suggest to you in Java's heap so it has quite a bit of overhead. It still might be the best way.
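
As a rough sketch of the completion suggester route, the mapping and request bodies could look like the Python dicts below. The field name `suggest`, the suggestion name `autocomplete`, and both helper functions are hypothetical, and the exact layout depends on the Elasticsearch version (older releases nest the mapping under a type name and use `text` instead of `prefix`).

```python
import json


def completion_mapping(field="suggest"):
    """Index mapping with a dedicated completion field (field name is an assumption)."""
    return {"mappings": {"properties": {field: {"type": "completion"}}}}


def completion_query(prefix, field="suggest", size=5):
    """Search body asking the completion suggester for terms starting with `prefix`."""
    return {
        "suggest": {
            "autocomplete": {  # arbitrary name chosen for this suggestion
                "prefix": prefix,
                "completion": {"field": field, "size": size},
            }
        }
    }


print(json.dumps(completion_mapping(), indent=2))
print(json.dumps(completion_query("ela"), indent=2))
```

The phrases to suggest have to be indexed into the completion field explicitly, so this only helps if the indexing pipeline can be changed.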

Another option is to use the term suggester somehow though I'm not sure how - you'd have to experiment with it quite a lot. It doesn't reach into the source at all.
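
For experimenting, a term suggester request body might be built like this. It is a sketch only: the field name `content`, the suggestion name `term-suggestion`, and the helper function are assumptions, and the term suggester is aimed at spelling correction of whole terms, so it may not behave like prefix autocompletion.

```python
import json


def term_suggest_query(text, field="content"):
    """Search body that asks the term suggester for terms similar to `text`."""
    return {
        "suggest": {
            "term-suggestion": {  # arbitrary name chosen for this suggestion
                "text": text,
                "term": {"field": field},
            }
        }
    }


print(json.dumps(term_suggest_query("elastc"), indent=2))
```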


The problem is that the content of the attachments is base64-encoded, so I have no idea how to use it with suggesters. What's more, the river (the MongoDB river) prevents me from changing the index mapping very much.

(Nik Everett) #4

If you aren't using the content then why highlight it?

Also, rivers have been deprecated for a long time. I suspect you are using a super old version then. So upgrading might help some.


I'm using highlighting to extract autocompletion phrases from the n-gram matches, but suggesters require a special mapping (e.g. 'type': 'completion').
