Slow attachment autocompletion with edge-ngrams and highlighting

kzal · September 21, 2016, 8:50pm

Hi,
I'm seeing slow performance when running a term query with highlighting against an index holding 10GB of attachments. Highlighting makes the query about 3-4x slower. I'm using an edge-ngram tokenizer at index time (min_gram 3, max_gram 15).
Is there a way to speed things up?
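
For reference, here is a rough sketch of what an edge-ngram setup with those bounds emits for a single token. The `edge_ngrams` helper is hypothetical and only mimics the prefix expansion in plain Python; it is not Elasticsearch code, and the real tokenizer's behaviour also depends on settings such as `token_chars`.

```python
from typing import List


def edge_ngrams(token: str, min_gram: int = 3, max_gram: int = 15) -> List[str]:
    """Return every prefix of `token` with length in [min_gram, max_gram]."""
    # Prefixes cannot be longer than the token itself.
    upper = min(len(token), max_gram)
    # Tokens shorter than min_gram produce an empty list.
    return [token[:n] for n in range(min_gram, upper + 1)]


if __name__ == "__main__":
    for gram in edge_ngrams("attachment"):
        print(gram)
```

Every indexed token is expanded into several prefixes this way, which is why the indexed field grows well beyond the size of the original text.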

nik9000 · September 22, 2016, 10:36am

I expect the trouble with highlighting is that it has to load the body of the document, and to do that it has to decompress the stored source, which is huge when it holds attachments. Then it has to convert it to a Map so we can pull the part we want out of the source. Disaster!

So you have two options, I think. Have a look at the completion suggester: it is designed to return the term it suggests without having to go to the document. On the other hand, it has to build the FST it uses for suggestions in Java's heap, so it has quite a bit of memory overhead. It still might be the best way.
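
A minimal sketch of what that could look like, written as plain Python dicts that would be sent as JSON. The field name `suggest`, the suggestion name `phrase-suggest`, and both helper functions are made up for illustration. The `prefix` key is the form used by Elasticsearch 5.x and later; older releases use `text` there and nest the mapping under a type name, so check the documentation for your version.

```python
import json


def completion_mapping(field: str = "suggest") -> dict:
    """Mapping fragment declaring `field` as a completion field."""
    return {"properties": {field: {"type": "completion"}}}


def completion_request(prefix: str, field: str = "suggest", size: int = 5) -> dict:
    """Search body asking the completion suggester for up to `size` options."""
    return {
        "suggest": {
            # "phrase-suggest" is an arbitrary label chosen for this example.
            "phrase-suggest": {
                "prefix": prefix,
                "completion": {"field": field, "size": size},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(completion_mapping(), indent=2))
    print(json.dumps(completion_request("att"), indent=2))
```

The phrases to suggest have to be indexed into that field explicitly, so this only helps if you can control what gets written to the index.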

Another option is to use the term suggester somehow though I'm not sure how - you'd have to experiment with it quite a lot. It doesn't reach into the source at all.
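
As a starting point for that experiment, a term suggester request body might look roughly like this. The field name `content`, the label `term-suggest`, and the helper itself are assumptions for the sake of the example. Keep in mind that the term suggester proposes corrections for whole terms based on edit distance rather than completing prefixes, so it may not fit an autocomplete use case without extra work.

```python
import json


def term_suggest_request(text: str, field: str = "content", size: int = 5) -> dict:
    """Search body asking the term suggester for up to `size` candidates per term."""
    return {
        "suggest": {
            # "term-suggest" is an arbitrary label chosen for this example.
            "term-suggest": {
                "text": text,
                "term": {"field": field, "size": size},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(term_suggest_request("atachment"), indent=2))
```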

kzal · September 22, 2016, 11:00am

The problem is that the attachment content is base64-encoded, so I have no idea how to use it with suggesters. What's more, the river (the MongoDB river) prevents me from changing the index mapping very much.

nik9000 · September 22, 2016, 8:42pm

If you aren't using the content then why highlight it?

Also, rivers have been deprecated for a long time. I suspect you are using a super old version then. So upgrading might help some.

kzal · September 23, 2016, 3:14pm

I'm using it to extract autocompletion phrases, using n-grams, but suggesters require a special mapping (i.e. 'type':'completion').
