Searching after Using nGram for Indexing

(Ori Rubinfeld) #1


We are using the nGram analyzer with gram sizes of 3 to 7 letters.
We chose this range because of a disk storage limit we have.

Users are trying to search for terms longer than 7 letters, but those searches find nothing.

How can I make it search the index on the analyzed field even when a search term has more than 7 letters?
Can I make it take the first 7 letters and add an asterisk?



(Ali Beyad) #2

An asterisk after the query term won't help because it will have nothing to match against in the index, since you're only indexing tokens of up to 7 letters. I would recommend removing the 7-letter limit. Have you tried that? Does it really expand your disk usage tremendously? My guess is that in most text corpora the number of terms longer than 7 or 8 characters is not huge, so it shouldn't increase your index size on disk by much. On the other hand, you are really hampering your ability to search properly when every term stored in the index is cut down to a maximum of 7 letters. If you want to reduce disk usage, you could instead start the nGram analyzer at 4 letters rather than 3.
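
To make the point above concrete, here is a rough Python sketch of what a 3-7 nGram setup stores for a single word. The `ngrams` helper is hypothetical: it only imitates n-gram tokenization of one term and is not Elasticsearch code, and the word "searching" is just an example. It shows that no indexed token is longer than 7 letters, so neither the full term nor the full term followed by an asterisk has anything to match, and that starting at 4 letters instead of 3 produces fewer tokens.

```python
def ngrams(term, min_gram, max_gram):
    """Hypothetical helper: every substring of `term` with length min_gram..max_gram."""
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(term) - size + 1):
            grams.add(term[start:start + size])
    return grams


# Tokens an index would hold for the 9-letter word "searching" with sizes 3-7.
indexed = ngrams("searching", 3, 7)
longest_token = max(len(token) for token in indexed)

# The full 9-letter query term is not among the indexed tokens.
query = "searching"
exact_hit = query in indexed

# "searching*" asks for tokens that start with the full term; none are long enough.
wildcard_hit = any(token.startswith(query) for token in indexed)

# Raising the minimum gram size from 3 to 4 reduces the number of stored tokens.
tokens_min3 = len(ngrams("searching", 3, 7))
tokens_min4 = len(ngrams("searching", 4, 7))

print(longest_token, exact_hit, wildcard_hit, tokens_min3, tokens_min4)
```

The real savings or costs depend on your data, so it is worth measuring the index size on a sample of your documents before and after changing the gram sizes.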
