Does anyone have a performance evaluation of the NGram tokenizer?

Hello searchers,

I think one of the best (and maybe the most used) tokenizer algorithms is the NGram tokenizer, which is roughly equivalent to the SQL `WHERE table.column LIKE '%search_text%'`.
Indeed, given an input text, it generates a lot (a very large number) of tokens, so any search text is likely to match one of them. But to me it is the easy solution, and it leaves out the other side of the performance question: how long does it take to update this kind of index? How much does it cost for searching in general? This NGram approach can require a lot of disk space, far too much when min_gram is low.
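
To make the question concrete, here is a minimal sketch of the kind of mapping I mean (the index name, field name and the 3-gram settings are made up, and I'm assuming the 8.x Python client against a local, unsecured cluster):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Hypothetical index with an NGram tokenizer; min_gram/max_gram of 3
# keeps the token explosion somewhat bounded (a lower min_gram produces far more tokens).
es.indices.create(
    index="ngram-demo",
    settings={
        "analysis": {
            "tokenizer": {
                "trigram_tokenizer": {
                    "type": "ngram",
                    "min_gram": 3,
                    "max_gram": 3,
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                "trigram_analyzer": {
                    "type": "custom",
                    "tokenizer": "trigram_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    mappings={
        "properties": {
            "content": {"type": "text", "analyzer": "trigram_analyzer"}
        }
    },
)

# Show how many tokens a single short value already produces for this field.
result = es.indices.analyze(index="ngram-demo", field="content", text="search_text")
print(len(result["tokens"]), [t["token"] for t in result["tokens"]])
```

Dropping min_gram to 1 or 2 on a long text field multiplies the number of generated tokens dramatically, which is exactly where my disk-space and indexing-time worry comes from.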

Any performance feedback? Thanks.

EDIT: is it possible to get metrics that say "for this full-text field in this index, we have X generated tokens, and for each of these tokens, how often it has been a candidate and actually matched a given input search text"?
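
For the first half of that (how many tokens a field generates), the closest thing I know of is the _termvectors API with term_statistics, which lists every term a stored document produced for a field plus each term's document frequency across the index; as far as I know there is no built-in counter for how often each token matched incoming queries. A rough sketch, reusing the hypothetical ngram-demo index from the snippet above (same 8.x Python client assumption):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Index one document into the hypothetical ngram-demo index and make it searchable.
es.index(index="ngram-demo", id="1", document={"content": "search_text example"})
es.indices.refresh(index="ngram-demo")

# Term vectors with term_statistics report every generated term for the field,
# its frequency in this document and its document frequency across the index.
tv = es.termvectors(
    index="ngram-demo",
    id="1",
    fields=["content"],
    term_statistics=True,
)
terms = tv["term_vectors"]["content"]["terms"]
print(len(terms), "terms generated for this document's 'content' field")
for term, stats in terms.items():
    print(term, "doc_freq:", stats["doc_freq"], "term_freq:", stats["term_freq"])
```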

It sounds like the wildcard field type might be what you are looking for.
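
A minimal sketch of what that could look like (index and field names are made up, same 8.x Python client assumption as in the question):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Hypothetical index using the wildcard field type (available since 7.9),
# which targets LIKE '%...%'-style matching without an ngram-sized analyzed index.
es.indices.create(
    index="wildcard-demo",
    mappings={"properties": {"content": {"type": "wildcard"}}},
)

es.index(index="wildcard-demo", id="1", document={"content": "some search_text here"})
es.indices.refresh(index="wildcard-demo")

# Leading-and-trailing wildcard query, the closest analogue of '%search_text%'.
resp = es.search(
    index="wildcard-demo",
    query={"wildcard": {"content": {"value": "*search_text*"}}},
)
print(resp["hits"]["total"]["value"], "hit(s)")
```

As I understand it, the wildcard field indexes its own fixed-size ngrams internally and verifies candidate matches against a stored copy of the value, so it covers this `LIKE '%...%'` use case without you having to tune min_gram/max_gram yourself.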
