Shingle filter to support sub-phrase matching


(nicktackes) #1

I have created a gist with an analyzer that uses the shingle filter in an attempt to match sub-phrases.

For instance, I have entries in the table with discrete phrases like:
EGFR
Lung
Cancer
Lung Cancer

and I want to match these when searching for the phrase 'EGFR related lung cancer'.

I expect multi-word matches to score higher than single-word matches, for instance:

  1. Lung Cancer
  2. Lung
  3. Cancer
  4. EGFR
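To make the intended ranking concrete, here is a plain-Python sketch (not Elasticsearch code, and not what the gist does). It assumes a stored phrase "matches" when it equals one of the query's word shingles, and it ranks matches by word count. The helper names `shingles` and `rank_phrases` are hypothetical.

```python
def shingles(text, min_size=1, max_size=3):
    """Return the lowercased word n-grams (shingles) of text,
    for every size from min_size to max_size."""
    words = text.lower().split()
    out = []
    for size in range(min_size, max_size + 1):
        for start in range(len(words) - size + 1):
            out.append(" ".join(words[start:start + size]))
    return out


def rank_phrases(query, phrases, max_size=3):
    """Return stored phrases that equal a shingle of the query,
    with the most words first."""
    query_shingles = set(shingles(query, 1, max_size))
    matches = [p for p in phrases if p.lower() in query_shingles]
    # sorted() is stable even with reverse=True, so phrases with the
    # same word count keep their original input order.
    return sorted(matches, key=lambda p: len(p.split()), reverse=True)


entries = ["EGFR", "Lung", "Cancer", "Lung Cancer"]
print(rank_phrases("EGFR related lung cancer", entries))
# ['Lung Cancer', 'EGFR', 'Lung', 'Cancer']
```

This only separates multi-word matches from single-word ones. The order among single words (e.g. putting EGFR last) would need some other scoring signal.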

I also tried a match query with the standard analyzer, but that didn't give the desired result either. One complication with the shingle approach is that min_shingle_size has to be 2 or more. How, then, would I be able to match single words like 'EGFR' or 'Lung'?
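One possible shape for such an analyzer is sketched below as a Python dict of index settings. It assumes Elasticsearch's `shingle` token filter, whose `output_unigrams` option (default true, but check the docs for your version) emits single words alongside the shingles even when `min_shingle_size` is 2. The names `shingle_analyzer` and `my_shingle` are placeholders. `simulate_tokens` is a hypothetical, rough stand-in for the analyzer's output: it splits on whitespace rather than using the standard tokenizer and ignores token order.

```python
# Hypothetical index settings; "shingle_analyzer" and "my_shingle" are placeholder names.
settings = {
    "analysis": {
        "filter": {
            "my_shingle": {
                "type": "shingle",
                "min_shingle_size": 2,
                "max_shingle_size": 3,
                # Also emit single words, so 'EGFR' or 'Lung' can still match.
                "output_unigrams": True,
            }
        },
        "analyzer": {
            "shingle_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "my_shingle"],
            }
        },
    }
}


def simulate_tokens(text, cfg):
    """Rough approximation of the token set the analyzer would produce:
    lowercased whitespace split, optional unigrams, plus shingles of the
    configured sizes joined by a single space."""
    words = text.lower().split()
    tokens = set(words) if cfg.get("output_unigrams", True) else set()
    for size in range(cfg["min_shingle_size"], cfg["max_shingle_size"] + 1):
        for start in range(len(words) - size + 1):
            tokens.add(" ".join(words[start:start + size]))
    return tokens


shingle_cfg = settings["analysis"]["filter"]["my_shingle"]
print(sorted(simulate_tokens("EGFR related lung cancer", shingle_cfg)))
```

With `output_unigrams` turned off, only the two- and three-word shingles remain, which is the situation the question describes.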

Thanks

https://gist.github.com/nicktackes/ffdbf22aba393efc2169.js

