New field as N-gram of another field


I have a field called hdr_subject and would like to create a new field, hdr_subject_ngram, containing an array of all word N-grams of hdr_subject up to a certain size (e.g. 4).
For example, if

"hdr_subject" : "Discount: RX Viagra Pills"

then I would like

"hdr_subject_ngram" : [
  "discount rx",
  "rx viagra",
  "viagra pills",
  "discount rx viagra",
  "rx viagra pills",
  "discount rx viagra pills"
]

This N-gram field will be indexed as a keyword, and used as an influencer for a Machine Learning job.

So far, I found to be of help, but I'm not sure how to create a new field out of the analyzer. Do you have any pointers?


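I think what you are looking for might be multi-fields. You keep indexing the existing "hdr_subject" field as before, but define an additional custom analyzer that uses the ngram tokenizer and apply it to a multi-field of the original field. Note that the ngram tokenizer produces character N-grams; for word N-grams like the ones in your example, the shingle token filter is the usual choice.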
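
As a rough sketch of that setup, the index could be created with a body along these lines. It assumes Elasticsearch 7 or later (typeless mappings) and swaps in the shingle token filter, since that is what emits word-level N-grams. The names subject_shingles, subject_shingle_analyzer, and the ngram sub-field are made up for illustration.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "subject_shingles": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 4,
          "output_unigrams": false
        }
      },
      "analyzer": {
        "subject_shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "subject_shingles"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "hdr_subject": {
        "type": "text",
        "fields": {
          "ngram": {
            "type": "text",
            "analyzer": "subject_shingle_analyzer"
          }
        }
      }
    }
  }
}
```

The shingles would then be searchable as hdr_subject.ngram. Keep in mind that an analyzer only changes the indexed tokens, not the stored _source, so it is worth checking that this is enough for the ML job.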
