Greetings
I have a field called hdr_subject, and I would like to create a new field, such as hdr_subject_ngram, containing an array of all word N-grams of hdr_subject up to a certain size (e.g. 4).
For example, if
"hdr_subject" : "Discount: RX Viagra Pills"
then I would like
"hdr_subject_ngram" : [
"discount",
"rx",
"viagra",
"pills",
"discount rx",
"rx viagra",
"viagra pills",
"discount rx viagra",
"rx viagra pills",
"discount rx viagra pills"
]
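Here is a minimal Python sketch of the transformation itself. It assumes that lowercasing and splitting on runs of word characters is close enough to the intended tokenization. Elasticsearch's standard tokenizer may split some inputs differently, for example "don't". `word_ngrams` and `max_n` are hypothetical names. The output is ordered by N-gram size and then by position, as in the example above.

```python
import re


def word_ngrams(text, max_n=4):
    """Return all word N-grams of `text` for N = 1..max_n, shortest first."""
    # Approximate tokenization (an assumption): lowercase, then keep runs of
    # word characters, which drops punctuation such as the colon after "Discount".
    tokens = re.findall(r"\w+", text.lower())
    ngrams = []
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the token list.
        for i in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[i:i + n]))
    return ngrams


print(word_ngrams("Discount: RX Viagra Pills"))
```

A subject that repeats a word produces repeated N-grams. `list(dict.fromkeys(...))` would deduplicate them while keeping the order.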
This N-gram field will be mapped as a keyword field and used as an influencer for a Machine Learning job.
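This is a sketch of what the target index could look like. It assumes the N-gram array is filled in before the document is indexed. The mapping is an Elasticsearch mapping written as a Python dict. The `text` and `keyword` field types are standard, but the overall layout is an assumption. A keyword field accepts an array of strings, so each N-gram is stored as a separate exact value.

```python
import json

# Hypothetical mapping: hdr_subject stays full text, while hdr_subject_ngram
# is a keyword field holding one exact-match value per N-gram.
mapping = {
    "mappings": {
        "properties": {
            "hdr_subject": {"type": "text"},
            "hdr_subject_ngram": {"type": "keyword"},
        }
    }
}

# Example document with the N-gram array already present at index time.
doc = {
    "hdr_subject": "Discount: RX Viagra Pills",
    "hdr_subject_ngram": [
        "discount", "rx", "viagra", "pills",
        "discount rx", "rx viagra", "viagra pills",
        "discount rx viagra", "rx viagra pills",
        "discount rx viagra pills",
    ],
}

print(json.dumps(mapping, indent=2))
print(json.dumps(doc, indent=2))
```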
So far I have found https://stackoverflow.com/questions/27387231 helpful, but I'm not sure how to turn the analyzer's output into a new field. Do you have any pointers?
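For comparison, Elasticsearch's built-in way to produce word N-grams inside an analyzer is the shingle token filter. The sketch below writes such index settings as a Python dict. The names `subject_shingles`, `subject_ngram_analyzer` and the `hdr_subject.shingles` sub-field are hypothetical. As far as I understand, an analyzer's tokens only go into the inverted index. They are not written back into `_source`, and the sub-field is `text` rather than `keyword`. That makes it searchable, but probably not directly usable as an ML influencer, which is presumably why the N-gram array would need to exist in the document itself.

```python
import json

# Hypothetical index settings: a custom analyzer whose shingle filter emits
# word N-grams of size 2 to 4, plus the single words via output_unigrams.
settings = {
    "settings": {
        "analysis": {
            "filter": {
                "subject_shingles": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 4,
                    "output_unigrams": True,
                }
            },
            "analyzer": {
                "subject_ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "subject_shingles"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "hdr_subject": {
                "type": "text",
                # Sub-field analyzed with shingles; searchable, but its tokens
                # are not stored as separate keyword values in the document.
                "fields": {
                    "shingles": {
                        "type": "text",
                        "analyzer": "subject_ngram_analyzer",
                    }
                },
            }
        }
    },
}

print(json.dumps(settings, indent=2))
```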
Thanks