New field as N-gram of another field

programagor · September 5, 2018, 11:08am

Greetings

I have a field called hdr_subject, and I would like to create a new field, hdr_subject_ngram, containing an array of all word N-grams of hdr_subject up to a certain size (e.g. 4).
For example, if

"hdr_subject" : "Discount: RX Viagra Pills"

then I would like

"hdr_subject_ngram" : [
  "discount",
  "rx",
  "viagra",
  "pills",
  "discount rx",
  "rx viagra",
  "viagra pills",
  "discount rx viagra",
  "rx viagra pills",
  "discount rx viagra pills"
]
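To pin down the transformation being asked for, here is a minimal Python sketch that reproduces the list above outside Elasticsearch. The function name word_ngrams and the tokenization rule (lowercase, split on non-word characters) are assumptions made for illustration; the post does not say how punctuation should be handled.

```python
import re


def word_ngrams(text, max_n=4):
    """Return all word N-grams of `text` for N = 1..max_n.

    Assumption: tokens are lowercase runs of word characters, so the
    colon in "Discount:" is dropped.
    """
    tokens = re.findall(r"\w+", text.lower())
    grams = []
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the token list.
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams


if __name__ == "__main__":
    for gram in word_ngrams("Discount: RX Viagra Pills"):
        print(gram)
```

Computing the array this way before indexing would let hdr_subject_ngram be stored as a plain keyword array, at the cost of doing the work client-side.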

This N-gram field will be indexed as a keyword, and used as an influencer for a Machine Learning job.

So far, I found https://stackoverflow.com/questions/27387231 to be of help, but I'm not sure how to create a new field out of the analyzer. Do you have any pointers?

Thanks

cbuescher · September 7, 2018, 9:17am

I think what you are looking for might be multi fields. You continue to index the existing "hdr_subject" field, but set up an additional custom analyzer that uses the ngram tokenizer, and apply that analyzer to a multi field of the original one.
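A rough sketch of what such an index definition might look like, written as a Python dict that can be serialized and sent as the body of an index-creation request. Note one deliberate swap: the ngram tokenizer produces character N-grams, so this sketch uses the shingle token filter instead, which produces word N-grams like the ones in the question. The analyzer name subject_shingles, the filter name word_shingles, and the sub-field name ngram are made up for illustration. The mapping is written without a type name, which assumes Elasticsearch 7.x or later; 6.x would need a type name under "mappings".

```python
import json

# Assumed names: "word_shingles", "subject_shingles", and the "ngram"
# sub-field are illustrative and do not come from the thread.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "word_shingles": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 4,
                    # Also emit the single words alongside the shingles.
                    "output_unigrams": True,
                }
            },
            "analyzer": {
                "subject_shingles": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "word_shingles"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "hdr_subject": {
                "type": "text",
                # Multi field: same source value, analyzed a second way.
                "fields": {
                    "ngram": {
                        "type": "text",
                        "analyzer": "subject_shingles",
                    }
                },
            }
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(index_body, indent=2))
```

With this layout the shingles would be searchable as hdr_subject.ngram, but they exist only as indexed tokens; no hdr_subject_ngram array appears in the stored document. Whether that is enough for the keyword-influencer use case in the question is not settled by this thread.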

system · October 5, 2018, 9:17am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
