I'm aiming to build an index that, for each document, breaks the text down
into word ngrams (unigrams, bigrams, and trigrams) and then captures term
vectors for all of those word ngrams. Is that possible with Elasticsearch?
For instance, for a document field containing "The car drives." I would like
to be able to get:
the - 1 instance
car - 1 instance
drives - 1 instance
the car - 1 instance
car drives - 1 instance
the car drives - 1 instance
Also, I need to correct my original example to 'The red car drives':
red
car
drives
red car
car drives
red car drives
On Friday, December 5, 2014 12:43:57 PM UTC-5, Adam Toy wrote:
Hi all,
I'm aiming to build an index that, for each document, will break it down
by word ngrams (uni, bi, and tri), then capture term vector analysis on all
of those word ngrams. Is that possible with Elasticsearch?
For instance, for a document field containing "The car drives." I would be
able to get:
the - 1 instance
car - 1 instance
drives - 1 instance
the car - 1 instance
car drives - 1 instance
the car drives - 1 instance
Use the shingle token filter as part of a custom analyzer (you probably want
to lowercase as well).
In Lucene-based search (i.e. Elasticsearch/Solr) "ngram" means character
ngrams. For example, "red" yields pieces such as "r", "re", "red". What most
folks think of as "ngrams" (word ngrams) Lucene calls "shingles".
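To make the suggestion concrete, here is a rough sketch of what such a custom analyzer could look like, written as a Python dict that you would send as the index-creation body. The names `shingle_analyzer`, `word_shingles`, and `body` are made up for illustration. The mapping layout assumes a recent Elasticsearch release (typeless mappings, `text` fields); the 1.x releases current at the time of this thread used `string` fields under a type name, so adjust for your version. The small `word_shingles` function is only a local approximation of the token stream, using a simple regex instead of the real standard tokenizer, so you can see which terms to expect.

```python
import re
from collections import Counter

# Index-creation body: standard tokenizer -> lowercase -> shingle filter.
# All names here ("word_shingles", "shingle_analyzer", "body") are hypothetical.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                "word_shingles": {
                    "type": "shingle",
                    "min_shingle_size": 2,    # bigrams ...
                    "max_shingle_size": 3,    # ... up to trigrams
                    "output_unigrams": True,  # keep the single words too
                }
            },
            "analyzer": {
                "shingle_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "word_shingles"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "body": {
                "type": "text",  # assumption: recent versions; 1.x used "string"
                "analyzer": "shingle_analyzer",
                "term_vector": "yes",  # store term vectors for this field
            }
        }
    },
}


def word_shingles(text, min_size=1, max_size=3):
    """Approximate the analyzer locally: lowercase, split on word characters,
    then emit every run of min_size..max_size consecutive tokens."""
    tokens = re.findall(r"\w+", text.lower())
    return [
        " ".join(tokens[i:i + n])
        for n in range(min_size, max_size + 1)
        for i in range(len(tokens) - n + 1)
    ]


counts = Counter(word_shingles("The car drives."))
for term, freq in counts.items():
    print(f"{term} - {freq} instance")
```

Run locally, this prints the six terms from the original example, each with one instance. Against a real index, the per-document counts would come from the term vectors API rather than from this helper.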