Multi-word Term Vectors with Word nGrams?

Hi all,

I'm aiming to build an index that, for each document, will break it down by
word ngrams (uni, bi, and tri), then capture term vector analysis on all of
those word ngrams. Is that possible with Elasticsearch?

For instance, for a document field containing "The car drives.", I would like
to be able to get:

the - 1 instance
car - 1 instance
drives - 1 instance
the car - 1 instance
car drives - 1 instance
the car drives - 1 instance

Thanks in advance!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/23db3079-0475-4a63-bc77-e514bf087359%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Bump.

Also, I need to correct my original example. For the text 'The red car drives', I would expect:

red
car
drives
red car
car drives
red car drives
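
To pin down exactly what output is being asked for, here is a minimal plain-Python sketch that reproduces the list above. It does not use Elasticsearch at all; the function name `word_ngram_counts`, the simple regex tokenizer, and the `stopwords` parameter are illustrative assumptions, not anything from the Elasticsearch API.

```python
import re
from collections import Counter


def word_ngram_counts(text, max_n=3, stopwords=frozenset()):
    """Count word n-grams of size 1..max_n (hypothetical helper)."""
    # Lowercase, then keep runs of letters, digits, and apostrophes as tokens.
    tokens = [t for t in re.findall(r"[a-z0-9']+", text.lower())
              if t not in stopwords]
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the token list.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


counts = word_ngram_counts("The red car drives", stopwords={"the"})
for term, freq in counts.items():
    print(f"{term} - {freq} instance")
```

Calling it without `stopwords` on "The car drives." gives the six terms from the original post instead, since "the" is then kept as a token.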

On Friday, December 5, 2014 12:43:57 PM UTC-5, Adam Toy wrote:

Hi all,

I'm aiming to build an index that, for each document, will break it down
by word ngrams (uni, bi, and tri), then capture term vector analysis on all
of those word ngrams. Is that possible with Elasticsearch?

For instance, for a document field containing "The car drives." I would be
able to get:

the - 1 instance
car - 1 instance
drives - 1 instance
the car - 1 instance
car drives - 1 instance
the car drives - 1 instance

Thanks in advance!

Adam,

Have you seen the Shingle Filter?

Use it as part of a custom analyzer (you probably want to lowercase as well).
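
As a rough sketch of what such a custom analyzer could look like, the Python below only builds and prints the JSON body for index creation. The names `build_index_body`, `word_shingles`, `shingle_analyzer`, and the field name `body` are made up for illustration. The mapping layout assumes a recent Elasticsearch version; older releases used the `string` field type and nested the properties under a type name, so adjust for your version.

```python
import json


def build_index_body(field="body", min_size=2, max_size=3):
    """Build index settings and mappings as a dict (hypothetical helper)."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Shingle filter emitting bi- and tri-word shingles,
                    # plus the single words (unigrams).
                    "word_shingles": {
                        "type": "shingle",
                        "min_shingle_size": min_size,
                        "max_shingle_size": max_size,
                        "output_unigrams": True,
                    }
                },
                "analyzer": {
                    # Lowercase first, then shingle.
                    "shingle_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "word_shingles"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                field: {
                    "type": "text",  # assumption: "string" on older versions
                    "analyzer": "shingle_analyzer",
                    "term_vector": "yes",  # store term vectors for the field
                }
            }
        },
    }


print(json.dumps(build_index_body(), indent=2))
```

The printed body would be sent when creating the index, and the per-document term frequencies could then be read back through the term vectors API. If you also add a stop filter ahead of the shingle filter, be aware that removed positions may show up as filler tokens in the shingles; the shingle filter's `filler_token` setting controls what is emitted there.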

In Lucene-based search (i.e. Elasticsearch/Solr), "ngram" means character
ngrams. For example, with gram sizes 1 to 3, "red" => "r", "re", "red", "e",
"ed", "d". What most folks think of as "ngrams", Lucene calls "shingles".

Hope that helps
-Doug
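
To make the character-ngram versus shingle distinction concrete, here is a small plain-Python illustration. It only mimics the idea and is not how Lucene implements either filter; `char_ngrams` and `shingles` are hypothetical helper names. Note that a full character n-gram pass with sizes 1 to 3 also yields the grams that start at later characters, not just the prefixes.

```python
def char_ngrams(word, min_gram=1, max_gram=3):
    """Character n-grams of one word, grouped by start position."""
    grams = []
    for start in range(len(word)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(word):
                grams.append(word[start:start + size])
    return grams


def shingles(tokens, min_size=2, max_size=3):
    """Word-level n-grams ("shingles") over an already tokenized text."""
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


print(char_ngrams("red"))
# ['r', 're', 'red', 'e', 'ed', 'd']
print(shingles(["red", "car", "drives"]))
# ['red car', 'red car drives', 'car drives']
```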

On Tue, Dec 9, 2014 at 2:26 PM, Adam Toy art62@georgetown.edu wrote:

Bump.

Also, need to correct my original example as 'The red car drives':

red
car
drives
red car
car drives
red car drives

--
Doug Turnbull
Search & Big Data Architect
OpenSource Connections http://o19s.com
